IndexError: argmax(): Expected reduction dim 1 to have non-zero size. [BUG]

GasolSun36 commented 1 year ago

Describe the bug

I trained two Llama-2-7b-hf models as the actor and critic in the first two steps of DeepSpeed-Chat without any problems. During step 3 (RLHF training), the following error was raised: IndexError: argmax(): Expected reduction dim 1 to have non-zero size.

I looked deeper, and the error is raised during the call to generate (in DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py). Printing outputs['logits'] inside transformers/generation/utils.py shows that the logits tensor is empty:

with torch.no_grad():
    seq = self.actor_model.module.generate(
        prompts,
        attention_mask=mask,
        max_length=max_min_length,
        pad_token_id=self.tokenizer.pad_token_id,
        synced_gpus=self.z3_enabled,
        **kwargs)

outputs = self(
    **model_inputs,
    return_dict=True,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
)
print(outputs['logits'])
print(outputs['logits'].shape)

Output of the print statements:

tensor([], device='cuda:0', size=(1, 256, 0))
torch.Size([1, 256, 0])

The logits have a zero-size last (vocabulary) dimension, so the actor model cannot produce any token at all. The prompts passed to the generate function look fine:

tensor([[32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 1, 29871, 13, 13, 29950, 7889, 29901, 817, 10529, 373, 1209, 573, 17869, 13, 13, 7900, 22137, 29901, 306, 508, 29915, 29873, 2289, 3867, 1906, 1492, 1286, 29889, 1724, 306, 508, 437, 338, 1303, 8277, 322, 19138, 675, 963, 363, 366, 29892, 470, 4511, 366, 411, 385, 4148, 366, 1795, 1284, 8444, 29889, 13, 13, 29950, 7889, 29901, 3431, 825, 8277, 437, 366, 6907, 13, 13, 7900, 22137, 29901, 306, 508, 2367, 366, 263, 1051, 310, 278, 2246, 8277, 297, 1784, 1422, 13997, 29889, 13, 13, 29950, 7889, 29901, 3431, 2649, 592, 901, 13, 13, 7900, 22137, 29901]], device='cuda:0')

Decoded with the Llama-2 tokenizer, the prompt reads:

H: need suggestions on passive income

A: I can't really provide those right now. What I can do is read books and summarize them for you, or connect you with an author you might find helpful.

H: ok what books do you recommend

A: I can give you a list of the top books in many different categories.

H: ok tell me more

A:
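The logits shape printed above, (1, 256, 0), has an empty last dimension. After the last position is sliced out, the scores would have shape (1, 0), which matches the "reduction dim 1" in the error message. Below is a minimal sketch of a guard that reports such dimensions before the reduction is attempted. `empty_dims` and `check_logits_shape` are hypothetical helpers, not part of transformers or DeepSpeed, and they operate on a plain shape tuple (for a tensor `t`, one could pass `tuple(t.shape)`).

```python
def empty_dims(shape):
    """Return the indices of all zero-size dimensions in a shape tuple."""
    return [i for i, size in enumerate(shape) if size == 0]


def check_logits_shape(shape):
    """Raise a descriptive error if any dimension of the logits is empty.

    Hypothetical guard: `shape` is assumed to be (batch, seq_len, vocab).
    """
    bad = empty_dims(shape)
    if bad:
        raise ValueError(
            f"logits shape {tuple(shape)} has zero-size dim(s) {bad}; "
            "argmax over an empty dimension is undefined")
    return True


# Shape reported in this issue: the vocabulary dimension (index 2) is empty.
print(empty_dims((1, 256, 0)))             # [2]
# A healthy forward pass would have a non-empty vocabulary dimension.
print(check_logits_shape((1, 256, 32000)))  # True
```

Failing early with the full shape makes it clearer that the problem is in the forward pass of the actor model rather than in the argmax call itself.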

Log output

The initialization of the four models is normal, so I'll show the log starting from the training step.

*****[end] Initialized Reward Model [end] (duration: 11.71s)*****
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 3813
Free memory : 19.750183 (GigaBytes)
Total memory: 39.586121 (GigaBytes)
Requested memory: 0.304688 (GigaBytes)
Setting maximum total tokens (input + output) to 512
WorkSpace: 0x7fc3d2000000
Traceback (most recent call last):
  File "/data1/sjs/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 635, in <module>
    main()
  File "/data1/sjs/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 493, in main
    out = trainer.generate_experience(batch_prompt['prompt'],
  File "/data1/sjs/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 124, in generate_experience
    seq = self._generate_sequence(prompts, mask, step)
  File "/data1/sjs/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 84, in _generate_sequence
    seq = self.actor_model.module.generate(
  File "/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/deepspeed/runtime/hybrid_engine.py", line 253, in generate
    generate_ret_vals = self._generate(*inputs, **kwargs)
  File "/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1596, in generate
    return self.greedy_search(
  File "/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2477, in greedy_search
    next_tokens = torch.argmax(next_tokens_scores, dim=-1)
IndexError: argmax(): Expected reduction dim 1 to have non-zero size.

To Reproduce

step1 run.sh:

deepspeed main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --model_name_or_path meta-llama/Llama-2-7b-hf \
   --per_device_train_batch_size 4 \
   --per_device_eval_batch_size 4 \
   --max_seq_len 512 \
   --learning_rate 9.65e-6 \
   --weight_decay 0. \
   --num_train_epochs 1 \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --gradient_checkpointing \
   --zero_stage 2 \
   --deepspeed \
   --output_dir llama2

step2 run.sh:

deepspeed main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --model_name_or_path meta-llama/Llama-2-7b-hf \
   --per_device_train_batch_size 8 \
   --per_device_eval_batch_size 8 \
   --max_seq_len 512 \
   --learning_rate 9.65e-6 \
   --weight_decay 0.1 \
   --num_padding_at_beginning 0 \
   --num_train_epochs 1 \
   --gradient_accumulation_steps 1 \
   --lr_scheduler_type cosine \
   --num_warmup_steps 0 \
   --seed 1234 \
   --gradient_checkpointing \
   --zero_stage 3 \
   --deepspeed \
   --offload \
   --output_dir llama2

step3 run.sh:

Num_Padding_at_Beginning=1 # this is model related
Actor_Lr=9.65e-6
Critic_Lr=5e-6

deepspeed --master_port 12346 main.py \
   --data_path Dahoas/rm-static \
   --data_split 2,4,4 \
   --actor_model_name_or_path DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/llama2 \
   --critic_model_name_or_path DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/llama2 \
   --num_padding_at_beginning 1 \
   --per_device_generation_batch_size 1 \
   --per_device_training_batch_size 1 \
   --generation_batches 1 \
   --ppo_epochs 1 \
   --max_answer_seq_len 256 \
   --max_prompt_seq_len 256 \
   --actor_learning_rate ${Actor_Lr} \
   --critic_learning_rate ${Critic_Lr} \
   --actor_weight_decay 0.1 \
   --critic_weight_decay 0.1 \
   --num_train_epochs 1 \
   --lr_scheduler_type cosine \
   --gradient_accumulation_steps 1 \
   --num_warmup_steps 100 \
   --deepspeed \
   --seed 1234 \
   --actor_zero_stage 3 \
   --critic_zero_stage 3 \
   --offload \
   --enable_hybrid_engine \
   --inference_tp_size 1 \
   --tp_gather_partition_size 1 \
   --offload_reference_model \
   --output_dir training_log_output

Expected behavior Above.

ds_report output

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/torch']
torch version .................... 1.13.1+cu116
deepspeed install path ........... ['/home/xuchengjin/anaconda3/envs/llm/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.10.1, unknown, unknown
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6
shared memory (/dev/shm) size .... 1.95 TB

Screenshots above.

System info (please complete the following information):

OS: Ubuntu 20.04
GPU count and types: one machine with 8x A100-40GB
Python version: 3.10
transformers: 4.32.0

Docker context No docker used.

Additional context No.

xyxxxxx commented 1 year ago

I encountered the same issue, also during ds-chat step 3 training with Llama-2-7b-hf. When I enabled --print_answers, I found that the answers were empty strings:

--- prompt --> step=2, rank=2, ['\n\nHuman: Is it hard to become an air traffic controller?\n\nAssistant:']
--- prompt --> step=2, rank=1, ["\n\nHuman: I'd like to give a toast at my Christmas dinner party.\n\nAssistant:"]
--- prompt --> step=2, rank=0, ['\n\nHuman: How do I get a plumbers license.\n\nAssistant:']
--- ans    --> step=2, rank=2, ['']
--- ans    --> step=2, rank=1, ['']
--- ans    --> step=2, rank=0, ['']

And when I printed the generated sequence:

# ppo_trainer.py

with torch.no_grad():
    seq = self.actor_model.module.generate(
        prompts,
        attention_mask=mask,
        max_length=max_min_length,
        pad_token_id=self.tokenizer.pad_token_id,
        synced_gpus=self.z3_enabled,
        **kwargs)

print(seq)

the output looks like this:

[tensor([[32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000,     1, 29871,    13,    13, 29950,  7889, 29901,  6324,
         29892,   306, 29915, 29881,   763,   304,  1369,  3704,  6483, 17905,
         29872,   289,  1975,   297,   590,  2814,   664,   449, 29889,  1815,
           366,  8453,   920,   304,  2189,   445, 15058, 29973,    13,    13,
          7900, 22137, 29901, 18585, 29991,   739, 30010, 29879,  2289,  4780,
         29889,  2266, 30010, 29879,   263,  9004,   362, 29901,    13,    13,
          6730, 29892,  2317,  7812,   411,   596,  6900, 23468,  2920, 12435,
         29892,   322, 26681, 29889, 29871,    13,    13,  9190, 29892,   289,
           355,   596, 17905,   267, 29892, 24421,   596,  6567,   373,   596,
           266,  1141, 29879,   470,   373,   596,   298,  4512, 29889,    13,
            13, 10454, 29892,  3965,   596,   540,  1379,   964,   278, 11904,
           322,  1369,   304,  7812,   264,   596, 21152,  2745,   366,   508,
         29915, 29873,   748,   738, 26645, 29889, 29871,    13,    13, 12881,
           635, 29892,   289,   355,  1250,  1623,   964,   278,  6483, 17905,
         29872,   289,   355,  2602, 29889,    13,    13,  7058, 29915, 29879,
           599,   727,   338,   304,   372, 29991, 29871,  2803,   592,  1073,
           565,   366,   505,   738,  5155, 29889,    13,    13, 29950,  7889,
         29901, 20419,   306,   505,   777,  5155,  1244, 29889,   887,  2649,
           592,   304,   289,   355,   590, 17905,   267, 29892,   322,   769,
          7812,   264,   590, 21152, 29889,  1724,  5304,  1546,  1438, 24147,
         29892,   920,  1568,   626,   306,   289,  2548,   590, 17905,   267,
         29973,    13,    13,  7900, 22137, 29901,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0]], device='cuda:0')]

key env info:

torch: 2.0.0
deepspeed: 0.10.2
transformers: 4.31.0
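
The empty `ans` strings and the run of 0 ids after the prompt can also be detected programmatically. Below is a minimal sketch on plain Python lists. It assumes the padded prompt length is known and that ids 0 (`<unk>` in the Llama-2 tokenizer) and 32000 (the added pad token seen in the tensor above) should not count as answer content. `effective_answer_len` is a hypothetical helper, not a DeepSpeed-Chat function.

```python
def effective_answer_len(seq, prompt_len, ignore_ids=(0, 32000)):
    """Count tokens after the prompt that are not in `ignore_ids`.

    seq        -- token ids of one generated sequence (prompt + answer)
    prompt_len -- number of leading positions occupied by the padded prompt
    ignore_ids -- ids treated as "no content" (assumption: unk=0, pad=32000)
    """
    answer = seq[prompt_len:]
    return sum(1 for tok in answer if tok not in ignore_ids)


# Toy sequences shaped like the ones in this thread (heavily shortened).
prompt = [32000, 32000, 1, 29871, 13, 13, 7900, 22137, 29901]
empty = prompt + [0, 0, 0, 0]
non_empty = prompt + [306, 508, 2649, 0]

print(effective_answer_len(empty, len(prompt)))      # 0
print(effective_answer_len(non_empty, len(prompt)))  # 3
```

A result of 0 corresponds to the `['']` answers printed with --print_answers.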

xyxxxxx commented 1 year ago

After that, I discovered issue huggingface/transformers#25790, and I attempted to modify the tokenizer's config:

# utils/utils.py

if "llama" in model_name_or_path:
    from transformers.models.llama import LlamaTokenizer
    tokenizer = LlamaTokenizer.from_pretrained(
        model_name_or_path, fast_tokenizer=fast_tokenizer)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.unk_token
        # tokenizer.add_special_tokens({'pad_token': '[PAD]'})
        tokenizer.padding_side = 'left'

With this change, the generated sequences look like this:

tensor([[    0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     1, 29871,
            13,    13, 29950,  7889, 29901,   817, 10529,   373,  1209,   573,
         17869,    13,    13,  7900, 22137, 29901,   306,   508, 29915, 29873,
          2289,  3867,  1906,  1492,  1286, 29889,  1724,   306,   508,   437,
           338,  1303,  8277,   322, 19138,   675,   963,   363,   366, 29892,
           470,  4511,   366,   411,   385,  4148,   366,  1795,  1284,  8444,
         29889,    13,    13, 29950,  7889, 29901,  3431,   825,  8277,   437,
           366,  6907,    13,    13,  7900, 22137, 29901,   306,   508,  2367,
           366,   263,  1051,   310,   278,  2246,  8277,   297,  1784,  1422,
         13997, 29889,    13,    13, 29950,  7889, 29901,  3431,  2649,   592,
           901,    13,    13,  7900, 22137, 29901,   306,   508,  2649,   366,
          1048,   278,  2246,  8277,   297,   278,  1494, 13997, 29901,    13,
            13, 29899,   259,   383,  2463,    13, 29899,   259, 10050, 29899,
         29888,  2463,    13, 29899,   259, 15197,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 15202,    13, 29899,   259,  9327,
            13, 29899,   259, 27099,    13, 29899,   259,  5298,    13, 29899,
           259, 16407, 29891,    13, 29899,   259, 20986, 29915, 29879,  8277,
            13, 29899,   259, 17278, 12733,    13, 29899,   259,  3201,   955,
            13, 29899,   259,  8133,  6390, 29879,    13, 29899,   259,  3929,
         27184,    13, 29899,   259, 22890,   708,    13, 29899,   259,  6033,
           749,    13, 29899,   259,   498, 29878,  5495,    13, 29899,   259,
         10443, 16157,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782, 29899,   326, 16123,   882,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782, 29899,  8477,    13, 29899,   259, 21782, 29899,  8477,
            13, 29899,   259, 21782, 29899,  8477,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782, 29899,  8477,    13, 29899,   259, 21782, 29899,  8477,
            13, 29899,   259, 21782, 29899,  8477,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782, 29899,  8477,    13, 29899,   259, 21782, 29899,  8477,
            13, 29899,   259, 21782, 29899,  8477,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782, 29899,  8477,    13, 29899,   259, 21782, 29899,  8477,
            13, 29899,   259, 21782, 29899,  8477,    13, 29899,   259, 21782,
         29899,  8477,    13, 29899,   259, 21782, 29899,  8477,    13, 29899,
           259, 21782]], device='cuda:0')

tensor([[    0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     1, 29871,    13,    13, 29950,
          7889, 29901,   306,   471, 13858, 14171,   263, 25008, 29889,  5806,
           306,   505,   263,   289,   496, 12779, 29915, 29879,  7426,  7743,
         29892,   825,  9793,   322,  7794,   575,   292,   526,  5181, 29892,
           304,  4953,   263,  1985, 25008, 29973,    13,    13,  7900, 22137,
         29901,  1670,   526,  1784,  4072,   310,  4307, 29891,   414, 29889,
          1763,   664,   408,   263,   970,   822,  1581, 29892,   470,   297,
           278,  4038,   310, 22161,  4307, 29892,   366,   674, 12234,   817,
           304,   748,   304,  4307,  3762, 29892,   988,   366,   674,   505,
           304,  2125,  4413,   322,  1209,   429,  2232, 29889,  1205,   565,
           366,   526,  8852,   297,  5874,   470, 17266,   403,   664, 29892,
           372,  1122,   451,   367,  5181,   304,   748,   304,  4307,  3762,
         29889,  2860,   366, 10591,   403,   515,  4307,  3762, 29892,   366,
           674, 12234,   505,   304,  1209,   263,  7794,   575,   292,  4392,
         29889,    13,    13, 29950,  7889, 29901,  1128,  1784,  2440,   947,
           263,  3619,  4307,  7426,  2125,   304,   679, 29973,  1126,   338,
           372, 15574,   763,   263,  5835, 29915, 29879,  1824, 29892,   925,
           263,  2846,  2440, 29892,   470,   763,   385,  5684,  2989,  7426,
         29973,    13,    13,  7900, 22137, 29901,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0]], device='cuda:2')

Even with this change, the answer can still be an empty string (as in the second tensor above, from cuda:2), so the error still occurs.

Maybe the issue lies in the .generate() method of transformers, and we have to wait for them to fix it for now?
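
Independently of `.generate()`, one thing worth ruling out is a token id that falls outside the embedding table. The two padding choices in this thread yield pad id 32000 (the added `[PAD]` token) and pad id 0 (`unk_token`). With the 32000-entry Llama-2 base vocabulary, id 32000 is only a valid index if the embeddings were resized to at least 32001 rows. The sketch below is a sanity check under that assumption, not a diagnosis of this bug, and `out_of_range_ids` is a hypothetical helper.

```python
def out_of_range_ids(token_ids, num_embeddings):
    """Return the sorted set of ids that cannot index an embedding table
    with `num_embeddings` rows (valid ids are 0 .. num_embeddings - 1)."""
    return sorted({t for t in token_ids if t < 0 or t >= num_embeddings})


LLAMA2_VOCAB = 32000  # base vocabulary size of Llama-2

# Left-padded prompt using the added [PAD] token (id 32000).
padded_with_new_token = [32000, 32000, 1, 29871, 13]
# Same prompt padded with unk_token (id 0) instead.
padded_with_unk = [0, 0, 1, 29871, 13]

print(out_of_range_ids(padded_with_new_token, LLAMA2_VOCAB))      # [32000]
print(out_of_range_ids(padded_with_new_token, LLAMA2_VOCAB + 1))  # []
print(out_of_range_ids(padded_with_unk, LLAMA2_VOCAB))            # []
```

With a real model, the row count would typically come from `model.get_input_embeddings().num_embeddings`.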

GasolSun36 commented 1 year ago

Hi, I have solved the issue; see https://github.com/microsoft/DeepSpeed/issues/4229#issuecomment-1704004959

microsoft / DeepSpeed

IndexError: argmax(): Expected reduction dim 1 to have non-zero size. [BUG] #4226
