Same error.
Same error. Modifying ds_attention.py brings NotImplementedError.
Similar, but not the same error.
File "main.py", line 552, in <module> main() File "main.py", line 458, in main 192.18.75.0: out = trainer.generate_experience(prompts) 192.18.75.0: File "/baichuan/haoyu/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 203, in generate_experience 192.18.75.0: seq = self._generate_sequence(prompts) 192.18.75.0: File "/baichuan/haoyu/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 161, in _generate_sequence 192.18.75.0: seq = self.actor_model.module.generate(prompts, 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context 192.18.75.0: return func(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/generation/utils.py", line 1513, in generate 192.18.75.0: return self.greedy_search( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/generation/utils.py", line 2330, in greedy_search 192.18.75.0: outputs = self( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 913, in forward 192.18.75.0: transformer_outputs = self.transformer( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 730, in forward 192.18.75.0: inputs_embeds = self.word_embeddings(input_ids) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl 192.18.75.0: result = forward_call(*args, **kwargs) 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward 192.18.75.0: return F.embedding( 192.18.75.0: File "/baichuan/anaconda3/envs/deepspeedchat/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding 192.18.75.0: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) 192.18.75.0: RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
What should I do to fix this error?
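For reference, here is a minimal sketch (my own illustration, not from this thread) that reproduces the dtype complaint in the traceback above and the cast that satisfies F.embedding; the underlying fix is making sure the prompt tensor that reaches generate() holds integer token ids, not floats:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
ids = torch.tensor([[1.0, 2.0, 3.0]])  # float "indices" reproduce the RuntimeError
# emb(ids)  # RuntimeError: Expected ... scalar types: Long, Int; but got FloatTensor
out = emb(ids.long())  # casting to int64 is what F.embedding expects
print(out.shape)  # torch.Size([1, 3, 4])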
Any update on this issue?
Same error for actor model bloomz-7b1 and reward model opt-1.3b.
> Same error. Modifying ds_attention.py brings NotImplementedError.
The NotImplementedError is raised by the softmax function when config.fp16 is False. Perhaps you've changed fp16 to bf16 in ds_utils.py following some other issue (same as me). To solve this problem, in /opt/conda/envs/deepspeedchat/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_attention.py, line 253, in compute_attention, change

attn_mask=((1 - input_mask).half() * minus_inf)

into

attn_mask=((1 - input_mask.int()).half() * minus_inf)

That works for me.
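For context, a small sketch (my own illustration, assuming input_mask arrives as a bool tensor) of why the .int() cast matters:

import torch

input_mask = torch.tensor([[True, True, False]])  # bool padding mask
minus_inf = -10000.0

# On recent PyTorch, "1 - input_mask" on a bool tensor raises
# "Subtraction, the `-` operator, with a bool tensor is not supported",
# so the mask is cast to int before the subtraction:
attn_mask = (1 - input_mask.int()).half() * minus_inf
print(attn_mask)  # tensor([[-0., -0., -10000.]], dtype=torch.float16)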
Not working at all for me. The padding_side for OPT is right, while for bloomz it is left. I tried passing in two different tokenizers, but that caused a lot of conflicts when generating the experience.
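For anyone debugging that padding mismatch, a quick sketch (model name taken from this thread; the 'left' default is the commenter's claim, not something I verified) of inspecting and overriding the padding side before batching prompts:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1")
print(tok.padding_side)     # 'left' for bloomz, per the comment above
tok.padding_side = "right"  # force one side so actor and reward model agree
batch = tok(["hello", "a much longer prompt"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)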
Similar issue on DeepSpeed side: https://github.com/microsoft/DeepSpeed/issues/3518
Same error with actor model bloom-560m and critic model opt-350m. Any update?
Hi @cokuehuang,
Can you please try running this again and include the following PR as well:
I've been able to get this running with the bigscience/bloomz-1b7 BLOOM model:
DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/bloom/single_node/run_bloom.sh bigscience/bloomz-1b7 ../step2_reward_model_finetuning/bloom_7b_output/ 3 3 output_bloom7b_actor_hf_critic_step2
Thanks, Lev
Hi @cokuehuang,
Closing the issue for now since a solution was provided. If any issues are still encountered, feel free to open another issue.
Actor model: Bloom-1.1b
Reward model: Bloom-560m
Finetuning cmd: bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/bloom-1.1b/ /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/reward_model/bloom-560m
Part of training log:
However, changing the model to OPT works well.