Hi. PEFT is amazing. Thank you for sharing this amazing package with us.
However, when I enabled the fp16 training option with Accelerate + DeepSpeed ZeRO 3 and PEFT LoRA, an error occurred.
How can I handle this problem?
[My Setting]
used Accelerate with DeepSpeed (ZeRO 3)
used PEFT (LoRA)
used Polyglot (GPT-NeoX architecture)
tried to use fp16
4 GPUs (RTX 3090)
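For reference, here is a minimal sketch of how the model is set up (the checkpoint name and LoRA hyperparameters are placeholders, not my exact values):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Polyglot is a GPT-NeoX-style checkpoint; "query_key_value" is the attention
# projection named in the traceback below.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/polyglot-ko-5.8b",  # placeholder Polyglot checkpoint
    torch_dtype=torch.float16,      # base weights loaded in fp16
)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                            # placeholder rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
)
model = get_peft_model(model, peft_config)

# Later, in main(), this is the call that raises the ValueError below:
# model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
#     model, optimizer, train_dataloader, eval_dataloader
# )
```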
[Error logs]
Traceback (most recent call last):
File "run_clm_no_hf_trainer.py", line 492, in
main()
File "run_clm_no_hf_trainer.py", line 418, in main
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 943, in prepare
result = self._prepare_deepspeed(*args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1173, in _preparedeepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 297, in init
self._configure_distributed_model(model)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1133, in _configure_distributed_model
raise ValueError(
ValueError: fp16 is enabled but the following parameters have dtype that is not fp16: base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.3.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.3.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.4.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.4.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.5.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.5.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.6.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.6.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.7.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.7.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.8.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.8.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.9.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.9.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.10.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.10.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.11.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.11.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.12.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.12.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.13.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.13.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.14.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.14.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.15.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.15.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.16.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.16.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.17.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.17.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.18.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.18.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.19.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.19.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.20.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.20.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.21.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.21.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.22.attention.query_key_value.lora_A.weight, 
base_model.model.gpt_neox.layers.22.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.23.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.23.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.24.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.24.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.25.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.25.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.26.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.26.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.27.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.27.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.28.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.28.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.29.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.29.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.30.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.30.attention.query_key_value.lora_B.weight, base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_A.weight, base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_B.weight
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196869 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196872 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196875 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 196873) of binary: /usr/bin/python3.8
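Reading the error, it seems the base model's weights are fp16 while the freshly initialized LoRA A/B matrices are still fp32, and DeepSpeed's fp16 check refuses the mix. One workaround I am considering (I am not sure it is the intended fix) is casting the LoRA parameters to fp16 before calling accelerator.prepare(); the "lora_" name filter below is an assumption based on the parameter names in the traceback:

```python
import torch

# Hedged workaround sketch: cast the fp32 LoRA weights to fp16 so that
# DeepSpeed's dtype check passes. Matching on "lora_" in the parameter
# name is an assumption taken from the names in the traceback above.
for name, param in model.named_parameters():
    if "lora_" in name and param.dtype == torch.float32:
        param.data = param.data.to(torch.float16)
```

I do not know whether training the adapters purely in fp16 is numerically stable, so I would treat this as an experiment rather than a proper fix.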