huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Error when applying the fp16 training option using accelerate #119

Closed · codingchild2424 closed this issue 1 year ago

codingchild2424 commented 1 year ago

Hi. PEFT is amazing, thank you for sharing this package with us. However, when I use the fp16 training option with Accelerate's DeepSpeed ZeRO-3 integration and PEFT LoRA, the error below occurs. How can I handle this problem?
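
For context, the setup is roughly the sketch below (the model name, LoRA hyperparameters, and launch command are placeholders, not my exact configuration):

```python
# Rough sketch of the training setup: a GPT-NeoX-style causal LM wrapped with a
# PEFT LoRA adapter on the attention query_key_value projections, prepared with
# Accelerate (fp16 + DeepSpeed ZeRO-3 come from the accelerate config file).
# Launched with something like: accelerate launch --config_file ds_zero3_fp16.yaml run_clm_no_hf_trainer.py
from accelerate import Accelerator
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

accelerator = Accelerator()

base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-2.8b")  # placeholder GPT-NeoX model
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# ... build optimizer and dataloaders ...
# model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
#     model, optimizer, train_dataloader, eval_dataloader
# )  # <- this is the call that raises the ValueError shown in the logs below
```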

[My Setting]

[Error logs]

```
Traceback (most recent call last):
  File "run_clm_no_hf_trainer.py", line 492, in <module>
    main()
  File "run_clm_no_hf_trainer.py", line 418, in main
    model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 943, in prepare
    result = self._prepare_deepspeed(*args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1173, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 297, in __init__
    self._configure_distributed_model(model)
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1133, in _configure_distributed_model
    raise ValueError(
ValueError: fp16 is enabled but the following parameters have dtype that is not fp16:
base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight,
base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight,
[... the same lora_A.weight / lora_B.weight pair is listed for every attention.query_key_value module from layer 1 through layer 30 ...]
base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_A.weight,
base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_B.weight
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196869 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196872 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 196875 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 196873) of binary: /usr/bin/python3.8
```
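
One workaround I am considering (a sketch only; the helper below is my own code, not a PEFT or DeepSpeed API, and I am not sure it is the recommended fix) is to cast the LoRA adapter weights to fp16 before calling accelerator.prepare(), since the error says fp16 is enabled but exactly those lora_A / lora_B parameters are not fp16:

```python
# Possible workaround sketch: DeepSpeed's fp16 check (engine.py:1133) requires every
# parameter to already be torch.float16, but the freshly initialized lora_A / lora_B
# weights are created in fp32. Casting them to half precision before
# accelerator.prepare() would make the dtype check pass.
import torch

def cast_lora_weights_to_fp16(model):
    """Cast all LoRA adapter parameters (lora_A / lora_B) to torch.float16."""
    for name, param in model.named_parameters():
        if "lora_" in name and param.dtype == torch.float32:
            param.data = param.data.to(torch.float16)
    return model

# model = cast_lora_weights_to_fp16(model)
# model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
#     model, optimizer, train_dataloader, eval_dataloader
# )
```

Is something like this reasonable, or is there a supported way to keep the LoRA weights in fp32 with ZeRO-3?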

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.