hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Cannot use adamw_8bit and paged_adamw_8bit with qlora+fsdp #5619

Open mces89 opened 2 days ago

mces89 commented 2 days ago

System Info

latest LLaMA-Factory version

Reproduction

I'm using the latest LLaMA-Factory version to run SFT (QLoRA + FSDP) on Llama 3.1 70B with 8x A100 GPUs. Training works with the default optimizer, but with either 8-bit AdamW variant (`optim: adamw_8bit` or `optim: paged_adamw_8bit`) I get the following error:

```
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2341, in _inner_training_loop
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/accelerate/optimizer.py", line 172, in step
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 130, in wrapper
[rank0]:     return func.__get__(opt, opt.__class__)(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/optim/optimizer.py", line 484, in wrapper
[rank0]:     out = func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 287, in step
[rank0]:     self.update_step(group, p, gindex, pindex)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 546, in update_step
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1774, in optimizer_update_8bit_blockwise
[rank0]:     prev_device = pre_call(g.device)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/bitsandbytes/functional.py", line 463, in pre_call
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/cuda/__init__.py", line 418, in set_device
[rank0]:     device = _get_device_index(device)
[rank0]:   File "/opt/conda/envs/ptca/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
[rank0]:     raise ValueError(f"Expected a cuda device, but got: {device}")
[rank0]: ValueError: Expected a cuda device, but got: cpu
```
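For context, my own reading of the traceback (not a confirmed diagnosis): the bitsandbytes 8-bit optimizer calls `torch.cuda.set_device(g.device)` on the gradient's device before launching its kernel, and that call rejects any non-CUDA device. So if FSDP has left a gradient on CPU (e.g. via offloading), the update fails exactly as shown. A minimal pure-Python sketch that mimics the failing check (names modeled on `torch.cuda._utils._get_device_index`; no GPU or torch install needed):

```python
# Sketch of the device validation that raises in the traceback above.
# FakeDevice stands in for torch.device; get_device_index mimics the
# behaviour of torch.cuda._utils._get_device_index for device objects.

class FakeDevice:
    def __init__(self, type_, index=0):
        self.type = type_    # "cuda" or "cpu"
        self.index = index

    def __str__(self):
        return self.type

def get_device_index(device):
    # torch.cuda.set_device funnels through a check like this:
    # anything that is not a CUDA device is rejected outright.
    if device.type != "cuda":
        raise ValueError(f"Expected a cuda device, but got: {device}")
    return device.index

# A CUDA-resident gradient passes the check ...
print(get_device_index(FakeDevice("cuda", 1)))  # → 1

# ... but a CPU-resident one (as after FSDP offload) raises.
try:
    get_device_index(FakeDevice("cpu"))
except ValueError as e:
    print(e)  # → Expected a cuda device, but got: cpu
```

If this reading is right, the fix would be keeping the optimizer states and gradients on GPU when using the 8-bit optimizers, i.e. disabling FSDP's CPU offload for this run.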

Expected behavior

No response

Others

No response

mces89 commented 14 hours ago

Could it be because I set `use_unsloth_gc: true`?