Hi @danielhanchen, could you please give some advice on this issue? DPO training fails with DeepSpeed ZeRO-3 offload. Environment and full traceback below.
Installed via:

```
pip install "unsloth[cu121-ampere-torch211] @ git+https://github.com/unslothai/unsloth.git"
```

- torch 2.1.1+cu121
- unsloth 2024.1
- Driver Version: 535.129.03, CUDA Version: 12.2
```
Traceback (most recent call last):
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/train/tuner.py", line 38, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/train/dpo/workflow.py", line 30, in run_dpo
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/workspace/llm_tuning/DPO/LLaMA-Factory/src/llmtuner/model/loader.py", line 83, in load_model
    model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/loader.py", line 79, in from_pretrained
    return dispatch_model.from_pretrained(
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/llama.py", line 689, in from_pretrained
    model = FastLlamaModel.post_patch(model)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/unsloth/models/llama.py", line 738, in post_patch
    model.model.embed_tokens = torch.nn.Embedding.from_pretrained(model.model.embed_tokens.weight)
  File "/miniconda3/envs/llm_factory_unsloth_tf437/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 210, in from_pretrained
    assert embeddings.dim() == 2, \
AssertionError: Embeddings parameter is expected to be 2-dimensional
```
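My current understanding of the failure: under ZeRO-3, DeepSpeed partitions every parameter across ranks and replaces each module's `.weight` data with a 0-size placeholder until the parameter is gathered, so when unsloth's `post_patch` calls `torch.nn.Embedding.from_pretrained(model.model.embed_tokens.weight)` the weight is not 2-D and the assert fires. Below is a minimal diagnostic sketch, assuming the model has already been partitioned by a ZeRO-3 engine; `safe_rebuild_embedding` is a hypothetical helper, not part of unsloth or DeepSpeed, and cloning the gathered weight detaches the new embedding from ZeRO management, so this only illustrates the mechanism rather than being a supported fix.

```python
# Sketch only: show why post_patch fails under ZeRO-3 and how gathering
# the partitioned weight restores the full 2-D tensor.
import deepspeed
import torch

def safe_rebuild_embedding(model):  # hypothetical helper, for illustration
    weight = model.model.embed_tokens.weight
    print("partitioned shape:", tuple(weight.shape))  # e.g. (0,) under ZeRO-3
    # GatheredParameters temporarily reassembles the full parameter on each rank;
    # modifier_rank=None means we only read it, making no modifications to broadcast.
    with deepspeed.zero.GatheredParameters([weight], modifier_rank=None):
        print("gathered shape:", tuple(weight.shape))  # (vocab_size, hidden_size)
        # Clone while gathered; the full data is released when the context exits.
        full = weight.detach().clone()
    model.model.embed_tokens = torch.nn.Embedding.from_pretrained(full)
    return model
```

If the real issue is simply that unsloth's `post_patch` does not account for ZeRO-3 partitioned parameters, a check like the gather above (or skipping the embedding rebuild when the weight is partitioned) might be the right place to handle it.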