[INFO|trainer.py:2078] 2024-06-26 18:20:43,472 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-06-26 18:20:43,472 >> Num examples = 99
[INFO|trainer.py:2080] 2024-06-26 18:20:43,472 >> Num Epochs = 16
[INFO|trainer.py:2081] 2024-06-26 18:20:43,472 >> Instantaneous batch size per device = 4
[INFO|trainer.py:2084] 2024-06-26 18:20:43,473 >> Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:2085] 2024-06-26 18:20:43,473 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2086] 2024-06-26 18:20:43,473 >> Total optimization steps = 400
[INFO|trainer.py:2087] 2024-06-26 18:20:43,476 >> Number of trainable parameters = 20,971,520
  0%|          | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/work/workspace/zyw/.conda/envs/rag/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/work/workspace/zyw/project/emr/llama3/LLaMA-Factory-for-update/LLaMA-Factory/src/llamafactory/cli.py", line 110, in main
run_exp()
File "/home/work/workspace/zyw/project/emr/llama3/LLaMA-Factory-for-update/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/work/workspace/zyw/project/emr/llama3/LLaMA-Factory-for-update/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 88, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/trainer.py", line 3238, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/trainer.py", line 3264, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/peft/peft_model.py", line 1430, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 925, in forward
inputs_embeds = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
return F.embedding(
^^^^^^^^^^^^
File "/home/work/workspace/zyw/.conda/envs/rag/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: 'weight' must be 2-D
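This failure mode is characteristic of ZeRO-3: stage 3 partitions every parameter across ranks and, outside a gather context, the module sees only a 0-element placeholder tensor, so `F.embedding` receives a weight that is not 2-D. A minimal sketch that reproduces the same error without DeepSpeed, assuming only that the weight tensor is empty/1-D like a partitioned-parameter placeholder:

```python
import torch
import torch.nn.functional as F

input_ids = torch.tensor([[1, 2, 3]])

# A 0-element, 1-D tensor stands in for what a ZeRO-3 partitioned
# parameter looks like outside a deepspeed.zero.GatheredParameters context.
empty_weight = torch.empty(0)

try:
    F.embedding(input_ids, empty_weight)
except RuntimeError as e:
    print(e)  # same "'weight' must be 2-D" check as in the traceback above
```

This suggests the base model's parameters were not registered with (or gathered by) the DeepSpeed engine before the forward pass, rather than a problem in the embedding layer itself.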
Reminder
System Info
llamafactory version: 0.8.3.dev0

Reproduction
Run parameters:
Configuration files:
llama3_lora_sft_ds3_emr.yaml
ds_z3_config.json
Stack trace: see the traceback above.
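For context, a stage-3 DeepSpeed config differs from ZeRO-0 mainly in enabling parameter partitioning under `zero_optimization`. A minimal sketch of what such a file typically contains (key names follow DeepSpeed's config schema; the actual ds_z3_config.json shipped with LLaMA-Factory may set additional fields):

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```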
Expected behavior
With the ZeRO-0 based configuration, LLaMA-3 fine-tunes successfully. After switching to the ZeRO-3 based configuration, training fails with the error above. No changes were made to the deepspeed-zero* config files themselves.
Others
No response