a1exyu opened this issue 4 months ago
Yes, same here, though in my case I tried it with internlm2-20b (base, non-chat). The same configuration applied to internlm2-7b appears to work (I did not let it run to completion, as I am not interested in that model). With telechat-12b I get the same error:
09/25/2024 07:48:54 - WARNING - llamafactory.model.model_utils.checkpointing - You are using the old GC format, some features (e.g. BAdam) will be invalid.
09/25/2024 07:48:54 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/25/2024 07:48:54 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
09/25/2024 07:48:54 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/25/2024 07:48:54 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
09/25/2024 07:48:54 - INFO - llamafactory.model.model_utils.misc - Found linear modules: key_value,down_proj,dense,query,gate_proj,up_proj
09/25/2024 07:48:56 - INFO - llamafactory.model.loader - trainable params: 18677760 || all params: 7218696192 || trainable%: 0.2587
[INFO|trainer.py:648] 2024-09-25 07:48:56,887 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-09-25 07:48:57,289 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-09-25 07:48:57,290 >> Num examples = 4,999
[INFO|trainer.py:2136] 2024-09-25 07:48:57,290 >> Num Epochs = 1
[INFO|trainer.py:2137] 2024-09-25 07:48:57,290 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-09-25 07:48:57,290 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-09-25 07:48:57,290 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-09-25 07:48:57,290 >> Total optimization steps = 624
[INFO|trainer.py:2143] 2024-09-25 07:48:57,293 >> Number of trainable parameters = 18,677,760
  0%|          | 0/624 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/llamafactory-cli", line 8, in <module>
Reminder
Reproduction
CUDA_VISIBLE_DEVICES=1 llamafactory-cli example/...... The YAML file is below:
### model
model_name_or_path: /home/ybh/ybh/models/internlm2-chat-20b
quantization_bit: 4

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: wqkv

### dataset
dataset: text_classification_coarse
template: intern2
cutoff_len: 6144
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /home/ybh/ybh/nlpcc/LLaMA-Factory/saves/internlm2-chat-20b/qlora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.0001
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_steps: 0.1
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 10
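To help isolate the failure, here is a minimal sketch of the same setup outside LLaMA-Factory (4-bit load, LoRA on wqkv, gradient checkpointing). This is not the framework's own loading code: it assumes recent transformers/peft/bitsandbytes, the model path is copied from the config above, and the LoRA rank/alpha and toy input are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_path = "/home/ybh/ybh/models/internlm2-chat-20b"  # from the config above

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # quantization_bit: 4
    torch_dtype=torch.float16,                                  # fp16: true
    trust_remote_code=True,                                     # internlm2 ships custom code
)

# prepare_model_for_kbit_training enables gradient checkpointing and, crucially,
# makes the frozen embedding outputs require grad, so checkpointed activations
# stay connected to the autograd graph.
model = prepare_model_for_kbit_training(model)

# internlm2 fuses q/k/v into a single "wqkv" projection, matching lora_target above.
model = get_peft_model(
    model, LoraConfig(task_type="CAUSAL_LM", target_modules=["wqkv"], r=8, lora_alpha=16)
)
model.print_trainable_parameters()

tok = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
batch = tok("hello", return_tensors="pt").to(model.device)
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()  # if this succeeds, the failure is specific to the training pipeline
```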
Expected behavior
No response
System Info
[INFO|trainer.py:2048] 2024-05-18 00:07:10,006 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-18 00:07:10,006 >> Num examples = 122
[INFO|trainer.py:2050] 2024-05-18 00:07:10,006 >> Num Epochs = 5
[INFO|trainer.py:2051] 2024-05-18 00:07:10,006 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2054] 2024-05-18 00:07:10,006 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2055] 2024-05-18 00:07:10,006 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2056] 2024-05-18 00:07:10,006 >> Total optimization steps = 75
[INFO|trainer.py:2057] 2024-05-18 00:07:10,007 >> Number of trainable parameters = 2,621,440
  0%|          | 0/75 [00:00<?, ?it/s]
/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Traceback (most recent call last):
  File "/home/ybh/miniconda3/envs/nlpcc/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/cli.py", line 65, in main
    run_exp()
  File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 33, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/data/ybh/nlpcc/LLaMA-Factory-main/src/llamafactory/train/sft/workflow.py", line 73, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
    self.accelerator.backward(loss)
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/accelerate/accelerator.py", line 2121, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/ybh/miniconda3/envs/nlpcc/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
  0%|          | 0/75 [00:00<?, ?it/s]
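The second UserWarning above ("None of the inputs have requires_grad=True") is usually the real clue: with reentrant gradient checkpointing on a fully frozen, quantized backbone, the checkpointed blocks receive inputs detached from autograd, so the loss ends up with no grad_fn. A hedged sketch of the two usual workarounds; both are standard transformers calls, though I cannot confirm which one this LLaMA-Factory version applies internally.

```python
# Option 1: make the (frozen) embedding output require grad so the
# checkpointed graph stays connected; transformers ships a hook for this.
model.enable_input_require_grads()

# Option 2: switch to non-reentrant checkpointing, as the UserWarning
# itself recommends; it tolerates inputs that do not require grad.
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```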
Others
I used the same LoRA setup to fine-tune internlm-chat-7b, and this error did not happen.
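One quick check before attributing the failure to model size: verify that LoRA actually attached trainable tensors after the larger model is loaded. A minimal sketch, where model is whatever object the loader returned (nothing LLaMA-Factory-specific):

```python
# If lora_target matches no module in the larger architecture, nothing
# requires grad and backward() fails exactly as in the traceback above.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable tensors")
print(trainable[:5])  # expect lora_A / lora_B entries on the targeted modules
assert trainable, "LoRA attached no trainable parameters"
```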