[X] I have read the README and searched the existing issues.
Reproduction
04/18/2024 02:03:19 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer.json
[WARNING|logging.py:329] 2024-04-18 02:03:19,780 >> You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Replace eos token:
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Add pad token:
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Cannot add this chat template to tokenizer.
04/18/2024 02:03:20 - INFO - llmtuner.data.loader - Loading dataset dataSet_1713405777659.json...
04/18/2024 02:03:20 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
[INFO|modeling_utils.py:4170] 2024-04-18 02:03:23,805 >> All model checkpoint weights were used when initializing YuanForCausalLM.
[INFO|modeling_utils.py:4178] 2024-04-18 02:03:23,805 >> All the weights of YuanForCausalLM were initialized from the model checkpoint at ./source/input/models/yuan_1780778941796061184.
If your task is similar to the task the model of the checkpoint was trained on, you can already use YuanForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-04-18 02:03:23,809 >> loading configuration file ./source/input/models/yuan_1780778941796061184/generation_config.json
[INFO|configuration_utils.py:928] 2024-04-18 02:03:23,810 >> Generate config GenerationConfig {
"bos_token_id": 77185,
"eos_token_id": 77185,
"pad_token_id": 77185
}
[WARNING|modeling_utils.py:2228] 2024-04-18 02:03:23,842 >> You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
input_ids:
[29871, 36907, 40346, 51409, 30882, 77187, 29871, 36857, 32587, 32162, 32927, 32047, 30330, 32137, 32503, 30330, 32019, 36288, 30330, 35669, 34359, 33096, 30267, 77185]
inputs:
你能为我做些什么? 我可以帮您回答问题、提供建议、进行聊天、翻译文字等等。
label_ids:
[-100, -100, -100, -100, -100, -100, 29871, 36857, 32587, 32162, 32927, 32047, 30330, 32137, 32503, 30330, 32019, 36288, 30330, 35669, 34359, 33096, 30267, 77185]
labels:
我可以帮您回答问题、提供建议、进行聊天、翻译文字等等。
04/18/2024 02:03:23 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
04/18/2024 02:03:23 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/18/2024 02:03:23 - INFO - llmtuner.model.loader - trainable params: 1572864 || all params: 2090297344 || trainable%: 0.0752
[INFO|trainer.py:625] 2024-04-18 02:03:23,968 >> Using auto half precision backend
[INFO|trainer.py:2047] 2024-04-18 02:03:24,137 >> Running training
[INFO|trainer.py:2048] 2024-04-18 02:03:24,137 >> Num examples = 8
[INFO|trainer.py:2049] 2024-04-18 02:03:24,137 >> Num Epochs = 10
[INFO|trainer.py:2050] 2024-04-18 02:03:24,137 >> Instantaneous batch size per device = 4
[INFO|trainer.py:2053] 2024-04-18 02:03:24,137 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2054] 2024-04-18 02:03:24,137 >> Gradient Accumulation steps = 4
[INFO|trainer.py:2055] 2024-04-18 02:03:24,137 >> Total optimization steps = 10
[INFO|trainer.py:2056] 2024-04-18 02:03:24,140 >> Number of trainable parameters = 1,572,864
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/workspace/src/train_bash.py", line 14, in <module>
main()
File "/workspace/src/train_bash.py", line 5, in main
run_exp()
File "/workspace/src/llmtuner/train/tuner.py", line 33, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/workspace/src/llmtuner/train/sft/workflow.py", line 71, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1858, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2202, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3137, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3160, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 825, in forward
return model_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 813, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1129, in forward
return self.base_model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 984, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 806, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 482, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 261, in forward
outputs = run_function(*args)
File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 802, in custom_forward
return module(*inputs, output_attentions, None)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 476, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 394, in forward
output = flash_attn_unpadded_func(
NameError: name 'flash_attn_unpadded_func' is not defined
Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 9 examples [00:00, 1495.24 examples/s]
Converting format of dataset: 0%| | 0/9 [00:00<?, ? examples/s]
Converting format of dataset: 100%|██████████| 9/9 [00:00<00:00, 1001.72 examples/s]
Running tokenizer on dataset: 0%| | 0/9 [00:00<?, ? examples/s]
Running tokenizer on dataset: 100%|██████████| 9/9 [00:00<00:00, 968.59 examples/s]
[INFO|configuration_utils.py:724] 2024-04-18 02:03:21,472 >> loading configuration file ./source/input/models/yuan_1780778941796061184/config.json
[INFO|configuration_utils.py:724] 2024-04-18 02:03:21,476 >> loading configuration file ./source/input/models/yuan_1780778941796061184/config.json
[INFO|configuration_utils.py:789] 2024-04-18 02:03:21,478 >> Model config YuanConfig {
"_from_model_config": true,
"_name_or_path": "./source/input/models/yuan_1780778941796061184",
"architectures": [
"YuanForCausalLM"
],
"auto_map": {
"AutoConfig": "configuration_yuan.YuanConfig",
"AutoModelForCausalLM": "yuan_hf_model.YuanForCausalLM"
},
"bos_token_id": 77185,
"causal_mask": true,
"dropout": 0.1,
"eod_token": 77185,
"eod_token_id": 77185,
"eos_token_id": 77185,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 8192,
"mask_token_id": 77185,
"max_position_embeddings": 8192,
"model_max_length": 8192,
"model_type": "yuan",
"num_attention_heads": 32,
"num_hidden_layers": 24,
"pad_token_id": 77185,
"reset_attention_mask": true,
"reset_position_ids": true,
"rms_norm_eps": 1e-06,
"sep_token": 77187,
"sep_token_id": 77185,
"tokenizer_class": "YuanTokenizer",
"torch_dtype": "bfloat16",
"transformers_version": "4.40.0.dev0",
"use_cache": true,
"use_flash_attention": true,
"use_loss_mask": false,
"vocab_size": 135040
}
[INFO|modeling_utils.py:3426] 2024-04-18 02:03:21,528 >> loading weights file ./source/input/models/yuan_1780778941796061184/pytorch_model.bin
[INFO|modeling_utils.py:1494] 2024-04-18 02:03:21,583 >> Instantiating YuanForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-04-18 02:03:21,585 >> Generate config GenerationConfig {
"bos_token_id": 77185,
"eos_token_id": 77185,
"pad_token_id": 77185
}
Expected behavior
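I have tried many versions of transformers, including the one shipped in the image and the ones from Hugging Face, and every version fails with this error. How can I fix it?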
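A possible direction, sketched under assumptions rather than confirmed: the traceback ends in `yuan_hf_model.py` (the model's own remote code), not in transformers, and the logged model config has `"use_flash_attention": true`. A `NameError` on `flash_attn_unpadded_func` would be consistent with the `flash_attn` package being missing, or with flash-attn 2.x being installed, where that function was renamed to `flash_attn_varlen_func`. The two helpers below are hypothetical names introduced for illustration: one reports which of the two function names the installed package exposes, the other flips `use_flash_attention` in a `config.json`. Whether the Yuan modeling code falls back to standard attention when the flag is false is an assumption that the log does not confirm.

```python
import importlib
import json
from pathlib import Path


def available_flash_attn_names(module_name="flash_attn.flash_attn_interface"):
    """Return which known flash-attn entry points the given module exposes.

    An empty list means the module could not be imported at all
    (package missing, or a CUDA/ABI mismatch at import time).
    """
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return []
    # flash-attn 1.x name first, then the flash-attn 2.x name.
    candidates = ("flash_attn_unpadded_func", "flash_attn_varlen_func")
    return [name for name in candidates if hasattr(module, name)]


def set_flash_attention(config_path, enabled=False):
    """Rewrite `use_flash_attention` in a model config.json.

    Returns the previous value so the change can be reverted.
    Assumption: the model code honors this flag; not verified here.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("use_flash_attention")
    config["use_flash_attention"] = enabled
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous


# Example usage (path taken from the log above; adjust as needed):
# print(available_flash_attn_names())
# set_flash_attention("./source/input/models/yuan_1780778941796061184/config.json", False)
```

If `available_flash_attn_names()` returns only `flash_attn_varlen_func`, the installed flash-attn is a 2.x release and the model code is asking for the older name; if it returns an empty list, the package is not importable in this environment. Changing the transformers version alone would not affect either case.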
System Info
No response
Others
No response