hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

yuan2 error #3327

Closed bgtii closed 6 months ago

bgtii commented 6 months ago

Reminder

Reproduction

04/18/2024 02:03:19 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2085] 2024-04-18 02:03:19,779 >> loading file tokenizer.json
[WARNING|logging.py:329] 2024-04-18 02:03:19,780 >> You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Replace eos token: <eod>
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Add pad token: <eod>
04/18/2024 02:03:20 - INFO - llmtuner.data.template - Cannot add this chat template to tokenizer.
04/18/2024 02:03:20 - INFO - llmtuner.data.loader - Loading dataset dataSet_1713405777659.json...
04/18/2024 02:03:20 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 9 examples [00:00, 1495.24 examples/s]

Converting format of dataset:   0%|          | 0/9 [00:00<?, ? examples/s]
Converting format of dataset: 100%|██████████| 9/9 [00:00<00:00, 1001.72 examples/s]

Running tokenizer on dataset:   0%|          | 0/9 [00:00<?, ? examples/s]
Running tokenizer on dataset: 100%|██████████| 9/9 [00:00<00:00, 968.59 examples/s]
[INFO|configuration_utils.py:724] 2024-04-18 02:03:21,472 >> loading configuration file ./source/input/models/yuan_1780778941796061184/config.json
[INFO|configuration_utils.py:724] 2024-04-18 02:03:21,476 >> loading configuration file ./source/input/models/yuan_1780778941796061184/config.json
[INFO|configuration_utils.py:789] 2024-04-18 02:03:21,478 >> Model config YuanConfig {
  "_from_model_config": true,
  "_name_or_path": "./source/input/models/yuan_1780778941796061184",
  "architectures": [
    "YuanForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_yuan.YuanConfig",
    "AutoModelForCausalLM": "yuan_hf_model.YuanForCausalLM"
  },
  "bos_token_id": 77185,
  "causal_mask": true,
  "dropout": 0.1,
  "eod_token": 77185,
  "eod_token_id": 77185,
  "eos_token_id": 77185,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "mask_token_id": 77185,
  "max_position_embeddings": 8192,
  "model_max_length": 8192,
  "model_type": "yuan",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "pad_token_id": 77185,
  "reset_attention_mask": true,
  "reset_position_ids": true,
  "rms_norm_eps": 1e-06,
  "sep_token": 77187,
  "sep_token_id": 77185,
  "tokenizer_class": "YuanTokenizer",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0.dev0",
  "use_cache": true,
  "use_flash_attention": true,
  "use_loss_mask": false,
  "vocab_size": 135040
}

[INFO|modeling_utils.py:3426] 2024-04-18 02:03:21,528 >> loading weights file ./source/input/models/yuan_1780778941796061184/pytorch_model.bin
[INFO|modeling_utils.py:1494] 2024-04-18 02:03:21,583 >> Instantiating YuanForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-04-18 02:03:21,585 >> Generate config GenerationConfig {
  "bos_token_id": 77185,
  "eos_token_id": 77185,
  "pad_token_id": 77185
}

[INFO|modeling_utils.py:4170] 2024-04-18 02:03:23,805 >> All model checkpoint weights were used when initializing YuanForCausalLM.

[INFO|modeling_utils.py:4178] 2024-04-18 02:03:23,805 >> All the weights of YuanForCausalLM were initialized from the model checkpoint at ./source/input/models/yuan_1780778941796061184. If your task is similar to the task the model of the checkpoint was trained on, you can already use YuanForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-04-18 02:03:23,809 >> loading configuration file ./source/input/models/yuan_1780778941796061184/generation_config.json
[INFO|configuration_utils.py:928] 2024-04-18 02:03:23,810 >> Generate config GenerationConfig {
  "bos_token_id": 77185,
  "eos_token_id": 77185,
  "pad_token_id": 77185
}

[WARNING|modeling_utils.py:2228] 2024-04-18 02:03:23,842 >> You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
input_ids: [29871, 36907, 40346, 51409, 30882, 77187, 29871, 36857, 32587, 32162, 32927, 32047, 30330, 32137, 32503, 30330, 32019, 36288, 30330, 35669, 34359, 33096, 30267, 77185]
inputs: 你能为我做些什么? 我可以帮您回答问题、提供建议、进行聊天、翻译文字等等。
label_ids: [-100, -100, -100, -100, -100, -100, 29871, 36857, 32587, 32162, 32927, 32047, 30330, 32137, 32503, 30330, 32019, 36288, 30330, 35669, 34359, 33096, 30267, 77185]
labels: 我可以帮您回答问题、提供建议、进行聊天、翻译文字等等。
04/18/2024 02:03:23 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
04/18/2024 02:03:23 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/18/2024 02:03:23 - INFO - llmtuner.model.loader - trainable params: 1572864 || all params: 2090297344 || trainable%: 0.0752
[INFO|trainer.py:625] 2024-04-18 02:03:23,968 >> Using auto half precision backend
[INFO|trainer.py:2047] 2024-04-18 02:03:24,137 >> ***** Running training *****
[INFO|trainer.py:2048] 2024-04-18 02:03:24,137 >>   Num examples = 8
[INFO|trainer.py:2049] 2024-04-18 02:03:24,137 >>   Num Epochs = 10
[INFO|trainer.py:2050] 2024-04-18 02:03:24,137 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2053] 2024-04-18 02:03:24,137 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2054] 2024-04-18 02:03:24,137 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:2055] 2024-04-18 02:03:24,137 >>   Total optimization steps = 10
[INFO|trainer.py:2056] 2024-04-18 02:03:24,140 >>   Number of trainable parameters = 1,572,864
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/src/train_bash.py", line 14, in <module>
    main()
  File "/workspace/src/train_bash.py", line 5, in main
    run_exp()
  File "/workspace/src/llmtuner/train/tuner.py", line 33, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/workspace/src/llmtuner/train/sft/workflow.py", line 71, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1858, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2202, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3137, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3160, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 825, in forward
    return model_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 813, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 984, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 806, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 802, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 476, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_1780778941796061184/yuan_hf_model.py", line 394, in forward
    output = flash_attn_unpadded_func(
NameError: name 'flash_attn_unpadded_func' is not defined
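The failing frame is in the model's own remote code (yuan_hf_model.py), not in LLaMA-Factory: with "use_flash_attention": true in the config, the Yuan attention layer calls flash_attn_unpadded_func, a name that is only bound if the flash-attn package imports successfully (flash_attn_unpadded_func is the FlashAttention 1.x name; 2.x renamed it flash_attn_varlen_func). Below is an illustrative sketch of that guarded-import pattern, not the verbatim Yuan source; the function attention_forward and its arguments are invented for illustration:

import torch

try:
    # FlashAttention 1.x API; the import fails if flash-attn is missing
    # or is a 2.x release that renamed the function.
    from flash_attn.flash_attn_interface import flash_attn_unpadded_func
except ImportError:
    pass  # name left undefined; any later call raises NameError

def attention_forward(q, k, v, use_flash_attention: bool) -> torch.Tensor:
    if use_flash_attention:
        # With "use_flash_attention": true in config.json and the import
        # above having failed, this line raises:
        # NameError: name 'flash_attn_unpadded_func' is not defined
        # (the real call also passes cu_seqlens/max_seqlen arguments)
        return flash_attn_unpadded_func(q, k, v)
    # placeholder fallback path, standing in for the non-flash implementation
    return torch.softmax(q @ k.transpose(-2, -1), dim=-1) @ v

So the error means the flash-attention code path was requested but the matching flash-attn package was never importable in the environment, regardless of the transformers version used.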

Expected behavior

I have tried many versions of transformers and they all fail, including the one in the Docker image and the ones on Hugging Face; no version works. How can I solve this?

System Info

No response

Others

No response

hiyouga commented 6 months ago

Set this line to false: https://huggingface.co/IEITYuan/Yuan2-2B-hf/blob/main/config.json#L26
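For a local copy of the model such as the one in the logs above, the same change can be made by editing config.json directly. A minimal sketch, assuming the local path from the logs (the key name use_flash_attention comes from the YuanConfig printed in the reproduction log):

import json

# Path taken from the logs above; adjust to your local model directory.
config_path = "./source/input/models/yuan_1780778941796061184/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Disable the FlashAttention code path so yuan_hf_model.py uses the
# standard attention implementation instead of calling
# flash_attn_unpadded_func.
config["use_flash_attention"] = False

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)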

bgtii commented 6 months ago

Set this line to false: https://huggingface.co/IEITYuan/Yuan2-2B-hf/blob/main/config.json#L26

Thanks, this method solved it.