hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'get_input_embeddings' #1831

Status: Closed. TonyzBi closed this issue 10 months ago.

TonyzBi commented 10 months ago

Reminder

Reproduction

python src/train_bash.py \
    --stage rm \
    --model_name_or_path /home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/ \
    --do_train True \
    --finetuning_type lora \
    --quantization_bit 8 \
    --template chatglm3 \
    --flash_attn False \
    --shift_attn False \
    --dataset_dir data \
    --dataset comparison_gpt4_en \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 20.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 3 \
    --gradient_accumulation_steps 3 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --neftune_noise_alpha 0 \
    --train_on_prompt False \
    --upcast_layernorm True \
    --lora_rank 8 \
    --lora_dropout 0.1 \
    --lora_target query_key_value \
    --resume_lora_training True \
    --output_dir /home/bihai/datas/LLM/train_2023-12-13-16-37-53 \
    --fp16 True \
    --plot_loss True

Expected behavior

No response

System Info

AttributeError: 'AutoModelForCausalLMWithValueHead' object has no attribute 'get_input_embeddings'
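For context, AutoModelForCausalLMWithValueHead comes from the trl library and wraps the underlying causal LM in a pretrained_model attribute; if the wrapper does not forward get_input_embeddings(), any code that calls it on the wrapper fails with exactly this AttributeError. A minimal sketch of that relationship, assuming trl and transformers are installed and using a placeholder checkpoint path:

# Sketch only: shows how trl's value-head wrapper relates to the base model.
# The checkpoint path is a placeholder, not the reporter's exact setup.
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "/path/to/chatglm3-6b", trust_remote_code=True
)

# The wrapped base model is kept here by trl:
base = model.pretrained_model

# Calling get_input_embeddings() on the wrapper can raise AttributeError when the
# wrapper does not forward it; the wrapped base model usually still exposes it.
embeddings = base.get_input_embeddings()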

Others

The log shows that the weights were loaded successfully, as follows:

12/13/2023 17:15:04 - INFO - llmtuner.data.loader - Loading dataset comparison_gpt4_data_en.json...
Using custom data configuration default-9a44b34ac295f56e
Loading Dataset Infos from /home/bihai/anaconda3/envs/llm-fine-tune/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /home/bihai/.cache/huggingface/datasets/json/default-9a44b34ac295f56e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/home/bihai/.cache/huggingface/datasets/json/default-9a44b34ac295f56e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /home/bihai/.cache/huggingface/datasets/json/default-9a44b34ac295f56e/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:2024] 2023-12-13 17:15:05,756 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2024] 2023-12-13 17:15:05,756 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2023-12-13 17:15:05,756 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2023-12-13 17:15:05,756 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2023-12-13 17:15:05,756 >> loading file tokenizer.json
[INFO|configuration_utils.py:737] 2023-12-13 17:15:05,861 >> loading configuration file /home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/config.json
[INFO|configuration_utils.py:737] 2023-12-13 17:15:05,862 >> loading configuration file /home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/config.json
[INFO|configuration_utils.py:802] 2023-12-13 17:15:05,862 >> Model config ChatGLMConfig {
  "_name_or_path": "/home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/",
  "add_bias_linear": false,
  "add_qkv_bias": true,
  "apply_query_key_layer_scaling": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "ChatGLMModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
  },
  "bias_dropout_fusion": true,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "ffn_hidden_size": 13696,
  "fp32_residual_connection": false,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "kv_channels": 128,
  "layernorm_epsilon": 1e-05,
  "model_type": "chatglm",
  "multi_query_attention": true,
  "multi_query_group_num": 2,
  "num_attention_heads": 32,
  "num_layers": 28,
  "original_rope": true,
  "pad_token_id": 0,
  "padded_vocab_size": 65024,
  "post_layer_norm": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "rmsnorm": true,
  "seq_length": 8192,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 65024
}

12/13/2023 17:15:05 - INFO - llmtuner.model.loader - Quantizing model to 8 bit.
[INFO|modeling_utils.py:3329] 2023-12-13 17:15:05,893 >> loading weights file /home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1341] 2023-12-13 17:15:05,893 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2023-12-13 17:15:05,894 >> Generate config GenerationConfig {
  "eos_token_id": 2,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:3469] 2023-12-13 17:15:06,342 >> Detected 8-bit loading: activating 8-bit loading for this model
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.72s/it]
[INFO|modeling_utils.py:4173] 2023-12-13 17:15:20,012 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:4181] 2023-12-13 17:15:20,012 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /home/bihai/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:3739] 2023-12-13 17:15:20,016 >> Generation config file not found, using a generation config created from the model config.
12/13/2023 17:15:20 - WARNING - llmtuner.model.utils - Current model does not support resizing token embeddings.
12/13/2023 17:15:20 - INFO - llmtuner.model.utils - Upcasting weights in layernorm in float32.
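The warning that the model does not support resizing token embeddings shows the loader already guards some embedding-related calls, while the later AttributeError suggests another call path does not. A hedged sketch of the kind of defensive lookup involved; the function is illustrative and not LLaMA-Factory's actual code:

def safe_input_embeddings(model):
    # Illustrative guard, not LLaMA-Factory's implementation: only call
    # get_input_embeddings() when the (possibly wrapped) model exposes it.
    if hasattr(model, "get_input_embeddings"):
        return model.get_input_embeddings()
    # trl's value-head classes keep the base model in `pretrained_model`.
    wrapped = getattr(model, "pretrained_model", None)
    if wrapped is not None and hasattr(wrapped, "get_input_embeddings"):
        return wrapped.get_input_embeddings()
    raise AttributeError("model does not expose get_input_embeddings()")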

hiyouga commented 10 months ago

Please use the issue template to report issues

lianzhaoy commented 8 months ago

So, how was this problem resolved? Is there a specific solution?
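One possible direction, offered as a sketch rather than a confirmed fix: delegate the missing accessors from the value-head wrapper to the wrapped base model before training, or upgrade trl and LLaMA-Factory, since newer releases may already handle this. The patch below assumes trl keeps the base model in pretrained_model:

from trl import AutoModelForCausalLMWithValueHead

def patch_value_head_model(model: AutoModelForCausalLMWithValueHead) -> None:
    # Possible workaround sketch, not a confirmed fix: expose the embedding
    # accessors on the wrapper by delegating to the wrapped base model.
    if not hasattr(model, "get_input_embeddings"):
        model.get_input_embeddings = model.pretrained_model.get_input_embeddings
    if not hasattr(model, "get_output_embeddings"):
        model.get_output_embeddings = model.pretrained_model.get_output_embeddings

Whether this is enough for the 8-bit ChatGLM3 reward-model run above is untested here; it only addresses the missing attribute itself.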