ssbuild / chatglm_finetuning

chatglm 6b finetuning and alpaca finetuning

AssertionError when running after setting 'target_modules' for LoRA fine-tuning #213

Closed ngbruce closed 1 year ago

ngbruce commented 1 year ago

My setup is as follows: in config.json I set "num_layers": 8 and left everything else unchanged, because I am on a single P40 and loading with 28 layers runs out of GPU memory. In data_utils.py I set
'target_modules': ['query_key_value', "dense", "dense_h_to_4h", "dense_4h_to_h"]. Originally it was only 'query_key_value'; I added the other three after seeing them in other projects. At runtime, after the base model has been loaded, it fails with "AssertionError". What could be the cause? Thanks.
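
For reference, a minimal sketch of the two changes described above. The surrounding structure is an assumption (the name of the LoRA settings dict in data_utils.py and the layout of config.json may differ in the repo); only "num_layers": 8 and the expanded 'target_modules' list come from the report:

```python
# config.json (excerpt) -- assumed layout; only num_layers was changed,
# to fit ChatGLM-6B on a single P40:
#   "num_layers": 8,

# data_utils.py (excerpt) -- the dict name lora_info_args is an assumption
lora_info_args = {
    'with_lora': True,  # enable LoRA fine-tuning (assumed key)
    # Originally only 'query_key_value'; the three extra modules were added
    # following other projects, and this is the change that precedes the error.
    'target_modules': ['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h'],
}
```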

ssbuild commented 1 year ago

Please upload the error message.

ngbruce commented 1 year ago

The output after running train.py is as follows:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary C:\Users\Bruce\anaconda3\envs\new3109finetune\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
INFO:lightning_fabric.utilities.seed:Global seed set to 42
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
ChatGLMConfig {
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "initializer_weight": false,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 8,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "precision": 16,
  "prefix_projection": false,
  "quantization_bit": 0,
  "return_dict": false,
  "task_specific_params": {
    "learning_rate": 2e-05,
    "learning_rate_for_task": 2e-05
  },
  "torch_dtype": "float16",
  "transformers_version": "4.27.1",
  "use_cache": true,
  "vocab_size": 130528
}

TrainingArguments(optimizer='lion', scheduler_type='CAWR', scheduler={'T_mult': 1, 'rewarm_epoch_num': 0.5, 'verbose': False}, adv=None, hierarchical_position=None, learning_rate=2e-05, learning_rate_for_task=2e-05, max_epochs=20, max_steps=-1, optimizer_betas=(0.9, 0.999), adam_epsilon=1e-08, gradient_accumulation_steps=1, max_grad_norm=1.0, weight_decay=0, warmup_steps=0, train_batch_size=4, eval_batch_size=2, test_batch_size=2, seed=42) ModelArguments(model_name_or_path='..\ChatGLM-6B\THUDM\chatglm-6b', model_type='chatglm', config_overrides=None, config_name='./config/config_my.json', tokenizer_name='..\ChatGLM-6B\THUDM\chatglm-6b', cache_dir=None, do_lower_case=False, use_fast_tokenizer=False, model_revision='main', use_auth_token=False) INFO:root:make_dataset ./data/finetune_train_examples.json train... INFO:root:make data ./output\dataset_file_0_dupe_factor_0-train.record... Loading checkpoint shards: 100%|██████████| 8/8 [00:37<00:00, 4.68s/it] Some weights of the model checkpoint at ..\ChatGLM-6B\THUDM\chatglm-6b were not used when initializing MyChatGLMForConditionalGeneration: ['transformer.layers.15.mlp.dense_h_to_4h.bias', 'transformer.layers.15.attention.query_key_value.bias', 'transformer.layers.24.attention.dense.weight', 'transformer.layers.12.input_layernorm.bias', 'transformer.layers.23.attention.rotary_emb.inv_freq', 'transformer.layers.17.attention.dense.weight', 'transformer.layers.17.mlp.dense_4h_to_h.weight', 'transformer.layers.15.attention.dense.bias', 'transformer.layers.24.input_layernorm.weight', 'transformer.layers.12.attention.dense.weight', 'transformer.layers.17.post_attention_layernorm.weight', 'transformer.layers.26.attention.dense.weight', 'transformer.layers.8.attention.dense.bias', 'transformer.layers.23.mlp.dense_4h_to_h.bias', 'transformer.layers.13.mlp.dense_4h_to_h.bias', 'transformer.layers.25.attention.query_key_value.bias', 'transformer.layers.9.attention.query_key_value.weight', 'transformer.layers.11.attention.query_key_value.weight', 'transformer.layers.26.post_attention_layernorm.bias', 'transformer.layers.22.attention.query_key_value.bias', 'transformer.layers.16.attention.query_key_value.weight', 'transformer.layers.25.mlp.dense_h_to_4h.bias', 'transformer.layers.25.mlp.dense_h_to_4h.weight', 'transformer.layers.12.mlp.dense_4h_to_h.weight', 'transformer.layers.9.attention.dense.weight', 'transformer.layers.11.post_attention_layernorm.weight', 'transformer.layers.19.mlp.dense_h_to_4h.bias', 'transformer.layers.20.attention.dense.bias', 'transformer.layers.12.mlp.dense_h_to_4h.weight', 'transformer.layers.21.mlp.dense_4h_to_h.weight', 'transformer.layers.12.attention.query_key_value.weight', 'transformer.layers.21.input_layernorm.weight', 'transformer.layers.22.attention.rotary_emb.inv_freq', 'transformer.layers.8.attention.dense.weight', 'transformer.layers.15.input_layernorm.bias', 'transformer.layers.19.attention.query_key_value.weight', 'transformer.layers.14.mlp.dense_4h_to_h.weight', 'transformer.layers.21.mlp.dense_h_to_4h.weight', 'transformer.layers.20.mlp.dense_4h_to_h.weight', 'transformer.layers.17.post_attention_layernorm.bias', 'transformer.layers.19.mlp.dense_4h_to_h.weight', 'transformer.layers.23.post_attention_layernorm.weight', 'transformer.layers.20.attention.query_key_value.weight', 'transformer.layers.21.attention.rotary_emb.inv_freq', 'transformer.layers.12.attention.query_key_value.bias', 'transformer.layers.19.attention.dense.weight', 'transformer.layers.24.attention.query_key_value.bias', 
'transformer.layers.16.post_attention_layernorm.bias', 'transformer.layers.18.attention.query_key_value.weight', 'transformer.layers.24.attention.rotary_emb.inv_freq', 'transformer.layers.12.mlp.dense_h_to_4h.bias', 'transformer.layers.16.attention.query_key_value.bias', 'transformer.layers.20.mlp.dense_4h_to_h.bias', 'transformer.layers.21.attention.query_key_value.bias', 'transformer.layers.23.input_layernorm.weight', 'transformer.layers.19.post_attention_layernorm.weight', 'transformer.layers.14.mlp.dense_h_to_4h.bias', 'transformer.layers.16.mlp.dense_4h_to_h.weight', 'transformer.layers.10.attention.query_key_value.bias', 'transformer.layers.8.mlp.dense_4h_to_h.bias', 'transformer.layers.21.attention.query_key_value.weight', 'transformer.layers.18.mlp.dense_4h_to_h.bias', 'transformer.layers.11.mlp.dense_h_to_4h.weight', 'transformer.layers.13.mlp.dense_4h_to_h.weight', 'transformer.layers.10.mlp.dense_4h_to_h.bias', 'transformer.layers.27.input_layernorm.bias', 'transformer.layers.9.mlp.dense_h_to_4h.bias', 'transformer.layers.14.mlp.dense_h_to_4h.weight', 'transformer.layers.16.mlp.dense_h_to_4h.weight', 'transformer.layers.18.mlp.dense_h_to_4h.bias', 'transformer.layers.16.attention.dense.bias', 'transformer.layers.14.attention.dense.weight', 'transformer.layers.13.input_layernorm.weight', 'transformer.layers.18.attention.dense.bias', 'transformer.layers.13.mlp.dense_h_to_4h.weight', 'transformer.layers.24.mlp.dense_4h_to_h.weight', 'transformer.layers.27.attention.query_key_value.weight', 'transformer.layers.23.attention.dense.bias', 'transformer.layers.8.post_attention_layernorm.bias', 'transformer.layers.24.mlp.dense_h_to_4h.weight', 'transformer.layers.14.input_layernorm.bias', 'transformer.layers.26.attention.rotary_emb.inv_freq', 'transformer.layers.27.attention.rotary_emb.inv_freq', 'transformer.layers.20.post_attention_layernorm.bias', 'transformer.layers.26.mlp.dense_4h_to_h.bias', 'transformer.layers.9.input_layernorm.bias', 'transformer.layers.22.post_attention_layernorm.bias', 'transformer.layers.27.input_layernorm.weight', 'transformer.layers.20.post_attention_layernorm.weight', 'transformer.layers.13.attention.query_key_value.bias', 'transformer.layers.10.mlp.dense_h_to_4h.weight', 'transformer.layers.20.input_layernorm.weight', 'transformer.layers.20.attention.rotary_emb.inv_freq', 'transformer.layers.26.input_layernorm.weight', 'transformer.layers.15.post_attention_layernorm.bias', 'transformer.layers.10.input_layernorm.bias', 'transformer.layers.20.attention.dense.weight', 'transformer.layers.10.mlp.dense_h_to_4h.bias', 'transformer.layers.26.attention.query_key_value.bias', 'transformer.layers.16.mlp.dense_h_to_4h.bias', 'transformer.layers.23.mlp.dense_h_to_4h.weight', 'transformer.layers.17.input_layernorm.weight', 'transformer.layers.10.post_attention_layernorm.bias', 'transformer.layers.8.attention.query_key_value.weight', 'transformer.layers.24.attention.dense.bias', 'transformer.layers.19.attention.dense.bias', 'transformer.layers.19.input_layernorm.weight', 'transformer.layers.9.post_attention_layernorm.weight', 'transformer.layers.25.mlp.dense_4h_to_h.weight', 'transformer.layers.13.attention.dense.weight', 'transformer.layers.18.mlp.dense_h_to_4h.weight', 'transformer.layers.9.attention.dense.bias', 'transformer.layers.8.input_layernorm.bias', 'transformer.layers.11.attention.dense.bias', 'transformer.layers.19.attention.query_key_value.bias', 'transformer.layers.9.mlp.dense_4h_to_h.bias', 'transformer.layers.10.attention.dense.bias', 
'transformer.layers.17.mlp.dense_4h_to_h.bias', 'transformer.layers.21.mlp.dense_4h_to_h.bias', 'transformer.layers.25.attention.dense.weight', 'transformer.layers.27.attention.dense.weight', 'transformer.layers.11.input_layernorm.bias', 'transformer.layers.25.post_attention_layernorm.bias', 'transformer.layers.21.mlp.dense_h_to_4h.bias', 'transformer.layers.8.attention.rotary_emb.inv_freq', 'transformer.layers.22.input_layernorm.bias', 'transformer.layers.12.attention.rotary_emb.inv_freq', 'transformer.layers.13.attention.rotary_emb.inv_freq', 'transformer.layers.14.attention.query_key_value.bias', 'transformer.layers.16.attention.rotary_emb.inv_freq', 'transformer.layers.22.mlp.dense_h_to_4h.weight', 'transformer.layers.27.mlp.dense_h_to_4h.bias', 'transformer.layers.25.attention.rotary_emb.inv_freq', 'transformer.layers.17.mlp.dense_h_to_4h.weight', 'transformer.layers.26.mlp.dense_h_to_4h.weight', 'transformer.layers.9.mlp.dense_4h_to_h.weight', 'transformer.layers.19.mlp.dense_4h_to_h.bias', 'transformer.layers.18.attention.rotary_emb.inv_freq', 'transformer.layers.8.mlp.dense_h_to_4h.bias', 'transformer.layers.13.mlp.dense_h_to_4h.bias', 'transformer.layers.11.post_attention_layernorm.bias', 'transformer.layers.22.post_attention_layernorm.weight', 'transformer.layers.13.input_layernorm.bias', 'transformer.layers.13.attention.dense.bias', 'transformer.layers.27.post_attention_layernorm.bias', 'transformer.layers.14.attention.query_key_value.weight', 'transformer.layers.14.mlp.dense_4h_to_h.bias', 'transformer.layers.27.mlp.dense_4h_to_h.weight', 'transformer.layers.24.attention.query_key_value.weight', 'transformer.layers.22.attention.dense.weight', 'transformer.layers.22.mlp.dense_4h_to_h.weight', 'transformer.layers.17.input_layernorm.bias', 'transformer.layers.9.post_attention_layernorm.bias', 'transformer.layers.16.post_attention_layernorm.weight', 'transformer.layers.14.attention.dense.bias', 'transformer.layers.14.input_layernorm.weight', 'transformer.layers.11.input_layernorm.weight', 'transformer.layers.8.mlp.dense_4h_to_h.weight', 'transformer.layers.18.input_layernorm.bias', 'transformer.layers.25.post_attention_layernorm.weight', 'transformer.layers.14.post_attention_layernorm.bias', 'transformer.layers.16.attention.dense.weight', 'transformer.layers.8.post_attention_layernorm.weight', 'transformer.layers.25.input_layernorm.weight', 'transformer.layers.21.post_attention_layernorm.bias', 'transformer.layers.15.post_attention_layernorm.weight', 'transformer.layers.18.attention.dense.weight', 'transformer.layers.23.attention.dense.weight', 'transformer.layers.10.mlp.dense_4h_to_h.weight', 'transformer.layers.18.post_attention_layernorm.weight', 'transformer.layers.15.input_layernorm.weight', 'transformer.layers.27.mlp.dense_4h_to_h.bias', 'transformer.layers.14.attention.rotary_emb.inv_freq', 'transformer.layers.13.attention.query_key_value.weight', 'transformer.layers.24.input_layernorm.bias', 'transformer.layers.14.post_attention_layernorm.weight', 'transformer.layers.21.input_layernorm.bias', 'transformer.layers.11.mlp.dense_4h_to_h.weight', 'transformer.layers.18.attention.query_key_value.bias', 'transformer.layers.23.attention.query_key_value.weight', 'transformer.layers.26.input_layernorm.bias', 'transformer.layers.21.post_attention_layernorm.weight', 'transformer.layers.24.mlp.dense_h_to_4h.bias', 'transformer.layers.26.mlp.dense_h_to_4h.bias', 'transformer.layers.11.mlp.dense_h_to_4h.bias', 'transformer.layers.10.post_attention_layernorm.weight', 
'transformer.layers.19.mlp.dense_h_to_4h.weight', 'transformer.layers.25.attention.query_key_value.weight', 'transformer.layers.15.attention.dense.weight', 'transformer.layers.12.post_attention_layernorm.bias', 'transformer.layers.26.mlp.dense_4h_to_h.weight', 'transformer.layers.21.attention.dense.bias', 'transformer.layers.15.mlp.dense_4h_to_h.bias', 'transformer.layers.13.post_attention_layernorm.weight', 'transformer.layers.9.attention.rotary_emb.inv_freq', 'transformer.layers.10.attention.rotary_emb.inv_freq', 'transformer.layers.15.mlp.dense_h_to_4h.weight', 'transformer.layers.27.attention.dense.bias', 'transformer.layers.23.mlp.dense_4h_to_h.weight', 'transformer.layers.15.attention.rotary_emb.inv_freq', 'transformer.layers.16.input_layernorm.weight', 'transformer.layers.10.attention.dense.weight', 'transformer.layers.26.post_attention_layernorm.weight', 'transformer.layers.11.mlp.dense_4h_to_h.bias', 'transformer.layers.15.attention.query_key_value.weight', 'transformer.layers.17.attention.query_key_value.weight', 'transformer.layers.25.input_layernorm.bias', 'transformer.layers.10.attention.query_key_value.weight', 'transformer.layers.26.attention.query_key_value.weight', 'transformer.layers.8.attention.query_key_value.bias', 'transformer.layers.20.mlp.dense_h_to_4h.bias', 'transformer.layers.19.input_layernorm.bias', 'transformer.layers.23.attention.query_key_value.bias', 'transformer.layers.23.mlp.dense_h_to_4h.bias', 'transformer.layers.18.mlp.dense_4h_to_h.weight', 'transformer.layers.11.attention.query_key_value.bias', 'transformer.layers.12.input_layernorm.weight', 'transformer.layers.23.post_attention_layernorm.bias', 'transformer.layers.25.attention.dense.bias', 'transformer.layers.19.post_attention_layernorm.bias', 'transformer.layers.19.attention.rotary_emb.inv_freq', 'transformer.layers.23.input_layernorm.bias', 'transformer.layers.13.post_attention_layernorm.bias', 'transformer.layers.20.attention.query_key_value.bias', 'transformer.layers.26.attention.dense.bias', 'transformer.layers.8.input_layernorm.weight', 'transformer.layers.22.input_layernorm.weight', 'transformer.layers.9.input_layernorm.weight', 'transformer.layers.18.post_attention_layernorm.bias', 'transformer.layers.10.input_layernorm.weight', 'transformer.layers.12.post_attention_layernorm.weight', 'transformer.layers.17.attention.query_key_value.bias', 'transformer.layers.17.attention.dense.bias', 'transformer.layers.20.mlp.dense_h_to_4h.weight', 'transformer.layers.12.attention.dense.bias', 'transformer.layers.16.input_layernorm.bias', 'transformer.layers.9.attention.query_key_value.bias', 'transformer.layers.17.attention.rotary_emb.inv_freq', 'transformer.layers.15.mlp.dense_4h_to_h.weight', 'transformer.layers.18.input_layernorm.weight', 'transformer.layers.24.post_attention_layernorm.bias', 'transformer.layers.27.mlp.dense_h_to_4h.weight', 'transformer.layers.24.mlp.dense_4h_to_h.bias', 'transformer.layers.11.attention.rotary_emb.inv_freq', 'transformer.layers.22.mlp.dense_4h_to_h.bias', 'transformer.layers.21.attention.dense.weight', 'transformer.layers.11.attention.dense.weight', 'transformer.layers.22.mlp.dense_h_to_4h.bias', 'transformer.layers.27.attention.query_key_value.bias', 'transformer.layers.25.mlp.dense_4h_to_h.bias', 'transformer.layers.24.post_attention_layernorm.weight', 'transformer.layers.17.mlp.dense_h_to_4h.bias', 'transformer.layers.27.post_attention_layernorm.weight', 'transformer.layers.16.mlp.dense_4h_to_h.bias', 'transformer.layers.12.mlp.dense_4h_to_h.bias', 
'transformer.layers.9.mlp.dense_h_to_4h.weight', 'transformer.layers.22.attention.dense.bias', 'transformer.layers.20.input_layernorm.bias', 'transformer.layers.8.mlp.dense_h_to_4h.weight', 'transformer.layers.22.attention.query_key_value.weight']

Process finished with exit code 1
```

ssbuild commented 1 year ago

This error is raised while printing the training model summary. It works fine when tested on Linux; try to use Linux for training if you can.
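
If the assertion really is raised while the model summary is being printed, one possible way to sidestep it on Windows is to disable the summary. This is only a sketch, assuming the training loop ultimately uses a standard pytorch_lightning Trainer; how the option is actually wired through this repo's train.py may differ:

```python
import pytorch_lightning as pl

# Hypothetical workaround: skip the ModelSummary step that appears to
# trigger the AssertionError when it walks the LoRA-wrapped modules.
trainer = pl.Trainer(
    enable_model_summary=False,  # standard pytorch_lightning option
    # ... keep the other Trainer arguments exactly as configured in train.py ...
)
```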

ngbruce commented 1 year ago

OK, I'll try that later.