Error when resuming LoRA fine-tuning from a checkpoint

xiang-hui744 commented 4 months ago

Command used:

    CUDA_VISIBLE_DEVICES=0 accelerate launch finetune.py --data_path ./data/introduction2_train.json --base_model /home/keke/KnowLM/knowlm-13b-zhixi --resume_from_checkpoint /home/keke/KnowLM/finetune/lora/knowlm/checkpoint/in22/checkpoint-130

Error:

    Loading checkpoint shards: 100%|██████████| 3/3 [00:25<00:00, 8.47s/it]
    /home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/peft/utils/other.py:122: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
      warnings.warn(
    Restarting from /home/keke/KnowLM/finetune/lora/knowlm/checkpoint/in22/checkpoint-130/pytorch_model.bin
    Traceback (most recent call last):
      File "/home/keke/KnowLM/finetune/lora/knowlm/finetune.py", line 301, in <module>
        fire.Fire(train)
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/home/keke/KnowLM/finetune/lora/knowlm/finetune.py", line 217, in train
        model.print_trainable_parameters()  # Be more transparent about the % of trainable params.
    AttributeError: '_IncompatibleKeys' object has no attribute 'print_trainable_parameters'
    Traceback (most recent call last):
      File "/home/keke/anaconda3/envs/lora/bin/accelerate", line 8, in <module>
        sys.exit(main())
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
        args.func(args)
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
        simple_launcher(args)
      File "/home/keke/anaconda3/envs/lora/lib/python3.9/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/keke/anaconda3/envs/lora/bin/python', 'finetune.py', '--data_path', './data/introduction2_train.json', '--base_model', '/home/keke/KnowLM/knowlm-13b-zhixi', '--resume_from_checkpoint', '/home/keke/KnowLM/finetune/lora/knowlm/checkpoint/in22/checkpoint-130']' returned non-zero exit status 1.

MikeDean2367 commented 4 months ago

Hello, please try commenting out this line of code.

xiang-hui744 commented 4 months ago

> Hello, please try commenting out [this line of code].

I found that after load_state_dict, model becomes

    _IncompatibleKeys(missing_keys=['base_model.model.model.embed_tokens.weight', 'base_model.model.model.layers.0.self_attn.q_proj.weight', 'base_model.model.model.layers.0.self_attn.k_proj.weight', 'base_model.model.model.layers.0.self_attn.v_proj.weight', 'base_model.model.model.layers.0.self_attn.o_proj.weight', ...])

so every later call on model fails and the run keeps erroring out while loading the parameters. I don't know how to fix this. Please advise!
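For readers hitting the same traceback, the symptom described above can be reproduced in isolation. PyTorch's load_state_dict returns a named tuple with missing_keys and unexpected_keys rather than the model, so any code that assigns the loader's return value back to model replaces the model with that tuple. The sketch below is a toy, torch-free illustration: ToyPeftModel, IncompatibleKeys, and load_adapter_weights are hypothetical stand-ins, and whether finetune.py rebinds model in exactly this way is an assumption, not something the thread confirms.

```python
from collections import namedtuple

# Stand-in for the named tuple that PyTorch's load_state_dict returns.
IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])


class ToyPeftModel:
    """Hypothetical stand-in for a LoRA-wrapped model (not the real PEFT class)."""

    def __init__(self):
        self.state = {"base.weight": 0.0, "lora_A.weight": 0.0}

    def load_state_dict(self, state_dict, strict=True):
        missing = [k for k in self.state if k not in state_dict]
        unexpected = [k for k in state_dict if k not in self.state]
        if strict and (missing or unexpected):
            raise RuntimeError("state_dict does not match the model")
        for key, value in state_dict.items():
            if key in self.state:
                self.state[key] = value
        # Like PyTorch, report the mismatches instead of returning the model.
        return IncompatibleKeys(missing, unexpected)

    def print_trainable_parameters(self):
        return "trainable params: 1 || all params: 2"


def load_adapter_weights(model, adapter_weights):
    """Hypothetical loader that returns the load result, not the model."""
    return model.load_state_dict(adapter_weights, strict=False)


# An adapter-only checkpoint holds LoRA weights, so base weights show up as missing.
adapter_weights = {"lora_A.weight": 1.0}

model = ToyPeftModel()
# Buggy pattern would be: model = load_adapter_weights(model, adapter_weights)
# Safer pattern: keep the model binding and inspect the result separately.
result = load_adapter_weights(model, adapter_weights)

print(result.missing_keys)                 # base weights absent from the adapter file
print(model.print_trainable_parameters())  # still works: model is still the model
```

Under this assumption, the missing base-model keys are expected for an adapter-only checkpoint and are not themselves the failure; the AttributeError comes from losing the reference to the model.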

MikeDean2367 commented 4 months ago

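Hello, please refer to this issue. I recommend using the latest version of peft. If the problem persists, please let me know and include the version numbers of the relevant packages :)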
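One way to gather the requested version numbers is a short script built on the standard library's importlib.metadata. This is only a sketch: the package list is an assumption about what is relevant to this setup (peft and accelerate appear in the traceback, the others are guesses), and collect_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages relevant to this issue; adjust as needed.
PACKAGES = ["peft", "transformers", "accelerate", "torch", "bitsandbytes"]


def collect_versions(names):
    """Return a dict mapping each package name to its installed version."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}: {ver}")
```

Running it inside the same conda environment used for training (here, lora) gives output that can be pasted directly into the issue.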

zxlzr commented 4 months ago

Has your problem been resolved?

xiang-hui744 commented 4 months ago

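> Hello, please refer to this issue. I recommend using the latest version of peft. If the problem persists, please let me know and include the version numbers of the relevant packages :)

> Has your problem been resolved?

Using peft version 0.2.0 resolves the problem.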

zjunlp / KnowLM

Error when resuming LoRA fine-tuning from a checkpoint #140