texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Load the model error #141

Open hengran opened 3 months ago

hengran commented 3 months ago

Hi, I ran into a problem when loading the model with "tevatron/retriever/driver/encode.py". The "lora_name_or_path" checkpoint was trained with "tevatron/retriever/driver/train.py". I am confused by this error.

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/paddlejob/workspace/env_run/llm-index/src/tevatron/retriever/driver/encode.py", line 119, in <module>
    main()
  File "/root/paddlejob/workspace/env_run/llm-index/src/tevatron/retriever/driver/encode.py", line 63, in main
    model = DenseModel.load(
  File "/root/paddlejob/workspace/env_run/llm-index/src/tevatron/retriever/modeling/encoder.py", line 175, in load
    lora_model = PeftModel.from_pretrained(base_model, lora_name_or_path, config=lora_config, use_safetensors=False)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 356, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 730, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/usr/local/lib/python3.8/dist-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForFeatureExtraction:
    size mismatch for base_model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([11008, 8]).
    size mismatch for base_model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([11008, 8]).
    size mismatch for base_model.model.layers.0.mlp.down_proj.lora
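For reference, one way to confirm what is going on is to list the tensors inside the saved safetensors adapter and check which ones were written with zero elements (which would explain the torch.Size([0]) mismatches above). This is only a diagnostic sketch, not Tevatron code; the checkpoint path is a placeholder for your --lora_name_or_path:

```python
# Diagnostic sketch (not part of Tevatron): inspect the saved LoRA adapter
# and flag any tensors that were serialized empty.
from safetensors.torch import load_file

ckpt_dir = "path/to/lora_checkpoint"  # placeholder: your --lora_name_or_path directory

state_dict = load_file(f"{ckpt_dir}/adapter_model.safetensors")
for name, tensor in state_dict.items():
    if tensor.numel() == 0:
        # Zero-element tensors here are what trigger the
        # "copying a param with shape torch.Size([0])" size mismatch.
        print(f"EMPTY: {name} -> {tuple(tensor.shape)}")
    else:
        print(f"OK:    {name} -> {tuple(tensor.shape)}")
```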

MXueguang commented 3 months ago

Hi @hengran, this is related: https://github.com/texttron/tevatron/issues/118. I suggest deleting the safetensors checkpoint and using the adapter_model.bin in the saved LoRA checkpoint instead (if it was not trained/saved with the latest version).
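A rough sketch of that workaround, assuming the LoRA checkpoint directory contains both a broken adapter_model.safetensors and a usable adapter_model.bin; paths and the base model are placeholders, not Tevatron's own loading code:

```python
# Sketch of the workaround: move the safetensors file aside so PEFT
# falls back to adapter_model.bin when loading the adapter.
import os
from peft import PeftConfig, PeftModel
from transformers import AutoModel

lora_dir = "path/to/lora_checkpoint"  # placeholder: your --lora_name_or_path
bad_safetensors = os.path.join(lora_dir, "adapter_model.safetensors")

# Rename rather than delete, so the original file is kept as a backup.
if os.path.exists(bad_safetensors):
    os.rename(bad_safetensors, bad_safetensors + ".bak")

config = PeftConfig.from_pretrained(lora_dir)
base_model = AutoModel.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, lora_dir, config=config)
```

After this, encode.py should pick up the .bin adapter weights when pointed at the same directory.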

Mr-Lnan commented 2 months ago

I have the same problem. Have you solved it?