jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Load weight error #16

Open zwhus opened 1 year ago

zwhus commented 1 year ago

Hi, thanks for your excellent work. I ran into an issue when I tried to load the GPT4RoI weights for stage 2 training. The error was: "Error(s) in loading state_dict for SPILlavaMPTForCausalLM: size mismatch for lm_head.weight: copying a param with shape torch.Size([32006, 4096]) from checkpoint, the shape in current model is torch.Size([32005, 4096])." How can I solve this problem? Looking forward to your reply!

jshilong commented 1 year ago

Hi, to better understand your situation, I need more information: how you're loading the model, which script you're using, and whether you're loading the weights from the first stage or the final weights we provide.

zwhus commented 1 year ago

Thank you for your reply! First, I followed the weight download tutorial to get the GPT4RoI-7B weights. Next, I want to continue training GPT4RoI from these weights, so I followed the stage 2 instructions and ran the command:

bash train_stage2.sh exp/stage2 GPT4ROi-7B

Here, GPT4RoI-7B is the final weight, and the stage 2 script is unchanged. This produced the error: "Error(s) in loading state_dict for SPILlavaMPTForCausalLM: size mismatch for lm_head.weight: copying a param with shape torch.Size([32006, 4096]) from checkpoint, the shape in current model is torch.Size([32005, 4096])." How can I solve this problem?

jshilong commented 1 year ago

Please change these two package versions: pip install tokenizers==0.13.3 and pip install transformers@git+https://github.com/huggingface/transformers.git@cae78c46.

I tried the same operation and there is no error.
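Before reinstalling, it can help to confirm which versions are actually active in the training environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical and not part of GPT4RoI. Only the tokenizers pin is compared, since transformers is installed from a git commit and its reported version string may not identify that commit.

```python
from importlib import metadata

# Pin taken from the comment above; everything else here is illustrative.
EXPECTED = {"tokenizers": "0.13.3"}


def installed_version(package, lookup=metadata.version):
    """Return the installed version string, or None if the package is absent."""
    try:
        return lookup(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected, lookup=metadata.version):
    """Map each package whose installed version differs from its pin
    to a (wanted, found) pair. An empty dict means every pin matches."""
    problems = {}
    for package, wanted in expected.items():
        found = installed_version(package, lookup)
        if found != wanted:
            problems[package] = (wanted, found)
    return problems


if __name__ == "__main__":
    for package, (wanted, found) in check_pins(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
    # Reported for reference only; a git install is not checked against a pin.
    print("transformers:", installed_version("transformers"))
```

Run it with the same Python interpreter that launches train_stage2.sh, so the check reflects the environment the training job sees.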

jshilong commented 1 year ago

Could you share the full error message? I'd like to determine whether this error occurs during model initialization or while resuming from GPT4ROi-7B.

jshilong commented 1 year ago

This may be an issue caused by improper weight merging. For troubleshooting, you can try resuming from https://huggingface.co/shilongz/debug to check whether your weights are the problem.
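To narrow down where the mismatch comes from, the parameter shapes in the checkpoint can be compared against those of the freshly built model. Below is a minimal sketch of that comparison in plain Python; `find_shape_mismatches` is a hypothetical helper, and the two dictionaries are filled in by hand with the shapes quoted in the error message rather than read from real weights.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for every parameter
    that exists in both mappings but has a different shape."""
    return {
        name: (tuple(shape), tuple(model_shapes[name]))
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    }


# Shapes copied from the error message in this thread.
checkpoint_shapes = {"lm_head.weight": (32006, 4096)}
model_shapes = {"lm_head.weight": (32005, 4096)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, model) in mismatches.items():
    # The first dimension of lm_head.weight is the vocabulary size.
    print(f"{name}: checkpoint {ckpt} vs model {model}, "
          f"row difference {ckpt[0] - model[0]}")
```

With PyTorch, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the loaded checkpoint and from `model.state_dict()`. A one-row difference in lm_head.weight may indicate that the tokenizer in the current environment ends up with one token fewer than the one used when the checkpoint was saved, which would be consistent with the version pins suggested earlier, though the thread does not confirm the cause.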