Closed: ShramanPramanick closed this issue 1 year ago
One follow-up (slightly unrelated) question. Have you tried to fine-tune the mm_projector layers in the 1st stage too? In the current version, the 1st stage only unfreezes the spi_module parameters. Do you have any comments on the effect of unfreezing projection parameters in the 1st stage?
I faced the following error when I launched the 2nd stage of pre-training:

`ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group`

This error is likely because the number of trainable parameters differs between the 2nd stage and the 1st stage. How did you resolve this?
Apologies for the delayed response. I have fixed the training script in fix load stage1. The error occurred because the auto-resume function attempted to resume the optimizer state from stage 1.
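The mismatch is mechanical: `torch.optim.Optimizer.load_state_dict()` requires each saved parameter group to contain the same number of parameters as the corresponding group in the new optimizer, and stage 2 trains more parameters than stage 1. Below is a minimal, framework-free sketch of the size check that a resume path effectively has to pass (the group contents are made-up stand-ins, not the actual repository parameters):

```python
def can_resume_optimizer(saved_groups, current_groups):
    """Return True only when every optimizer parameter group matches in size."""
    if len(saved_groups) != len(current_groups):
        return False
    return all(
        len(s["params"]) == len(c["params"])
        for s, c in zip(saved_groups, current_groups)
    )

# Stage 1 optimized only the spi_module parameters; stage 2 unfreezes more
# weights, so the param group grows and loading the stage-1 optimizer state
# raises the ValueError above.
stage1_groups = [{"params": [0, 1]}]        # e.g. two spi_module tensors
stage2_groups = [{"params": [0, 1, 2, 3]}]  # spi_module + newly unfrozen weights
print(can_resume_optimizer(stage1_groups, stage2_groups))  # False
```

When this check fails, the safe behavior is simply to skip restoring the optimizer state and start the new stage with a fresh optimizer, which is what the fixed script amounts to.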
> One follow-up (slightly unrelated) question. Have you tried to fine-tune the mm_projector layers in the 1st stage too? In the current version, the 1st stage only unfreezes the spi_module parameters. Do you have any comments on the effect of unfreezing projection parameters in the 1st stage?
The main reason I did not train the mm_projector is that it was already pre-trained on a large amount of caption data in LLaVA. Fine-tuning it on only a relatively small amount of region data in stage 1 may actually degrade its expressive ability. However, if I applied object detection to the caption data in the same way as I did for Llava150k and added that caption data to stage 1, I believe it would improve the results.
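For reference, the stage-1 recipe described above amounts to toggling `requires_grad` per parameter, keyed on the parameter name. Here is a toy, framework-free sketch of that selection rule; the `Param` class and the parameter names are hypothetical stand-ins, not the repository's actual modules beyond spi_module and mm_projector:

```python
class Param:
    """Stand-in for a tensor with a requires_grad flag."""
    def __init__(self):
        self.requires_grad = True

def set_stage1_trainable(named_params):
    # Freeze everything except spi_module parameters, mirroring the
    # stage-1 setup discussed above; mm_projector stays frozen.
    for name, p in named_params.items():
        p.requires_grad = "spi_module" in name

params = {
    "spi_module.fc.weight": Param(),
    "mm_projector.weight": Param(),
    "layers.0.attn.q_proj.weight": Param(),
}
set_stage1_trainable(params)
print(params["spi_module.fc.weight"].requires_grad)  # True
print(params["mm_projector.weight"].requires_grad)   # False
```

With real PyTorch modules the same loop would run over `model.named_parameters()`, and the optimizer would be built only from parameters with `requires_grad=True`.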
> I faced the following error when I launched the 2nd stage of pre-training:
>
> `ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group`
>
> This error is likely because the number of trainable parameters differs between the 2nd stage and the 1st stage. How did you resolve this?
I have published the checkpoint and polished the README today, so you can now pull them and use this repository more smoothly.
Thanks @jshilong for open-sourcing your model and for your help!