mt-upc / iwslt-2021

Systems submitted to IWSLT 2021 by the MT-UPC group.
MIT License

size mismatch #6

Closed: qinglin666635 closed this issue 2 months ago

qinglin666635 commented 2 months ago

Error:

RuntimeError: Error(s) in loading state_dict for TransformerDecoderMod:
    size mismatch for embed_tokens.weight: copying a param with shape torch.Size([250054, 1024]) from checkpoint, the shape in current model is torch.Size([5052, 1024]).
    size mismatch for output_projection.weight: copying a param with shape torch.Size([250054, 1024]) from checkpoint, the shape in current model is torch.Size([5052, 1024]).

Could someone tell me how to deal with this problem? How do I change the code to match my dataset and dictionary size? Thank you very much.

gegallego commented 2 months ago

Hello,

First, take into account that this is a project from some years ago and I might not remember well all the details.

This issue looks like a mismatch between the vocabulary size you are using and the one from the pretrained decoder. I assume you are setting a vocabulary size of 5k (which affects embed_tokens and output_projection) and loading an mBART decoder with a 250k vocabulary size.

You can’t modify the vocabulary if loading a pretrained mBART decoder.
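If you want to verify this, you can inspect the checkpoint directly. A quick sketch (the paths are just examples, and I'm assuming the usual fairseq layout with the weights stored under the "model" key):

```python
import torch

# Example paths: adjust them to your mBART checkpoint and your data-bin folder.
ckpt = torch.load("mbart.cc25/model.pt", map_location="cpu")

# The decoder embedding table in the mBART checkpoint.
emb = ckpt["model"]["decoder.embed_tokens.weight"]
print(emb.shape)  # torch.Size([250054, 1024]) for mBART

# Your own dictionary (~5k entries, plus fairseq's special symbols) is what
# produces the [5052, 1024] tensors in the model you built.
with open("data-bin/dict.txt") as f:
    print(sum(1 for _ in f))
```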

Best regards, Gerard

qinglin666635 commented 2 months ago

Hello,

Do you mean that the size of the vocabulary is determined by the pre-trained mBART model I load, and that it cannot be modified?

best wishes to you

gegallego commented 2 months ago

Exactly. The vocabulary affects the embedding table at the input of the decoder and the output projection at its output. You cannot change the vocabulary unless you do some extra tricks. Why can’t you reuse the vocabulary from mBART?
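Just to make the dependency concrete, this is roughly what those two tensors look like; the shapes match the ones in your error message:

```python
import torch.nn as nn

embed_dim = 1024
vocab_size = 250054  # mBART's dictionary; your 5k dictionary gives 5052 rows instead

# Both tensors are [vocab_size, embed_dim], so they can only be loaded from a
# checkpoint that was trained with the same dictionary.
embed_tokens = nn.Embedding(vocab_size, embed_dim)
output_projection = nn.Linear(embed_dim, vocab_size, bias=False)
```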

qinglin666635 commented 2 months ago

Hello,

Because my dataset is very small and the language is a low-resource one, complex models and large vocabularies hurt my translation results. Could you tell me what adjustments I should make to accommodate a smaller vocabulary?

best wishes

gegallego commented 2 months ago

Is your target language among the ones supported by mBART? If it is, you should use the mBART dictionary. If it’s not, then I’m not sure if you’ll be able to make it work…

The idea that comes to my mind is modifying the code to avoid loading the embedding and output projection from mBART. They'd be randomly initialized, while the rest of the weights of the model would still be pretrained. I'd maybe try to keep the whole decoder frozen during training except the embedding and output projection. That way you'd be adapting the mBART decoder to your language.
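Something along these lines, completely untested, assuming the checkpoint follows the standard fairseq layout and that your decoder is already built with your 5k dictionary:

```python
import torch
from torch import nn

def load_mbart_decoder_partially(decoder: nn.Module, mbart_ckpt_path: str) -> None:
    """Load pretrained mBART decoder weights except the vocabulary-dependent
    embed_tokens / output_projection, then freeze the pretrained part.

    `decoder` is the already-built TransformerDecoderMod with your small
    dictionary; `mbart_ckpt_path` points at the mBART checkpoint (assumed
    fairseq layout with the weights under the "model" key).
    """
    mbart_state = torch.load(mbart_ckpt_path, map_location="cpu")["model"]

    # Keep decoder weights only, dropping the two vocabulary-sized tensors.
    decoder_state = {
        k[len("decoder."):]: v
        for k, v in mbart_state.items()
        if k.startswith("decoder.")
        and not k.endswith(("embed_tokens.weight", "output_projection.weight"))
    }

    # strict=False leaves the missing tensors at their random initialization.
    decoder.load_state_dict(decoder_state, strict=False)

    # Train only the new embedding and output projection; freeze the rest.
    for name, param in decoder.named_parameters():
        param.requires_grad = name.startswith(("embed_tokens", "output_projection"))
```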

Please, take this recommendation with a grain of salt, I’ve never tried it and I’m not sure if this would work.

qinglin666635 commented 2 months ago

Yes, my target language is one of the languages supported by mBART. Therefore, I will continue using the mBART dictionary and sentencepiece.bpe.model.
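For reference, I will keep encoding my data with the mBART SentencePiece model, roughly like this (the path is just where I keep the file):

```python
import sentencepiece as spm

# SentencePiece model released with mBART (adjust the path to your setup).
sp = spm.SentencePieceProcessor(model_file="mbart.cc25/sentencepiece.bpe.model")

pieces = sp.encode("an example sentence", out_type=str)
print(pieces)  # subword pieces consistent with the mBART dictionary
```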

Thank you for your help. Best wishes.