wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333

Mismatch when loading the checkpoints #25

Closed. JiyangZhang closed this issue 3 years ago.

JiyangZhang commented 3 years ago

Hi, thanks for your great work!

When I tried to load the pre-trained checkpoints and fine-tune, I ran into a size mismatch problem. It seems that the dict.txt you provided does not match the checkpoints.

Here is the error message:

size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([50005, 768]) from checkpoint, the shape in current model is torch.Size([50001, 768]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([50005, 768]) from checkpoint, the shape in current model is torch.Size([50001, 768]).
size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([50005, 768]) from checkpoint, the shape in current model is torch.Size([50001, 768]).

This is the script I used to get the checkpoints: https://github.com/wasiahmad/PLBART/blob/main/pretrain/download.sh

This is the dict.txt I used: https://github.com/wasiahmad/PLBART/blob/main/sentencepiece/dict.txt

Here is the command I used to fine-tune:

fairseq-train $PATH_2_DATA \
    --user-dir $USER_DIR --truncate-source \
    --arch mbart_base --layernorm-embedding \
    --task translation \
    --source-lang $SOURCE --target-lang $TARGET \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --batch-size $BATCH_SIZE --update-freq $UPDATE_FREQ --max-epoch 30 \
    --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
    --lr-scheduler polynomial_decay --lr 5e-05 --min-lr -1 \
    --warmup-updates 500 --max-update 100000 \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.0 \
    --seed 1234 --log-format json --log-interval 100 \
    ${restore} \
    --eval-bleu --eval-bleu-detok space --eval-tokenized-bleu \
    --eval-bleu-remove-bpe sentencepiece --eval-bleu-args '{"beam": 5}' \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --no-epoch-checkpoints --patience 5 \
    --ddp-backend no_c10d --save-dir $SAVE_DIR 2>&1 | tee ${OUTPUT_FILE};

wasiahmad commented 3 years ago

The checkpoint is perfectly fine, as its embedding weight size (torch.Size([50005, 768])) is correct. The issue is that you are missing the --langs flag, which adds 3 language tokens, and a mask token is also added. So the embedding size becomes 50001 + 4 = 50005.
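
For a quick sanity check of that arithmetic, here is a minimal sketch using fairseq's Dictionary. It assumes the dict.txt linked above and the same token-adding convention as fairseq's translation_from_pretrained_bart task (bracketed language tokens plus <mask>); the exact token strings are an assumption, not taken from this thread.

# Sketch only: reproduce 50001 + 4 = 50005 with fairseq's Dictionary.
from fairseq.data import Dictionary

d = Dictionary.load("dict.txt")
print(len(d))  # 50001 -> the size the model is built with when --langs is missing

for lang in ["java", "python", "en_XX"]:  # what --langs java,python,en_XX would add
    d.add_symbol("[{}]".format(lang))     # +3 language tokens
d.add_symbol("<mask>")                    # +1 mask token

print(len(d))  # 50005 -> matches the checkpoint's embedding rows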

As I can see, you are using --task translation, which is the main reason it does not work. Please read our scripts carefully. You can only use the following tasks with PLBART.

JiyangZhang commented 3 years ago

Hi,

Thanks for the quick response! I used the same version of fairseq and the packages listed in requirements.txt. However, I got this error when I tried translation_without_lang_token:

fairseq-train: error: argument --task: invalid choice: 'translation_without_lang_token' (choose from 'translation', 'multilingual_translation', 'semisupervised_translation', 'language_modeling', 'audio_pretraining', 'translation_multi_simple_epoch', 'multilingual_masked_lm', 'legacy_masked_lm', 'translation_from_pretrained_xlm', 'cross_lingual_lm', 'sentence_ranking', 'masked_lm', 'translation_from_pretrained_bart', 'denoising', 'multilingual_denoising', 'translation_lev', 'sentence_prediction', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt')

wasiahmad commented 3 years ago

translation_without_lang_token is a task we define ourselves. Setting --user-dir $USER_DIR should register it, so the above error should not be raised. I am not sure why you are facing this error.
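
For reference, fairseq only discovers custom tasks when the directory passed to --user-dir is importable and actually imports the module that calls register_task. A hypothetical layout (the file and module names here are illustrative, not the exact PLBART sources):

# $USER_DIR/__init__.py  -- sketch only
# fairseq imports this package when --user-dir is set; importing the module
# below executes its @register_task decorator, which makes the task name
# 'translation_without_lang_token' available to fairseq-train.
from . import translation_without_lang_token  # noqa: F401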

JiyangZhang commented 3 years ago

Thanks! That makes things clear!

Could I ask another question? If I want to use PLBART for a translation task where the generated text should include some special self-defined tokens, what I should do is:

  1. Use the script https://github.com/wasiahmad/PLBART/blob/main/multilingual/plbart/convert.py to add some randomly initialized embeddings to the model.
  2. Define a new task to add some extra special tokens to the dict as shown here: https://github.com/wasiahmad/PLBART/blob/main/source/translation.py

Is that correct? Please correct me if something is wrong.

Thank you again!

wasiahmad commented 3 years ago
  1. Yes, you can include self-defined tokens in the pre-trained checkpoint.
  2. Yes, you need to add those tokens to the dictionary by defining a task (because those special tokens are not part of the original vocabulary); see the sketch below.
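
A minimal sketch of step 2, assuming fairseq's task registration API; the task name, base class, and token strings below are hypothetical and not the exact code in source/translation.py:

# Sketch only: register a task that appends self-defined tokens to the
# dictionaries so their ids line up with the embedding rows added in step 1
# (e.g., via convert.py).
from fairseq.tasks import register_task
from fairseq.tasks.translation import TranslationTask

SPECIAL_TOKENS = ["<my_sep>", "<my_hole>"]  # hypothetical self-defined tokens

@register_task("translation_with_special_tokens")
class TranslationWithSpecialTokens(TranslationTask):
    @classmethod
    def setup_task(cls, args, **kwargs):
        task = super().setup_task(args, **kwargs)
        for tok in SPECIAL_TOKENS:
            task.src_dict.add_symbol(tok)
            task.tgt_dict.add_symbol(tok)
        return task

The module defining this task has to live under $USER_DIR so that --user-dir picks it up, and the number of appended tokens must match the number of randomly initialized embedding rows added in step 1.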

If you are done, please close the issue.

JiyangZhang commented 3 years ago

Thanks!