ictnlp / LFR-NMT

Source code for the EMNLP 2022 paper "Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions"
MIT License

Reproduce LFR-CM #1

Open VietHoang1512 opened 1 year ago

VietHoang1512 commented 1 year ago

Hi @gushu333, congratulations on your great work.

I am trying to reproduce the results of LFR-CM using the command provided in your README file. However, fairseq returns a RuntimeError. Even though I tried passing --arch bart_large and some additional arguments, there are still size mismatches between the built model and the pretrained one:

        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([250054, 1024]) from checkpoint, the shape in current model is torch.Size([250058, 1024]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([250054, 1024]) from checkpoint, the shape in current model is torch.Size([250058, 1024]).
        size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([250054, 1024]) from checkpoint, the shape in current model is torch.Size([250058, 1024]).

So, could you please provide me with your exact command to reproduce the result of LFR-CM? Thank you in advance!

gushu333 commented 1 year ago

Hi, I'm very glad to help you.

This bug occurs because we inserted new language tokens into the dictionary file in the databin folder, which extends the size of the embedding layer of the pretrained mBart50-nn model. We did this for the language adaptation task, but forgot to delete the newly added language tokens for the domain adaptation task.

So you can simply delete the last four lines of the dictionary files in the databin folder and reuse the model released by Meta AI.
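For example, something like the following should work (just an illustrative sketch, assuming the dict files are plain-text fairseq dictionaries with one token per line; the `data_bin/*/dict*` pattern is only a placeholder, so adjust it to your databin folder):

```python
import glob
import shutil

# Placeholder pattern: point this at the dict files in your databin folder.
for path in glob.glob("data_bin/*/dict*"):
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    shutil.copy(path, path + ".bak")   # keep a backup of the original dictionary
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines[:-4])       # drop the four newly added language tokens
```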

You can also download our pretrained model here, which simply extends the embedding layer of the original model, so you don't need to modify the dictionary files.

VietHoang1512 commented 1 year ago

Thank you so much for your prompt reply. However, I counted the number of distinct tokens in the dictionary files, and they still have 250054 tokens:

[image]

Could you please give me the exact file paths that need to be modified?

gushu333 commented 1 year ago

To compute the Fisher information matrix, please modify all the dict* files in "data_bin/flores_mbart50spm_en/".

To use the LFR-CM method, please modify all the dict files in "data_bin/ende_5domain/".

Remove the last four lines; 250050 tokens is the correct size for the mBart50 model.
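To sanity-check the result, you can count the token lines in each dict file; after trimming they should report 250050. An illustrative sketch (adjust the paths if your databin folders differ):

```python
import glob

# Each trimmed fairseq dict file should now contain 250050 token lines.
for path in glob.glob("data_bin/flores_mbart50spm_en/dict*") + glob.glob("data_bin/ende_5domain/dict*"):
    with open(path, encoding="utf-8") as f:
        print(path, sum(1 for _ in f))
```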

We have also updated our README. You can refer to it for new instructions. :)

VietHoang1512 commented 1 year ago

Thank you so much for your help, I can get the Fisher information matrix now.

However, I still get an error when trying to train the model with the LFR-CM approach:

lfr/transformer_adapter.py", line 237, in register_par_mask
    tmp_p = self.fisher_matrix[n].data.detach().clone().abs().view(-1).to(p.device)
KeyError: 'encoder.layers.0.in_proj_weight'
gushu333 commented 1 year ago

Hi, this error seems to occur because the key 'encoder.layers.0.in_proj_weight' in the model does not appear in the FIM.

It seems that parameter names like 'encoder.layers.0.in_proj_weight' only appear in older fairseq versions; they should not appear in the current fairseq model: https://github.com/ictnlp/LFR-NMT/blob/cdf71c2aade34ecd7631ee00dbcdd342e1ddcaa6/fairseq/modules/multihead_attention.py#L471-L481
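For reference, current fairseq splits the old combined in-projection into separate q/k/v projections when it upgrades old checkpoints, which is roughly what the linked code does. A simplified illustration of that key conversion (not the exact code from this repository; it assumes the conventional q, k, v ordering along dim 0):

```python
def split_in_proj(state_dict, prefix):
    """Illustrative only: turn an old-style combined in_proj_weight/in_proj_bias
    (e.g. under 'encoder.layers.0.self_attn.') into the separate q/k/v
    projections used by current fairseq MultiheadAttention."""
    for old_suffix, new_suffix in (("in_proj_weight", "weight"), ("in_proj_bias", "bias")):
        key = prefix + old_suffix
        if key in state_dict:
            # Split the stacked projection into equal q, k, v chunks.
            q, k, v = state_dict.pop(key).chunk(3, dim=0)
            state_dict[prefix + "q_proj." + new_suffix] = q
            state_dict[prefix + "k_proj." + new_suffix] = k
            state_dict[prefix + "v_proj." + new_suffix] = v
    return state_dict
```

Either way, the model and the FIM need to use the same (new-style) parameter names, so please make sure you train with the fairseq code shipped in this repository.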

So are you using the code downloaded from this repository or an older version?

VietHoang1512 commented 1 year ago

Thank you so much for your help in addressing my issues. I am currently trying to experiment with another model architecture (e.g. --arch transformer_wmt_en_de with fairseq), so could you please tell me which files I should modify in order to adapt the code to my use case?

Thank you in advance.