tmramalho / finetune-mbart

How to finetune mbart using fairseq

Problems running load_checkpoint.py #1

Open hflserdaniel opened 3 years ago

hflserdaniel commented 3 years ago

Thanks for your concise and helpful tutorial. To test the released mBART first (mbart.cc25.v2.tar.gz), I ran load_checkpoint.py directly and encountered two problems.

First, BARTModel.from_pretrained seems to have trouble with sentencepiece_vocab:

        omegaconf.errors.MissingMandatoryValue: Missing mandatory value: sentencepiece_model
            full_key: sentencepiece_model
            reference_type=Optional[SentencepieceConfig]
            object_type=SentencepieceConfig

Second, when the pretrained mBART is loaded without bpe and sentencepiece_vocab (I'm not sure which BPE scheme is used in that case), bart.sample fails with: IndexError: index out of range in self

I'm new to fairseq, so excuse me if these problems are silly :) Thanks

yellowwoodstock commented 3 years ago

I have just started my journey in ML and am interested in machine translation. Thanks to Tiago for his wonderful page on "Fine-tune neural translation models with mBART"; I managed to follow it and was able to do most of the steps.

I am also stuck at load_checkpoint.py. However, @hflserdaniel, I may be able to answer your first question. Newer versions of fairseq seem to have renamed the variable, so use sentencepiece_model instead of sentencepiece_vocab. You should then be able to load the model without issue:

        bpe='sentencepiece',
        sentencepiece_model=f'{checkpoint_folder}/mbart.cc25/sentence.bpe.model')
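
Put together, the full call might look like the sketch below. It is only a sketch under assumptions: `mbart_load_kwargs` and `load_mbart` are hypothetical helper names (not part of fairseq or the tutorial), the folder layout is the one from the snippet above, and the checkpoint file name is left at fairseq's default.

```python
def mbart_load_kwargs(checkpoint_folder: str) -> dict:
    """Build the keyword arguments for BARTModel.from_pretrained.

    Hypothetical helper: the point is only that newer fairseq expects
    `sentencepiece_model` where older versions used `sentencepiece_vocab`.
    """
    return {
        "bpe": "sentencepiece",
        # Path layout copied from the snippet above; adjust it to your setup.
        "sentencepiece_model": f"{checkpoint_folder}/mbart.cc25/sentence.bpe.model",
    }


def load_mbart(checkpoint_folder: str):
    """Load the model; requires fairseq to be installed."""
    # Imported lazily so the helper above can be checked without fairseq.
    from fairseq.models.bart import BARTModel

    # Assumption: the checkpoint lives in `checkpoint_folder` under the
    # default file name; pass checkpoint_file=... if yours differs.
    return BARTModel.from_pretrained(
        checkpoint_folder, **mbart_load_kwargs(checkpoint_folder)
    )


if __name__ == "__main__":
    print(mbart_load_kwargs("checkpoints"))
```

Printing the keyword arguments first is a cheap way to confirm the sentencepiece path before waiting for the checkpoint to load.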

@tmramalho However, I do have an issue after that: the translation is not working; it just outputs ['[en_XX]'] (the model is currently trained on ja-->en).

My questions are:

If I follow the article and train both directions (en<-->ja), how do I define the translation direction when using load_checkpoint.py to check the translation?

The mBART documentation (https://huggingface.co/transformers/master/model_doc/mbart.html) mentions src_lang_code and tgt_lang_code, but I'm not sure how to define them in BARTModel to make it work.
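
For context, here is a minimal sketch of how I understand mBART to mark the direction, based on the format described in the mBART paper: the source sentence is followed by `</s>` and the source language tag, and decoding starts from the target language tag. The bracketed tag spelling follows the `[en_XX]` seen in the output above. The function names are hypothetical, the sketch does not call fairseq, and whether load_checkpoint.py or BARTModel applies these tags for you is exactly the open question.

```python
from typing import List

EOS = "</s>"


def lang_tag(code: str) -> str:
    """Format a language code such as 'en_XX' as a bracketed tag."""
    return f"[{code}]"


def tag_source(tokens: List[str], src_lang: str) -> List[str]:
    """Encoder input: the sentence tokens, then </s>, then the source tag."""
    return list(tokens) + [EOS, lang_tag(src_lang)]


def decoder_start(tgt_lang: str) -> str:
    """First decoder token: the tag of the language to translate into."""
    return lang_tag(tgt_lang)


if __name__ == "__main__":
    # ja --> en: tag the Japanese source, start decoding with the English tag.
    print(tag_source(["konnichiwa"], "ja_XX"))
    print(decoder_start("en_XX"))
```

Under this reading, switching direction means swapping the two tags. In fairseq's command-line tools the direction is normally chosen with --source-lang and --target-lang; this sketch does not settle how that carries over to BARTModel.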

Thanks.

anudeepch6789 commented 2 years ago

@yellowwoodstock did you find a solution to the translation issue? For me, the output is the same as the input sentence. @tmramalho