prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit
MIT License

Which pre-trained model should we use for fine-tuning? #36

Open. Aniruddha-JU opened this issue 2 years ago

Aniruddha-JU commented 2 years ago

I have pre-trained the IndicBART model on new monolingual data, and two models are saved in the model path: 1) IndicBART and 2) IndicBART_puremodel. Which one should we use during fine-tuning?

Aniruddha-JU commented 2 years ago

The IndicBART checkpoint is 2.4 GB and the pure model is 932 MB.

prajdabre commented 2 years ago

Either works (sketched below):

- Use the pure model with the flag `--pretrained_model`.
- Use the larger model with the flag `--pretrained_model` plus the additional flag `--no_reload_optimizer_ctr_and_scheduler`.
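
For concreteness, a minimal sketch of the two invocations, assuming the fine-tuning entry point is `train_nmt.py`; the script name and all remaining arguments are placeholders, and only the two flags come from this thread:

```bash
# Minimal sketch: train_nmt.py and the trailing "..." (data, language,
# and model arguments) are assumptions/placeholders; only the two flags
# below are the ones discussed in this thread.

# Option 1: fine-tune from the pure model checkpoint
python train_nmt.py --pretrained_model path/to/IndicBART_puremodel ...

# Option 2: fine-tune from the larger checkpoint, discarding the saved
# optimizer and scheduler states
python train_nmt.py --pretrained_model path/to/IndicBART \
    --no_reload_optimizer_ctr_and_scheduler ...
```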

The larger checkpoint also contains the optimizer and scheduler states, so you can resume pre-training after a crash. For fine-tuning, resetting the optimizer is the more common choice.
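
If you want to verify which kind of checkpoint a file is, you can inspect its top-level structure. This is a generic PyTorch sketch, not yanmtt's exact checkpoint layout; the key names depend on how the toolkit calls `torch.save`:

```bash
python - <<'EOF'
import torch

# Generic PyTorch sketch (an assumption, not yanmtt's exact layout):
# a full checkpoint is usually a dict bundling model, optimizer, and
# scheduler states, while a pure model file is just the model state_dict.
for name in ["IndicBART", "IndicBART_puremodel"]:
    ckpt = torch.load(name, map_location="cpu")
    keys = list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt)
    print(name, "->", keys[:6] if isinstance(keys, list) else keys)
EOF
```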