huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

how to finetune mT5 on XGLUE-NTG task #13249

Closed koukoulala closed 3 years ago

koukoulala commented 3 years ago

📚 Migration

Information

Model I am using (Bert, XLNet ...): google/mt5-base

Language I am using the model on (English, Chinese ...): multi-language

The problem arises when using:

A lightly modified copy of ./examples/pytorch/summarization/run_summarization_no_trainer.py, adapted to the NTG task and the BLEU evaluation metric (see the sketch below).
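
For context, a minimal sketch of what such a metric swap might look like, assuming the older `datasets.load_metric` API the no-trainer script used at the time (newer versions use the separate `evaluate` library); `decoded_preds` / `decoded_labels` stand in for the detokenized strings the script already produces, and the exact variable names in a modified script may differ:

```python
# Hedged sketch of swapping ROUGE for sacreBLEU in a copy of
# run_summarization_no_trainer.py. `decoded_preds` / `decoded_labels` are the
# detokenized prediction/reference strings the script already produces.
from datasets import load_metric

metric = load_metric("sacrebleu")

def compute_bleu(decoded_preds, decoded_labels):
    # sacreBLEU expects one or more reference strings per prediction.
    references = [[label] for label in decoded_labels]
    result = metric.compute(predictions=decoded_preds, references=references)
    return {"bleu": result["score"]}
```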

The task I am working on is:

Details

When training mT5 with multilingual data, do I need to add the "--source_prefix" argument as with T5? If so, is " --source_prefix=' Summarize: ' " correct? But when I added it, the results were poor in every language except English. Is there a problem with my parameter settings?

[screenshot: per-language results with --source_prefix]

Also, the result with the "--source_prefix" argument above is actually the same as the result without it: [screenshot: per-language results without --source_prefix]

Should we set a different --source_prefix for each language, and if so, how?
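
For reference, the --source_prefix flag in the example script simply prepends a fixed string to every source text before tokenization, so a per-language prefix would need a small change in the preprocessing function. A minimal sketch, assuming hypothetical `lang`, `text`, and `title` column names (the real XGLUE/NTG column names may differ), with illustrative prefix strings:

```python
# Sketch of how --source_prefix is applied, and how a per-language prefix could
# be injected instead. Column names ("lang", "text", "title") and the prefix
# strings are assumptions, not taken from the actual script or dataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

# Hypothetical per-language prefixes.
LANG_PREFIXES = {"en": "summarize: ", "fr": "résumez: ", "de": "zusammenfassen: "}

def preprocess(examples, prefix="summarize: ", per_language=False):
    if per_language:
        inputs = [LANG_PREFIXES.get(lang, prefix) + doc
                  for lang, doc in zip(examples["lang"], examples["text"])]
    else:
        # This is essentially what run_summarization_no_trainer.py does
        # with --source_prefix: prefix + source text.
        inputs = [prefix + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=examples["title"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

Applied with `dataset.map(preprocess, batched=True)`, this mirrors the script's preprocessing step.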

Environment info

Checklist

NielsRogge commented 3 years ago

From the T5 author (I asked him):

> Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.

Hence, no prefix should be used. However, you say the performance you get without the prefix is similar?
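
To illustrate the distinction in the quote: with multi-task fine-tuning each example carries a prefix that identifies its task, whereas single-task fine-tuning of mT5 can drop the prefix entirely. A minimal sketch (the task names, prefix strings, and record fields are illustrative assumptions, not an official convention):

```python
# Illustration of the single-task vs. multi-task prefix convention discussed
# above. The task names, prefix strings and record fields are assumptions.
TASK_PREFIXES = {
    "ntg": "generate title: ",
    "summarization": "summarize: ",
    "qg": "generate question: ",
}

def build_source(record, multi_task=False):
    if multi_task:
        # Multi-task fine-tuning: the prefix tells the model which task this is.
        return TASK_PREFIXES[record["task"]] + record["source"]
    # Single-task fine-tuning of mT5: no prefix needed.
    return record["source"]

print(build_source({"task": "ntg", "source": "Some news article ..."}, multi_task=True))
# -> "generate title: Some news article ..."
```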

koukoulala commented 3 years ago

> From the T5 author (I asked him):
>
> Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
>
> Hence, no prefix should be used. However, you say the performance you get without the prefix is similar?

Thank you very much for your reply. Does mT5 have any fine-tuning scripts for the multilingual title generation task? Why is it so bad in the other languages? Does mT5 have any special hyperparameters that need to be set?
Here is my command: python -u -m torch.distributed.launch --nproc_per_node 4 --use_env examples/pytorch/summarization/run_xglue_no_trainer.py --model_name_or_path=google/mt5-base --dataset_name=ntg --per_device_train_batch_size=2 --per_device_eval_batch_size=4

Thanks!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

nikhiljaiswal commented 2 years ago

@koukoulala @NielsRogge I also had a similar doubt: instead of mT5, I want to fine-tune M2M100 on more than one language pair. Any leads on how to achieve that? I am able to fine-tune on a single language pair, but how do I fine-tune on more than one pair simultaneously?
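
Not an official answer, but one common approach is to mix all language pairs into a single training set and set the tokenizer's source/target language per example; at generation time the target language is selected via `forced_bos_token_id`. A minimal sketch, assuming hypothetical `src_lang` / `tgt_lang` / `src_text` / `tgt_text` record fields:

```python
# Hedged sketch: fine-tuning M2M100 on several language pairs at once by
# encoding each example with its own source/target language codes.
# The record fields (src_lang, tgt_lang, src_text, tgt_text) are assumptions.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

def encode_example(record):
    # Each example carries its own language pair, so examples from all pairs
    # can be mixed (and shuffled) in a single training set.
    tokenizer.src_lang = record["src_lang"]   # e.g. "en"
    tokenizer.tgt_lang = record["tgt_lang"]   # e.g. "fr"
    return tokenizer(record["src_text"],
                     text_target=record["tgt_text"],
                     max_length=128, truncation=True, return_tensors="pt")

example = {"src_lang": "en", "tgt_lang": "fr",
           "src_text": "Hello world", "tgt_text": "Bonjour le monde"}
loss = model(**encode_example(example)).loss  # standard seq2seq loss

# Generation: pick the target language with its forced BOS token id.
tokenizer.src_lang = "en"
encoded = tokenizer("Hello world", return_tensors="pt")
generated = model.generate(**encoded,
                           forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```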