From the T5 author (I asked him):
Since mT5 was pre-trained in an unsupervised way, there's no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
Hence, no prefix should be used. However, you say the performance you get without a prefix is similar?
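To make the multi-task case concrete: the prefix is just literal text prepended to each input before tokenization. A minimal sketch of mixing two tasks with different prefixes (the "source" column name and the prefix strings are placeholders, not an official recipe):

```python
# Sketch only: prepend a per-task prefix, then fine-tune on the mixed dataset.
# The "source" column and the prefix strings are placeholders.
from datasets import concatenate_datasets


def add_prefix(example, prefix):
    example["source"] = prefix + example["source"]
    return example


def mix_tasks(title_gen_ds, question_gen_ds):
    # one datasets.Dataset per task, each tagged with its own prefix
    title_gen_ds = title_gen_ds.map(lambda ex: add_prefix(ex, "generate title: "))
    question_gen_ds = question_gen_ds.map(lambda ex: add_prefix(ex, "generate question: "))
    return concatenate_datasets([title_gen_ds, question_gen_ds]).shuffle(seed=42)
```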
Thank you very much for your reply. Does mT5 have any fine-tuning scripts for the multilingual title generation task? Why are its results so poor in the other languages? Are there any special hyperparameters that need to be set for mT5?
Here is my command: python -u -m torch.distributed.launch --nproc_per_node 4 --use_env examples/pytorch/summarization/run_xglue_no_trainer.py --model_name_or_path=google/mt5-base --dataset_name=ntg --per_device_train_batch_size=2 --per_device_eval_batch_size=4
Thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@koukoulala @NielsRogge I also have a similar question: instead of mT5, I want to fine-tune M2M100 on more than one language pair. Any leads on how to achieve that? I am able to fine-tune on a single language pair, but how can I fine-tune on more than one pair simultaneously?
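One direction I'm considering is to tokenize each pair with its own src_lang/tgt_lang and train on the concatenation of all pairs. A rough sketch, not a verified recipe (the sentence pairs are made up, and text_target needs a fairly recent transformers version):

```python
# Rough sketch, not a verified recipe: give each language pair its own
# src_lang / tgt_lang when tokenizing, then train on the concatenated features.
# The sentence pairs below are made-up placeholders.
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

pairs = [
    ("en", "de", "Hello world.", "Hallo Welt."),
    ("en", "fr", "Good morning.", "Bonjour."),
]

features = []
for src_lang, tgt_lang, src_text, tgt_text in pairs:
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    features.append(tokenizer(src_text, text_target=tgt_text, truncation=True))

# The features can then be fed to DataCollatorForSeq2Seq / Seq2SeqTrainer.
# At generation time, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
# tells the decoder which language to produce.
```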
📚 Migration
Information
Model I am using (Bert, XLNet ...): google/mt5-base
Language I am using the model on (English, Chinese ...): multi-language
The problem arises when using:
Just a small change to ./examples/pytorch/summarization/run_summarization_no_trainer.py to adapt it to the NTG task and the BLEU evaluation metric (roughly the metric swap sketched below).
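Roughly, the metric change looks like this (a sketch using sacreBLEU through the evaluate library, not my exact diff):

```python
# Sketch of the metric swap (not the exact diff): replace the ROUGE metric in
# run_summarization_no_trainer.py with sacreBLEU.
import evaluate

metric = evaluate.load("sacrebleu")


def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    # sacreBLEU expects a list of reference lists for each prediction
    labels = [[label.strip()] for label in labels]
    return preds, labels


# inside the evaluation loop, after tokenizer.batch_decode(...):
# decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
# metric.add_batch(predictions=decoded_preds, references=decoded_labels)
# result = metric.compute()  # result["score"] is the corpus-level BLEU
```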
The task I am working on is:
Details
When training mT5 with multilingual data, do I need to add the "--source_prefix" argument as with T5? If so, is "--source_prefix=' Summarize: '" right? But when I added it, the results were poor in every language except English. Is there a problem with my parameter settings?
Also, the result with the "--source_prefix" parameter above is actually the same as the result without it below:
Should we set a different --source_prefix for each language, and if so, how should that be done?
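As far as I can tell, the script just prepends --source_prefix as plain text to every input, so if a per-language prefix were needed it could be added in the preprocessing function instead. A sketch of what I mean (the column names and the non-English prefix are made up):

```python
# Sketch only: --source_prefix is roughly equivalent to
#   inputs = [prefix + inp for inp in inputs]
# in the preprocessing function, so a per-language prefix can be applied there.
# The "language"/"source" columns and the German prefix are illustrative only.
PREFIXES = {
    "en": "summarize: ",
    "de": "zusammenfassen: ",  # made-up choice, purely illustrative
}


def preprocess_example(example):
    prefix = PREFIXES.get(example["language"], "summarize: ")
    example["source"] = prefix + example["source"]
    return example
```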
Environment info
transformers version:
pytorch-transformers or pytorch-pretrained-bert version (or branch):
Checklist