huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

how to finetune mT5 on XGLUE-NTG task #13249

Closed koukoulala closed 3 years ago

koukoulala commented 3 years ago

📚 Migration

Information

Model I am using (Bert, XLNet ...): google/mt5-base

Language I am using the model on (English, Chinese ...): multi-language

The problem arises when using:

A lightly modified copy of ./examples/pytorch/summarization/run_summarization_no_trainer.py, adapted to the NTG task and the BLEU evaluation metric (see the sketch below).
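
For context, a minimal sketch of what such a metric swap might look like, assuming the older `datasets.load_metric` API the no-trainer script used at the time (newer versions use the separate `evaluate` library); `decoded_preds` / `decoded_labels` stand in for the detokenized strings the script already produces, and the exact variable names in a modified script may differ:

```python
# Hedged sketch of swapping ROUGE for sacreBLEU in a copy of
# run_summarization_no_trainer.py. `decoded_preds` / `decoded_labels` are the
# detokenized prediction/reference strings the script already produces.
from datasets import load_metric

metric = load_metric("sacrebleu")

def compute_bleu(decoded_preds, decoded_labels):
    # sacreBLEU expects one or more reference strings per prediction.
    references = [[label] for label in decoded_labels]
    result = metric.compute(predictions=decoded_preds, references=references)
    return {"bleu": result["score"]}
```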

The task I am working on is:

Details

When training mT5 with multilingual data, do I need to add the "--source_prefix" argument as with T5? If so, is " --source_prefix=' Summarize: ' " correct? But when I added it, the results were poor in every language except English. Is there a problem with my parameter settings?

[screenshot: per-language results with --source_prefix]

Also, the result with the "--source_prefix" argument above is actually the same as the result without it: [screenshot: per-language results without --source_prefix]

Should we set a different --source_prefix for each language, and if so, how?
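
For reference, the --source_prefix flag in the example script simply prepends a fixed string to every source text before tokenization, so a per-language prefix would need a small change in the preprocessing function. A minimal sketch, assuming hypothetical `lang`, `text`, and `title` column names (the real XGLUE/NTG column names may differ), with illustrative prefix strings:

```python
# Sketch of how --source_prefix is applied, and how a per-language prefix could
# be injected instead. Column names ("lang", "text", "title") and the prefix
# strings are assumptions, not taken from the actual script or dataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

# Hypothetical per-language prefixes.
LANG_PREFIXES = {"en": "summarize: ", "fr": "résumez: ", "de": "zusammenfassen: "}

def preprocess(examples, prefix="summarize: ", per_language=False):
    if per_language:
        inputs = [LANG_PREFIXES.get(lang, prefix) + doc
                  for lang, doc in zip(examples["lang"], examples["text"])]
    else:
        # This is essentially what run_summarization_no_trainer.py does
        # with --source_prefix: prefix + source text.
        inputs = [prefix + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=examples["title"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

Applied with `dataset.map(preprocess, batched=True)`, this mirrors the script's preprocessing step.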

Environment info

Checklist

NielsRogge commented 3 years ago

From the T5 author (I asked him):

> Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.

Hence, no prefix should be used. However, you say the performance you get without the prefix is similar?
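
To illustrate the distinction in the quote: with multi-task fine-tuning each example carries a prefix that identifies its task, whereas single-task fine-tuning of mT5 can drop the prefix entirely. A minimal sketch (the task names, prefix strings, and record fields are illustrative assumptions, not an official convention):

```python
# Illustration of the single-task vs. multi-task prefix convention discussed
# above. The task names, prefix strings and record fields are assumptions.
TASK_PREFIXES = {
    "ntg": "generate title: ",
    "summarization": "summarize: ",
    "qg": "generate question: ",
}

def build_source(record, multi_task=False):
    if multi_task:
        # Multi-task fine-tuning: the prefix tells the model which task this is.
        return TASK_PREFIXES[record["task"]] + record["source"]
    # Single-task fine-tuning of mT5: no prefix needed.
    return record["source"]

print(build_source({"task": "ntg", "source": "Some news article ..."}, multi_task=True))
# -> "generate title: Some news article ..."
```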

koukoulala commented 3 years ago

> From the T5 author (I asked him):
>
> Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
>
> Hence, no prefix should be used. However, you say the performance you get without the prefix is similar?

Thank you very much for your reply. Does mT5 have any fine-tuning scripts for the multilingual title generation task? Why is it so bad in the other languages? Does mT5 have any special hyperparameters that need to be set?
Here is my command: python -u -m torch.distributed.launch --nproc_per_node 4 --use_env examples/pytorch/summarization/run_xglue_no_trainer.py --model_name_or_path=google/mt5-base --dataset_name=ntg --per_device_train_batch_size=2 --per_device_eval_batch_size=4

Thanks!

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

nikhiljaiswal commented 2 years ago

@koukoulala @NielsRogge I also had a similar doubt: instead of mT5, I want to fine-tune M2M100 on more than one language pair. Any leads on how to achieve that? I am able to fine-tune on a single language pair, but how do I fine-tune on more than one pair simultaneously?
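
Not an official answer, but one common approach is to mix all language pairs into a single training set and set the tokenizer's source/target language per example; at generation time the target language is selected via `forced_bos_token_id`. A minimal sketch, assuming hypothetical `src_lang` / `tgt_lang` / `src_text` / `tgt_text` record fields:

```python
# Hedged sketch: fine-tuning M2M100 on several language pairs at once by
# encoding each example with its own source/target language codes.
# The record fields (src_lang, tgt_lang, src_text, tgt_text) are assumptions.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

def encode_example(record):
    # Each example carries its own language pair, so examples from all pairs
    # can be mixed (and shuffled) in a single training set.
    tokenizer.src_lang = record["src_lang"]   # e.g. "en"
    tokenizer.tgt_lang = record["tgt_lang"]   # e.g. "fr"
    return tokenizer(record["src_text"],
                     text_target=record["tgt_text"],
                     max_length=128, truncation=True, return_tensors="pt")

example = {"src_lang": "en", "tgt_lang": "fr",
           "src_text": "Hello world", "tgt_text": "Bonjour le monde"}
loss = model(**encode_example(example)).loss  # standard seq2seq loss

# Generation: pick the target language with its forced BOS token id.
tokenizer.src_lang = "en"
encoded = tokenizer("Hello world", return_tensors="pt")
generated = model.generate(**encoded,
                           forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```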