huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Adding m2m100 12B #12775

Closed · Mehrad0711 closed this 2 years ago

Mehrad0711 commented 3 years ago

🌟 New model addition

Hi!

I was wondering if there's been any work on adding the 12B version of the m2m100 model to huggingface. Given libraries such as fairscale or parallelformers, inference with these relatively big models should be possible now. Are there any model changes needed to accommodate the 12B version? And does the current m2m100 conversion script work for it?

Open source status

Tagging @patil-suraj who added m2m100.
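
For context, here is a rough sketch of the parallelformers route mentioned above, shown with one of the smaller m2m100 checkpoints; the `parallelize()` call follows that library's README, and the exact arguments should be treated as an assumption rather than a verified recipe:

```python
# Rough sketch: shard an m2m100 checkpoint across GPUs with parallelformers for inference.
# The parallelize() signature follows the parallelformers README; treat the exact
# arguments as an assumption. The 12B weights would be handled the same way once available.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
from parallelformers import parallelize

ckpt = "facebook/m2m100_1.2B"
model = M2M100ForConditionalGeneration.from_pretrained(ckpt)
tokenizer = M2M100Tokenizer.from_pretrained(ckpt)

parallelize(model, num_gpus=2, fp16=True)  # model-parallel sharding across 2 GPUs

tokenizer.src_lang = "en"
inputs = tokenizer("Hello world", return_tensors="pt")
out = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```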

Fikavec commented 3 years ago

Please also add:

| Model | Layers | Embed dim | FFN dim | Vocab size | Params | Download |
|---|---|---|---|---|---|---|
| flores101_mm100_615M | 12 | 1024 | 4096 | 256,000 | 615M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_615M.tar.gz |
| flores101_mm100_175M | 6 | 512 | 2048 | 256,000 | 175M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_175M.tar.gz |

from http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html

These models are trained similarly to M2M-100, with additional support for the languages that are part of the WMT Large-Scale Multilingual Machine Translation track (paper: https://arxiv.org/pdf/2106.03193.pdf).

patil-suraj commented 3 years ago

@Mehrad0711

> I was wondering if there's been any work on adding the 12B version of the m2m100 model to huggingface.

Support for inference with such large models in Transformers is in progress, so it should be possible to add the 12B model in the near future. I will post an update once it's ready :)
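
For reference, once that support lands, loading a checkpoint of this size would look roughly like the sketch below; it assumes a transformers release with accelerate-backed `device_map` loading, and the checkpoint id is only a placeholder until the 12B weights are published:

```python
# Sketch of accelerate-backed big-model loading (assumes a transformers release with
# device_map support and the accelerate package installed; the checkpoint id below is
# a placeholder until the 12B weights are on the hub).
import torch
from transformers import M2M100ForConditionalGeneration

model = M2M100ForConditionalGeneration.from_pretrained(
    "facebook/m2m100-12B-last-ckpt",  # placeholder checkpoint id
    device_map="auto",                # spread layers across available GPUs (and CPU if needed)
    torch_dtype=torch.float16,        # load the weights in fp16 to halve memory
)
```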

> And does the current m2m100 conversion script work for it?

The current conversion script might not work for the 12B model, as the module structure seems a bit different.

@Fikavec

Thank you for sharing these links; I will add those models.

Mehrad0711 commented 2 years ago

Hi @patil-suraj, just wanted to check if there's any news/progress on adding the new models. Thanks.

patil-suraj commented 2 years ago

Hey @Mehrad0711!

The 12B checkpoints are now available on the hub: https://huggingface.co/models?other=m2m100-12B
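
A minimal translation sketch with one of those checkpoints is below; the checkpoint id "facebook/m2m100-12B-last-ckpt" is taken as an assumption and should be checked against the hub listing above, while the rest follows the standard m2m100 usage in Transformers:

```python
# Minimal sketch of translating with one of the 12B checkpoints from the hub listing
# above (the exact checkpoint id is an assumption; check the listing for the available
# variants). Standard m2m100 usage: set src_lang and force the target-language BOS token.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

ckpt = "facebook/m2m100-12B-last-ckpt"  # assumed checkpoint id
tokenizer = M2M100Tokenizer.from_pretrained(ckpt)
model = M2M100ForConditionalGeneration.from_pretrained(ckpt)

tokenizer.src_lang = "en"
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")

# Force the decoder to start with the target-language token.
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```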

Mehrad0711 commented 2 years ago

Hi @patil-suraj, thanks a lot for your work!

Fikavec commented 2 years ago

Hi @patil-suraj! Thank you for the bigger models! Could you tell me why the "max_length" and "num_beams" parameters are not present in config.json? Without a "max_length" setting, the models truncate translation results too much by default. What are the best values of "max_length" and "num_beams" for these models? In an older config I saw "max_length": 200, "num_beams": 5. In the paper I found: 'The length filtering removes sentences that are too long—more than 250 subwords after segmentation with SPM—or with a length mismatch between the sentence and its translation—if the length ratio is greater than 3×.' and 'We use a beam search with beam of size 5'. Maybe max_length and num_beams could be added to the usage examples on the Hugging Face model pages?
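
For reference, these parameters can also be passed directly to `generate()` instead of relying on config.json; here is a minimal sketch using the max_length=200 / num_beams=5 values from the older config mentioned above (whether those are optimal for the 12B checkpoints is exactly the open question), shown with the smaller 418M checkpoint to keep it light:

```python
# Passing generation parameters explicitly at call time instead of relying on config.json.
# Values are taken from the older config mentioned above; optimal settings for the 12B
# checkpoints are still an open question.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

ckpt = "facebook/m2m100_418M"  # smaller checkpoint, used here only to keep the sketch light
tokenizer = M2M100Tokenizer.from_pretrained(ckpt)
model = M2M100ForConditionalGeneration.from_pretrained(ckpt)

tokenizer.src_lang = "en"
inputs = tokenizer("This is a longer sentence that should not be truncated.", return_tensors="pt")

generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("de"),
    max_length=200,  # cap on generated length, from the older config
    num_beams=5,     # beam size, matching the paper's beam search of size 5
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```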