Mehrad0711 closed this issue 2 years ago
And also please add:
Model | Num layers | Embed dim | FFN dim | Vocab size | #params | Download
--- | --- | --- | --- | --- | --- | ---
flores101_mm100_615M | 12 | 1024 | 4096 | 256,000 | 615M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_615M.tar.gz
flores101_mm100_175M | 6 | 512 | 2048 | 256,000 | 175M | https://dl.fbaipublicfiles.com/flores101/pretrained_models/flores101_mm100_175M.tar.gz
from http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
These models are trained similarly to M2M-100, with additional support for the languages that are part of the WMT Large-Scale Multilingual Machine Translation track. (paper: https://arxiv.org/pdf/2106.03193.pdf)
@Mehrad0711
I was wondering if there's been any work on adding the 12B version of the m2m100 model to huggingface.
Support for inference with such large models in Transformers is in progress, so it should be possible to add the 12B model in the near future. I will post an update once it's ready :)
And does the current m2m100 conversion script work for it?
The current conversion script might not work for the 12B model, as the module structure seems a bit different.
@Fikavec
Thank you for sharing these links, I will add those models.
Hi @patil-suraj, just wanted to check if there's any news/progress on adding the new models. Thanks.
Hey @Mehrad0711 !
The 12B checkpoints are now available on the hub: https://huggingface.co/models?other=m2m100-12B
Hi @patil-suraj, Thanks a lot for your work!
Hi @patil-suraj! Thank you for the bigger models! Could you tell me why the "max_length" and "num_beams" parameters are not present in config.json? Without "max_length", the models truncate translation results too much by default. What are the best values of "max_length" and "num_beams" for these models? The older config had: "max_length": 200, "num_beams": 5. In the paper I found: 'The length filtering removes sentences that are too long—more than 250 subwords after segmentation with SPM—or with a length mismatch between the sentence and its translation—if the length ratio is greater than 3×.' and 'We use a beam search with beam of size 5'. Maybe add max_length and num_beams to the usage examples on the Hugging Face model pages?
🌟 New model addition
Hi!
I was wondering if there's been any work on adding the 12B version of the m2m100 model to huggingface. Given libraries such as fairscale or parallelformers, inference with these relatively big models should be possible now. Are there any model changes needed to accommodate the 12B version? And does the current m2m100 conversion script work for it?
Open source status
Tagging @patil-suraj who added m2m100.