Closed by Bachstelze 1 year ago
Hi @Bachstelze, both of them are already supported. You can run them with the following commands. I will update the README.md to clarify this.
python3 translate.py \
    --sentences_path sample_text/en.txt \
    --output_path sample_text/en2es.translation.small100.txt \
    --source_lang en \
    --target_lang es \
    --model_name alirezamsh/small100

python3 translate.py \
    --sentences_path sample_text/en.txt \
    --output_path sample_text/en2es.translation.mbart.txt \
    --source_lang hi_IN \
    --target_lang fr_XX \
    --model_name facebook/mbart-large-50-many-to-many-mmt
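If you want to launch several models from a script, here is a minimal sketch that assembles the same command line in Python. The helper name `build_translate_command` is hypothetical and not part of Easy-Translate; it only uses the flags shown in the commands above and assumes `translate.py` is in the current directory.

```python
import shlex


def build_translate_command(sentences_path, output_path, source_lang,
                            target_lang, model_name,
                            python="python3", script="translate.py"):
    """Return the translate.py invocation as an argument list.

    Hypothetical helper: it only mirrors the flags shown in this thread
    and does not validate language codes or model names.
    """
    return [
        python, script,
        "--sentences_path", sentences_path,
        "--output_path", output_path,
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    cmd = build_translate_command(
        sentences_path="sample_text/en.txt",
        output_path="sample_text/en2es.translation.small100.txt",
        source_lang="en",
        target_lang="es",
        model_name="alirezamsh/small100",
    )
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(shlex.join(cmd))
```

Note that the language codes depend on the model: the SMaLL100 command above uses `en`/`es`, while the mBART command uses codes such as `hi_IN`/`fr_XX`.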
Sorry, I just saw that small100 needs a custom tokenizer.py file. I will try to update Easy-Translate to support it.
The latest commit adds support for SMaLL100. You can use the example scripts here.
I tested both of them.
Let me know if you have any problems running those models.
In some cases (English-centric or low-resource), other models could yield better results, e.g. mBART or SMaLL100:
https://huggingface.co/facebook/mbart-large-50-many-to-one-mmt
https://huggingface.co/alirezamsh/small100 (uses a different tokenizer)
Could we include them?