ikergarcia1996 / Easy-Translate

Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
Apache License 2.0
189 stars 306 forks

Add more models #5

Closed Bachstelze closed 1 year ago

Bachstelze commented 1 year ago

In some (English-centric or low-resource) cases, other models could yield better results, e.g. mBART or SMaLL100:
https://huggingface.co/facebook/mbart-large-50-many-to-one-mmt
https://huggingface.co/alirezamsh/small100 (has another tokenizer)
Could we include them?

ikergarcia1996 commented 1 year ago

Hi @Bachstelze, both of them are already supported. You can use them with the following commands. I will update the README.md to clarify it.

python3 translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.small100.txt \
--source_lang en \
--target_lang es \
--model_name alirezamsh/small100

python3 translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.mbart.txt \
--source_lang en_XX \
--target_lang es_XX \
--model_name facebook/mbart-large-50-many-to-many-mmt
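
When several models are compared on the same file, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of Easy-Translate: `build_translate_command` is a hypothetical helper that only reuses the flags shown above. It assumes that SMaLL100 takes plain codes such as `en`, and that the mBART-50 checkpoint takes locale-style codes; `en_XX` and `es_XX` are chosen here to match the English sample file and the `en2es` output name.

```python
import shlex


def build_translate_command(sentences_path, output_path, source_lang, target_lang, model_name):
    """Hypothetical helper: assemble the translate.py argument list.

    Only the flags that appear in the commands above are used.
    """
    return [
        "python3", "translate.py",
        "--sentences_path", sentences_path,
        "--output_path", output_path,
        "--source_lang", source_lang,
        "--target_lang", target_lang,
        "--model_name", model_name,
    ]


# SMaLL100: plain language codes, as in the first command above.
small100_cmd = build_translate_command(
    "sample_text/en.txt",
    "sample_text/en2es.translation.small100.txt",
    "en", "es",
    "alirezamsh/small100",
)

# mBART-50: locale-style codes (assumed en_XX -> es_XX for the English sample).
mbart_cmd = build_translate_command(
    "sample_text/en.txt",
    "sample_text/en2es.translation.mbart.txt",
    "en_XX", "es_XX",
    "facebook/mbart-large-50-many-to-many-mmt",
)

# Print shell-ready command lines for inspection.
for cmd in (small100_cmd, mbart_cmd):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the translation itself.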

ikergarcia1996 commented 1 year ago

Sorry, I just saw that small100 needs a custom tokenizer.py file. I will try to update Easy-Translate to support it.

ikergarcia1996 commented 1 year ago

The last commit adds support for SMaLL100. You can use the example scripts here

I tested both of them.

Let me know if you have any problems running those models.