huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Support NLLB's LID model #18294

Closed by xia0nan 2 years ago

xia0nan commented 2 years ago

Model description

Thanks for supporting NLLB and closing https://github.com/huggingface/transformers/issues/18043. Could Hugging Face also support NLLB's language identification model, described as a "LID (Language IDentification) model to predict the language of the input text"?

Open source status

Provide useful links for the implementation

https://github.com/facebookresearch/fairseq/tree/nllb#lid-model

xia0nan commented 2 years ago

I figured out how to use it. No issue now.

xia0nan commented 2 years ago

First, download the model:

wget https://dl.fbaipublicfiles.com/nllb/lid/lid218e.bin

Then use it for inference:

import fasttext

# Path to the LID checkpoint downloaded above
pretrained_lang_model = "lid218e.bin"
model = fasttext.load_model(pretrained_lang_model)

text = "これ、浅草に、行きますか"
# k=1 returns only the top prediction as a (labels, probabilities) tuple
predictions = model.predict(text, k=1)
print(predictions)
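
In case it helps anyone else: `predict` returns a tuple of labels and probabilities, and fastText prefixes each label with `__label__`. Below is a minimal sketch of turning that tuple into something easier to work with. The helper names (`LangPrediction`, `parse_predictions`, `clean_text`) are my own, not part of fastText or NLLB, and I am assuming the labels look like `__label__jpn_Jpan` (language code, underscore, script code). The values at the bottom are made-up stand-ins, not real model output.

```python
from typing import List, NamedTuple, Sequence

LABEL_PREFIX = "__label__"  # fastText's default label prefix


class LangPrediction(NamedTuple):
    language: str       # assumed form: three-letter language code, e.g. "jpn"
    script: str         # assumed form: four-letter script code, e.g. "Jpan"
    probability: float


def clean_text(text: str) -> str:
    """Collapse all whitespace (including newlines) into single spaces.

    fastText's predict() handles one line at a time and raises an error
    when the input contains a newline.
    """
    return " ".join(text.split())


def parse_predictions(
    labels: Sequence[str], probs: Sequence[float]
) -> List[LangPrediction]:
    """Convert the (labels, probabilities) pair from predict() into records."""
    results = []
    for label, prob in zip(labels, probs):
        # Strip the fastText prefix if present
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split "jpn_Jpan" into language and script (assumed label layout)
        language, _, script = code.partition("_")
        results.append(LangPrediction(language, script, float(prob)))
    return results


# Hypothetical stand-in for a model.predict(...) result, for illustration only
example_labels, example_probs = ("__label__jpn_Jpan",), [0.5]
print(parse_predictions(example_labels, example_probs))
```

With the model loaded as above, this would be called as `parse_predictions(*model.predict(clean_text(text), k=3))` to get the top three candidates.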