thammegowda / nllb-serve

Meta's "No Language Left Behind" models served as web app and REST API
http://rtg.isi.edu/nllb/

Auto detect language feature? #6

Open Utsaww opened 1 year ago

Utsaww commented 1 year ago

It's a really good project that works out of the box, much appreciated, man! I was wondering whether a language auto-detection feature could be added; it would be really helpful.

thammegowda commented 1 year ago

I agree, language ID would be a great feature. Many available language ID models (e.g. https://fasttext.cc/docs/en/language-identification.html) recognize fewer than 200 languages, so they fall short of what the NLLB models cover. Are you familiar with any good language ID models that recognize all 200 languages in NLLB?

Pull requests will be greatly appreciated!!

Utsaww commented 1 year ago

I am working on fastText language identification, since it identifies 176 languages. I guess that works for now. If you are good to go with that, I will work on it and create a pull request.

thammegowda commented 1 year ago

@Utsaww yes, please! Sorry I missed replying to this message.

Suggestion on how to integrate: here ... https://github.com/thammegowda/nllb-serve/blob/024f703bb6e3f2ebe59f39cbe7f080e052ab0b80/nllb_serve/app.py#L128

if src_lang == '[auto]':
  src_lang = <lang_id_detection>(sources)
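
One way the placeholder above could be filled in, as a rough sketch rather than the project's actual implementation: wrap the fastText language ID model mentioned earlier in the thread and map its labels to NLLB codes. The names `detect_language`, `load_fasttext_predictor`, and `FASTTEXT_TO_NLLB` are hypothetical, the mapping table is deliberately incomplete, and the fallback to a default language and the 0.5 probability threshold are assumptions, not something proposed in the thread.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical, deliberately tiny mapping from fastText's language labels to
# NLLB (FLORES-200) codes; a real table would need an entry per supported language.
FASTTEXT_TO_NLLB: Dict[str, str] = {
    "en": "eng_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "es": "spa_Latn",
    "hi": "hin_Deva",
}

# A predictor takes one line of text and returns (labels, probabilities),
# which is the shape fastText's model.predict(text, k=1) returns.
Predictor = Callable[[str], Tuple[Sequence[str], Sequence[float]]]


def load_fasttext_predictor(model_path: str = "lid.176.bin") -> Predictor:
    """Wrap a downloaded fastText language ID model (assumes `pip install fasttext`)."""
    import fasttext  # imported lazily so the rest of the sketch runs without it

    model = fasttext.load_model(model_path)
    return lambda text: model.predict(text, k=1)


def detect_language(
    sources: List[str],
    predict: Predictor,
    default: str = "eng_Latn",   # assumption: fall back instead of raising
    min_prob: float = 0.5,       # assumption: arbitrary confidence cutoff
) -> str:
    """Guess a single NLLB source language code for a batch of sentences."""
    # fastText's predict rejects newlines, so collapse the batch into one line.
    text = " ".join(" ".join(s.split()) for s in sources)
    if not text:
        return default
    labels, probs = predict(text)
    if not labels or probs[0] < min_prob:
        return default
    code = labels[0].replace("__label__", "")
    # Languages missing from the mapping also fall back to the default.
    return FASTTEXT_TO_NLLB.get(code, default)


# Hypothetical use inside the request handler:
#   predict = load_fasttext_predictor()   # once, at startup
#   if src_lang == '[auto]':
#       src_lang = detect_language(sources, predict)
```

Passing the predictor in as an argument keeps the model load out of the request path and lets the mapping logic be exercised without the model file. Since fastText covers 176 languages, any NLLB language outside that set would always resolve to the default under this approach.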