xhluca / dl-translate

Library for translating between 200 languages. Built on 🤗 transformers.
https://xhluca.github.io/dl-translate/
MIT License
436 stars 47 forks

Incorporating ISO-639 #12

Closed xhluca closed 2 years ago

xhluca commented 3 years ago

Might be worth considering ISO-639-1 and the ISO-639 macrolanguage codes. This would have the added benefit of mapping endonyms as well, e.g. dlt.endonym.get("日本語") -> "ja", which would be equivalent to dlt.lang.JAPANESE -> "ja", or something like that.
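
A rough sketch of what such an endonym lookup could look like, in plain Python. Everything here is hypothetical: the table contents, `endonym_to_code`, and the normalization step are illustrative assumptions, not part of the dl-translate API.

```python
import unicodedata
from typing import Optional


def _normalize(name: str) -> str:
    # Unicode-normalize, trim whitespace, and casefold so that
    # " Français " and "français" resolve to the same key.
    return unicodedata.normalize("NFKC", name).strip().casefold()


# Hypothetical, deliberately tiny table mapping endonyms to ISO 639-1 codes.
# A real table would be generated from an ISO 639 data source.
_ENDONYM_TO_ISO639_1 = {
    _normalize(endonym): code
    for endonym, code in [
        ("日本語", "ja"),
        ("Français", "fr"),
        ("Deutsch", "de"),
        ("Español", "es"),
        ("English", "en"),
    ]
}


def endonym_to_code(name: str) -> Optional[str]:
    """Return the ISO 639-1 code for an endonym, or None if it is unknown."""
    return _ENDONYM_TO_ISO639_1.get(_normalize(name))
```

Returning `None` for unknown names (rather than raising) keeps the lookup easy to chain with a fallback, such as trying the English language name next.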

Some useful links:

xhluca commented 3 years ago

Might also add ISO country codes so we'd be able to cover regional variants (e.g. "fr_CA" vs "fr_FR"). Here's the downloadable version
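
For illustration, a minimal sketch of splitting a tag like "fr_CA" into a language code and a country code. `LocaleTag` and `parse_locale` are hypothetical names, and the sketch only checks the shape of each part (two ASCII letters); it does not validate against the actual ISO 639-1 or ISO 3166-1 code lists.

```python
from typing import NamedTuple, Optional


class LocaleTag(NamedTuple):
    language: str            # two-letter language code, lowercased
    region: Optional[str]    # two-letter country code, uppercased, or None


def _is_two_ascii_letters(part: str) -> bool:
    return len(part) == 2 and part.isascii() and part.isalpha()


def parse_locale(tag: str) -> LocaleTag:
    """Parse 'fr', 'fr_CA' or 'fr-CA' into (language, region)."""
    # Accept both underscore and hyphen as the separator.
    parts = tag.strip().replace("-", "_").split("_")
    if len(parts) > 2:
        raise ValueError(f"too many subtags in {tag!r}")

    language = parts[0].lower()
    if not _is_two_ascii_letters(language):
        raise ValueError(f"invalid language code in {tag!r}")

    region = None
    if len(parts) == 2:
        region = parts[1].upper()
        if not _is_two_ascii_letters(region):
            raise ValueError(f"invalid country code in {tag!r}")

    return LocaleTag(language, region)
```

With this shape, `parse_locale("fr_CA")` and `parse_locale("fr_FR")` share the same `language` field, so a translator that only knows "fr" could still fall back on it while keeping the regional information around.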

xhluca commented 2 years ago

Looks like this project covers pretty much what I had in mind already: https://github.com/LBeaudoux/iso639