wooorm / franc

Natural language detection
https://wooorm.com/franc/
MIT License
4.12k stars 173 forks

Only 3 letter code ISO 639 supported #55

Closed Bramzor closed 7 years ago

Bramzor commented 7 years ago

As I and a lot of people use ISO 639-1 as a default way to define languages, it would be handy if there was an option to select this.

wooorm commented 7 years ago

Hi @Bramzor.

That’s not possible. See GH-19 for more details.

Bramzor commented 7 years ago

Aren't 639-3 more or less dialects? I always used "locales" before like en_US etc which is common, those are never 3 characters long. So it gets a lot more complex if 639-3 is used instead of 639-1. I understand your reasoning why it is complex to do as the current version is created for 639-3, but I would prefer a way to easily fix this in your code instead of forking the code. In the end 639-1 can be seen as a lower resolution of the 639-3 table? So it should be possible to go from 639-3 to 639-1. Other way wouldn't be possible.

Would it be ok to just introduce a new function inside franc that in worst case, uses a different lib to guess the 639-1 version?

I think it makes sense as there are already multiple people that raised the question before :-)

Bramzor commented 7 years ago

We could use the following library: https://github.com/adlawson/nodejs-langs

langs.where("3", "kor")["1"]

would translate ISO 639-3 to 639-1.
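For illustration, the 639-3 to 639-1 step could be sketched without any dependency as a plain lookup. This is only a rough sketch: the table below is a tiny hand-picked sample, and `iso6393To1` and `toIso6391` are made-up names, not part of franc or langs.

```javascript
// Hypothetical partial lookup table: ISO 639-3 code -> ISO 639-1 code.
// Only a handful of entries are listed; a real table would be much larger.
const iso6393To1 = new Map([
  ["eng", "en"],
  ["kor", "ko"],
  ["nld", "nl"],
  ["fra", "fr"],
  ["deu", "de"],
]);

// toIso6391 is a made-up helper, not part of franc.
// It returns the two-letter code, or undefined when the three-letter code
// has no ISO 639-1 counterpart or is simply missing from this sample table.
function toIso6391(code3) {
  return iso6393To1.get(code3);
}

console.log(toIso6391("kor"));
console.log(toIso6391("sco"));
```

Callers would need to handle the `undefined` case themselves, since many ISO 639-3 languages (Scots, `sco`, for one) have no two-letter code, and franc's `und` result for undetermined input has none either.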

wooorm commented 7 years ago

> Aren't 639-3 more or less dialects?

No.

> I always used "locales" before like en_US etc which is common, those are never 3 characters long.

Those are called BCP-47 tags, in this case with a language part (en; that part may also be an ISO 639-3 code) and a region indicator (US).
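As a rough sketch of that split, assuming only the simple "language" or "language plus region" shape (`splitTag` is a hypothetical helper, and full BCP-47 tags with scripts or variants are not handled):

```javascript
// splitTag is a made-up helper for illustration only.
// It accepts "en", "en-US", or the underscore form "en_US" and
// separates the language part from the optional region part.
function splitTag(tag) {
  const [language, region] = tag.split(/[-_]/);
  return {
    language: language.toLowerCase(),
    region: region ? region.toUpperCase() : undefined,
  };
}

console.log(splitTag("en_US"));
console.log(splitTag("kor"));
```

The language part returned here may be two or three letters long, so a three-letter code works as a language part on its own.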

> So it gets a lot more complex if 639-3 is used instead of 639-1. I understand your reasoning why it is complex to do as the current version is created for 639-3, but I would prefer a way to easily fix this in your code instead of forking the code.

The way to ease this pain is by continuing work on GH-30. Feel free to start work on that if you’d like this feature.

> In the end 639-1 can be seen as a lower resolution of the 639-3 table? So it should be possible to go from 639-3 to 639-1. Other way wouldn't be possible.

Correct! Similar to ASCII vs Unicode!