silnrsi / sldr

SIL Locale Data Repository
MIT License

iso 639-1 code #14

Closed hatton closed 4 years ago

hatton commented 5 years ago

It would be helpful to have a field named "iso639_1" for languages that have one.

mhosken commented 5 years ago

There is. It's called region.

GB, Martin

hatton commented 5 years ago

No, there's a field called "region".

There must be some insider perspective in which, for example, "en" is the name of the region. But this is just the latest example of why this list is SO frustrating to work with.

What would be the harm in simply naming fields by the international standard that they represent, rather than relying on insider info?

mhosken commented 5 years ago

I'm sorry. I had a brain botch. iso639_1 is to do with the language component, not the region. Duh.

Whether a tag has an iso639_1 component or an iso639_3 component is a matter of measuring the length of the language component of the tag field. If it is 2, then it's iso639-1, else iso639-3. I admit that the region and script elements have been extracted as fields, but I am reluctant to add a field for something that can be done in 1 or 2 lines of code.

My javascript isn't up to much, but something like:

const iso639_1 = (tag.length === 2 || tag.indexOf("-") === 2) ? tag.substring(0, 2) : "";

Which is ugly, but a whole lot less costly than making langtags.json even bigger.
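A slightly more defensive sketch of the same length test, assuming `tag` is a string whose first hyphen-separated subtag is the language. The helper name `languageCodes` is made up for illustration and is not part of langtags.json or any SLDR tooling:

```javascript
// Hypothetical helper: isolate the language subtag and classify it by length.
// Assumes the tag begins with the language subtag, e.g. "sr-Cyrl-RS".
function languageCodes(tag) {
  const lang = tag.split("-")[0].toLowerCase();
  return {
    // Two letters: treat as an ISO 639-1 code.
    iso639_1: lang.length === 2 ? lang : "",
    // Three letters: treat as an ISO 639-3 code.
    iso639_3: lang.length === 3 ? lang : "",
  };
}

console.log(languageCodes("en-GB").iso639_1); // prints "en"
```

Note that this only classifies the subtag already present in the tag. It does not map a two-letter code to its three-letter equivalent, which would need a lookup table.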

You feel that naming fields by their ISO standard requires less mental work from users than naming them by their semantics? Interesting. I don't think everyone would agree. If others agree, please comment. Anyway, I will ensure the requisite ISO specs are referenced in the docs. Thanks for the pointer.

I feel your pain with the complexities of language tags. They seem so simple in concept, but the deeper you get into them, the more messy they get. langtags.json is an attempt to hide some of that messiness, but it isn't able to do away with it all.

hatton commented 5 years ago

No worries. I often find myself thinking that ISO 639-2 is about two-letter language codes because, 2, right?

I've come to the conclusion that langtags.json is just not a replacement for what we've been using. It appears to have different aims. All of the software that I'm involved in basically just wants a subset of the Ethnologue entries:

{
"iso639_1": "sr",
"iso639_3": "srp",
"englishName": "Serbian",
"localName": "српски",
"altNames": [
    "Serbo-Croatian",
    "Montenegrin",
    ...
    ]
},

The point is to quickly load the data into a data structure optimized for searching. In contrast, with langtags you still have to run code over every entry before you can load it.
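As a rough illustration of "a data structure optimized for searching", here is a minimal sketch that indexes records of the shape shown above by code and by name. The function `buildIndex` and the trimmed sample record are assumptions for illustration only, not code from any of the projects discussed:

```javascript
// Hypothetical index builder for records shaped like the example above.
// Every code and name is lower-cased and mapped to the entries that carry it.
function buildIndex(entries) {
  const index = new Map();
  const add = (key, entry) => {
    if (!key) return; // skip missing fields, e.g. no iso639_1
    const k = key.toLowerCase();
    if (!index.has(k)) index.set(k, []);
    // Avoid listing the same entry twice under one key.
    if (!index.get(k).includes(entry)) index.get(k).push(entry);
  };
  for (const entry of entries) {
    add(entry.iso639_1, entry);
    add(entry.iso639_3, entry);
    add(entry.englishName, entry);
    add(entry.localName, entry);
    for (const name of entry.altNames || []) add(name, entry);
  }
  return index;
}

// Sample record trimmed to the fields and names quoted above.
const sample = [{
  iso639_1: "sr",
  iso639_3: "srp",
  englishName: "Serbian",
  localName: "српски",
  altNames: ["Serbo-Croatian", "Montenegrin"],
}];
const index = buildIndex(sample);
console.log(index.get("montenegrin")[0].iso639_3); // prints "srp"
```

With data already in this flat shape, loading is a single pass with no per-entry interpretation of tags, which is the contrast being drawn here.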

Next, langtags seems to come from multiple sources and thus errors creep in, e.g. Indonesian should have a localName of "Bahasa Indonesia". Sure, one can report bugs, but it's not feasible for me to review thousands of languages. I need to just say "give me the Ethnologue" and then leave it to their staff to keep everything accurate.

But the most important thing is more fundamental: langtags is not trying to be a list of ISO 639-3 languages (duh). Instead, langtags.json is about, well, "tags". It's about scripts, variants, etc. As a result, it has multiple entries for some languages (e.g. Spanish has around 30 entries). So it would take even more processing to get it to what we need. I'm going to give you a break now and stop trying to put this beautiful square peg into my old-fashioned round hole :-) Thanks for bearing with me while I thrashed around, Martin.
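For anyone who does want to attempt that extra processing, a minimal sketch of collapsing tag entries into one bucket per language might look like the following. It assumes only that each entry has a `tag` field whose first subtag is the language; the function name `groupByLanguage` and the sample tags are made up for illustration and are not real langtags.json data:

```javascript
// Hypothetical grouping: one bucket per language subtag.
function groupByLanguage(entries) {
  const groups = new Map();
  for (const entry of entries) {
    // The language is taken to be everything before the first hyphen.
    const lang = entry.tag.split("-")[0].toLowerCase();
    if (!groups.has(lang)) groups.set(lang, []);
    groups.get(lang).push(entry);
  }
  return groups;
}

// Made-up sample entries, not taken from langtags.json.
const sampleTags = [
  { tag: "es" },
  { tag: "es-MX" },
  { tag: "es-AR" },
  { tag: "sr-Latn" },
];
const groups = groupByLanguage(sampleTags);
console.log(groups.get("es").length); // prints 3
```

Choosing which entry in each bucket should represent the language is a separate decision that this sketch leaves open.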

gtryus commented 5 years ago

I posted code for a language picker using react-redux. It can easily have an iso_639-3 export. I am still finishing tests and adding some features. I will be adjusting it, but it would be good to get feedback. https://github.com/sillsdev/web-transcriber-admin/commits/feature/TT-588-language-picker