keymanapp / api.keyman.com


Consume langtags.json instead of other data sources #44

Closed mcdurdin closed 4 years ago

mcdurdin commented 4 years ago

langtags.json is the replacement for alltags and should contain enough information to allow us not to import most of the other data sources we currently reference.

Note also this from @mhosken:

We now have a staging for langtags.json, using the same &staging=1 (or &staging=true).

I have added two new fields to the langtags.json specifically to help Keyman. The "windows" field gives a form of the tag that is appropriate for use on Windows, where the script subtag can be removed only for languages that have an appropriate Suppress-Script field in the IANA registry. I have also added that flag as a "suppress" field.

These changes are simple additions and therefore cause a minimal version bump.

The other change in the staging data set is the addition of recently approved iso639 codes.

I am waiting on a new set of Ethnologue data before we are really ready to release. So I'm not ready to flick the switch quite yet. But this is certainly ready for testing.

Note also this yet-to-be-released public document from MS, which puts the whole addition of a windows field into question:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/70feba9f-294e-491e-b6eb-56532684c37f?redirectedfrom=MSDN

mcdurdin commented 4 years ago

Implementation notes:

As I understand it, you need two language tags for a keyboard: the best tag and the best windows tag.

As per https://github.com/silnrsi/langtags/blob/master/doc/langtags.md

Each record in the langtags.json contains a set of tags that are considered equivalent. I call these a tagset. Each tagset has a bunch of fields.

  1. To look up a tag, test all of .tag, .full and .tags for the tag.
  2. If that fails, you may want to test the language component against the .iso639_3 field, replace it with the corresponding 2-letter tag, and re-search.
  3. If you have a region to test, you can also test it against all the regions in the region list.

For example code on how to do all this, see lib/langtag/__init__.py.
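Separately from that reference code, the three lookup steps might be sketched as follows. This is a minimal illustration, not the langtags library's implementation: the function names are hypothetical, the sample record is hand-made rather than real langtags.json data, the region list is assumed to live in a `regions` field, and step 3 is read as "strip the region, find the tagset, then check the region against its region list".

```python
from typing import Optional

# Hand-made sample record. Field names follow the description above;
# the values are illustrative, not real langtags.json data.
SAMPLE_TAGSETS = [
    {
        "tag": "en",
        "full": "en-Latn-US",
        "tags": ["en-Latn", "en-US"],
        "iso639_3": "eng",
        "script": "Latn",
        "region": "US",
        "regions": ["AU", "CA"],
    },
]


def find_exact(tagsets: list, tag: str) -> Optional[dict]:
    """Step 1: compare the tag against .tag, .full and every entry of .tags."""
    wanted = tag.lower()
    for tagset in tagsets:
        candidates = [tagset.get("tag"), tagset.get("full"), *tagset.get("tags", [])]
        if any(c and c.lower() == wanted for c in candidates):
            return tagset
    return None


def split_region(tag: str) -> tuple:
    """Split off a trailing region subtag (2 letters or 3 digits), if any."""
    head, sep, last = tag.rpartition("-")
    is_region = (len(last) == 2 and last.isalpha()) or (len(last) == 3 and last.isdigit())
    if sep and is_region:
        return head, last.upper()
    return tag, None


def find_with_region(tagsets: list, tag: str) -> Optional[dict]:
    """Steps 1 and 3: exact match first, then base tag plus the region list."""
    tagset = find_exact(tagsets, tag)
    if tagset is not None:
        return tagset
    base, region = split_region(tag)
    if region is not None:
        tagset = find_exact(tagsets, base)
        if tagset is not None and region in tagset.get("regions", []):
            return tagset
    return None


def find_tagset(tagsets: list, tag: str) -> Optional[dict]:
    """Run steps 1 and 3, then fall back to step 2 (the .iso639_3 field)."""
    tagset = find_with_region(tagsets, tag)
    if tagset is not None:
        return tagset
    language, sep, rest = tag.partition("-")
    for candidate in tagsets:
        if candidate.get("iso639_3", "").lower() == language.lower():
            # Replace the 3-letter code with the language part of .tag, re-search once.
            short = candidate.get("tag", "").split("-")[0]
            if short and short.lower() != language.lower():
                return find_with_region(tagsets, short + sep + rest)
    return None
```

With the sample record, `find_tagset(SAMPLE_TAGSETS, "eng-AU")` resolves through the `iso639_3` fallback and the region list to the `en` tagset, while `find_tagset(SAMPLE_TAGSETS, "en-ZZ")` returns `None`. Doing the re-search only once keeps the fallback from looping on unusual data.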

Now that you have a tagset, you can look up fields such as .tag for the best tag and .script for the script to use. To get the Windows tag, start from .tag: if it has no script subtag and .suppress is not true (the field is yet to be added), add the script from .script.
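As a rough sketch of that rule, assuming a tagset is a dict with `tag`, `script` and an optional boolean `suppress` field: the helper names are hypothetical, the script is inserted directly after the language subtag, and `qaa` is used only as a placeholder private-use language code.

```python
def has_script(tag: str) -> bool:
    """True if any subtag after the language looks like a script (4 letters)."""
    return any(len(s) == 4 and s.isalpha() for s in tag.split("-")[1:])


def windows_tag(tagset: dict) -> str:
    """Derive a Windows-style tag from .tag, .script and .suppress."""
    tag = tagset["tag"]
    script = tagset.get("script")
    # Leave the tag alone if it already names a script, if the script is
    # flagged as suppressible, or if no script is known for the tagset.
    if has_script(tag) or tagset.get("suppress") or not script:
        return tag
    # Otherwise insert the script right after the language subtag.
    language, sep, rest = tag.partition("-")
    return language + "-" + script + sep + rest


# Illustrative records, not real langtags.json data.
print(windows_tag({"tag": "en", "script": "Latn", "suppress": True}))
print(windows_tag({"tag": "qaa-RU", "script": "Cyrl"}))
```

The first call leaves `en` unchanged because the script is suppressed; the second inserts the script to give `qaa-Cyrl-RU`.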

mcdurdin commented 4 years ago

See also the macrolanguages discussion at https://github.com/keymanapp/keyboards/pull/1077#issuecomment-595569766

mcdurdin commented 4 years ago

See also notes in keymanapp/keyboards#917

mcdurdin commented 4 years ago

Complete in staging.

mcdurdin commented 4 years ago

See also #79 for custom name and custom tag support.