feat: Create parse-langtags utility to convert langtags.json

darcywong00 commented 3 years ago

Fixes #14

The first part of this PR creates a /DevTool/parse-langtags tool, which converts the langtags.json file into the |-separated language list that Speech Analyzer uses. (We won't revamp the language picker, so it will continue to use the iso639.txt file.) The parse-langtags tool runs from node with:

node dist/index.js -i iso639.txt
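For readers who want a feel for the conversion, here is a minimal sketch, not the actual parse-langtags source. The `LangTagEntry` field names, the column order, and the "skip tags starting with `_`" rule are all assumptions made for illustration; the real iso639.txt layout may differ.

```typescript
// Hypothetical subset of a langtags.json entry; field names are assumptions.
interface LangTagEntry {
  tag: string;        // BCP 47 tag, e.g. "en-Latn-US"
  iso639_3?: string;  // three-letter language code, if present
  region?: string;    // region code, if present
  name?: string;      // language name, if present
}

// Build one |-separated line. The column order is illustrative only.
function toPipeSeparatedLine(entry: LangTagEntry): string {
  const columns = [
    entry.tag,
    entry.iso639_3 ?? "",
    entry.region ?? "",
    entry.name ?? "",
  ];
  // Replace any stray "|" so a value cannot change the column count.
  return columns.map((c) => c.replace(/\|/g, " ")).join("|");
}

// Convert a list of entries to the text of the output file.
function convert(entries: LangTagEntry[]): string {
  return entries
    // Assumption: tags starting with "_" are metadata, not languages.
    .filter((e) => typeof e.tag === "string" && e.tag.length > 0 && !e.tag.startsWith("_"))
    .map(toPipeSeparatedLine)
    .join("\n");
}
```

Keeping every column populated, rather than only the ones Speech Analyzer reads, is what lets the output stay a drop-in replacement for the older file.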

Note: The previous iso639.txt file had roughly 500 entries, and the file now sourced from langtags.json has roughly 2000 entries. This causes a noticeable performance hit on "Exporting to Lift" as the iso639.txt file gets read in. I contemplated populating only the two columns we care about, but decided against it so that users on older versions of Speech Analyzer can just "drop in" the updated iso639.txt file.

Since langtags.json uses BCP-47 tags, which can include script, region, and private-use variants, the second part of this PR updates the Speech Analyzer code to accommodate them. (SA previously used 2-character country codes and language names.) Sa_Doc.cpp now also has to merge how the "phonetic" (fonipa-x-etic) and "phonemic" (fonipa-x-emic) language tags get constructed.
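To illustrate the tag shapes involved, here is a simplified sketch in TypeScript (the actual change lives in the C++ code, which this does not reproduce). `parseTag` and `buildTranscriptionTag` are hypothetical helpers; the parser ignores extlang, extension, and grandfathered subtags, and the builder assumes the base tag has no variant or private-use part of its own.

```typescript
interface ParsedTag {
  language: string;
  script?: string;
  region?: string;
  variants: string[];
  privateUse: string[];
}

// Simplified BCP 47 split: language[-Script][-REGION][-variants][-x-private...]
function parseTag(tag: string): ParsedTag {
  const parts = tag.split("-");
  const result: ParsedTag = { language: "", variants: [], privateUse: [] };
  // A tag that starts with "x" is entirely private use.
  if (parts[0].toLowerCase() === "x") {
    result.privateUse = parts.slice(1);
    return result;
  }
  result.language = parts[0].toLowerCase();
  let i = 1;
  // Script subtag: exactly four letters.
  if (i < parts.length && /^[A-Za-z]{4}$/.test(parts[i])) {
    result.script = parts[i];
    i++;
  }
  // Region subtag: two letters or three digits.
  if (i < parts.length && /^([A-Za-z]{2}|\d{3})$/.test(parts[i])) {
    result.region = parts[i].toUpperCase();
    i++;
  }
  // Everything up to the "x" singleton is treated as a variant.
  while (i < parts.length && parts[i].toLowerCase() !== "x") {
    result.variants.push(parts[i]);
    i++;
  }
  // Everything after "x" is private use.
  if (i < parts.length) {
    result.privateUse = parts.slice(i + 1);
  }
  return result;
}

// Append the phonetic or phonemic suffix named in this PR to a base tag.
function buildTranscriptionTag(base: string, kind: "etic" | "emic"): string {
  return `${base}-fonipa-x-${kind}`;
}
```

For example, `buildTranscriptionTag("en-Latn-US", "etic")` yields `en-Latn-US-fonipa-x-etic`, which `parseTag` splits back into language `en`, script `Latn`, region `US`, variant `fonipa`, and private-use `etic`.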

[x] There's still some encoding weirdness in the dropdown boxes. Even though the generated iso639.txt files are UTF-8, I suspect Speech Analyzer's use of wstring means the underlying data is in UCS-2 (not exactly UTF-16?).

Update - This is now displayed correctly with the rewrite of how iso639.txt gets parsed into codes.

Reference https://stackoverflow.com/questions/2527720/confused-about-cs-stdwstring-utf-16-utf-8-and-displaying-strings-in-a-win
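As a side note on the UCS-2 versus UTF-16 question, the two only differ for characters outside the Basic Multilingual Plane, which need a surrogate pair in UTF-16. A small sketch of how to check that, written in TypeScript because its strings are also sequences of 16-bit code units; the helper names are made up for illustration and nothing here is taken from the Speech Analyzer code.

```typescript
// Return the 16-bit code units of a string.
function utf16CodeUnits(s: string): number[] {
  const units: number[] = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return units;
}

// True if the string contains a high surrogate (0xD800..0xDBFF), i.e. a
// character that cannot be represented as a single UCS-2 code unit.
function needsSurrogates(s: string): boolean {
  return utf16CodeUnits(s).some((u) => u >= 0xd800 && u <= 0xdbff);
}

// "\u014B" (LATIN SMALL LETTER ENG) fits in one code unit.
const eng = utf16CodeUnits("\u014B");
// "\uD83D\uDE00" is a single character stored as two code units.
const pair = utf16CodeUnits("\uD83D\uDE00");
```

Text made up only of single-code-unit characters is byte-for-byte the same in UCS-2 and UTF-16, so for such data the distinction would not by itself explain a display problem.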

megahirt commented 3 years ago

@darcywong00 did you intend to commit DistFiles/Fonts/DoulosSILR.ttf ?

darcywong00 commented 3 years ago

At one point, I was trying to use an updated version of Doulos to render the dropdown box, but that didn't help. I'll revert that change.

darcywong00 commented 3 years ago

about x-bad-mru-Cyrl-RU and why we find it in langtags.json

He said it comes from CLDR, so langtags.json keeps it to stay in sync.

sillsdev / SpeechAnalyzer

feat: Create parse-langtags utility to convert langtags.json #24