Closed ivanistheone closed 7 years ago
I'm really hesitant about changing the language codes because the language code is used as the model id. Changing the codes could break content that uses 3-letter codes, since the corresponding id would no longer exist. Also, certain languages have 3-letter codes because the 2-letter representation has been taken (e.g. Finnish and Filipino), so I'm not sure how cleanly certain languages would translate to a new 2-letter scheme. An alternative solution could be to update the `getlang` method to look for a "closest match", or to have some sort of mapping from 2-letter codes to 3-letter codes.
@aronasorman Might be good to get your opinion on this too
@jayoshih I see. If we can't change the existing data model, we should aim to provide helper functions as you suggested:
If these lookup functions are the only "public" API, then we can use whatever internal format we want (e.g. keep the existing one).
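To make the lookup-helper idea concrete, here is a minimal sketch of a function that accepts an external code and returns the existing internal id, so the stored data model does not change. Everything named here is an assumption for illustration: `lookup_internal_code`, the `_LANGUAGES` sample and the `_ALIASES` table are hypothetical, and none of them reflect the actual le_utils API or the full contents of `languagelookup.json`.

```python
# Hypothetical sample keyed by the existing internal ids. The real data
# lives in le_utils/resources/languagelookup.json; these entries and the
# "name" field are illustrative only.
_LANGUAGES = {
    "en": {"name": "English"},
    "pt-BR": {"name": "Portuguese, Brazil"},
    "zul": {"name": "Zulu"},
}

# Hypothetical alias table from codes used by external sources
# (e.g. subtitle tracks) to the internal ids above.
_ALIASES = {
    "zu": "zul",
}


def lookup_internal_code(external_code):
    """Return the internal id for an external language code, or None."""
    # 1. Exact match on an existing internal id.
    if external_code in _LANGUAGES:
        return external_code
    # 2. Explicit alias, e.g. a 2-letter code for a 3-letter internal id.
    if external_code in _ALIASES:
        return _ALIASES[external_code]
    # 3. "Closest match": retry with only the primary subtag,
    #    so a regional variant falls back to its base language.
    primary = external_code.split("-")[0]
    if primary in _LANGUAGES:
        return primary
    return _ALIASES.get(primary)


print(lookup_internal_code("zu"))
print(lookup_internal_code("en-GB"))
```

Because callers only see the helper, the internal ids can stay as they are, and new external spellings only require adding an alias entry.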
Related to this, Jamie just posted on Slack a much longer list of varied languages (for the African Storybook channel), so we might also need to consider an "extensible language" setting.
I think having an API is a good approach. However, I would urge you both to consider that the data and the API should both be accessible from Python and JavaScript, as we will need all of this language data during content render to make decisions about text directionality within content renderers.
Also, for even more languages, cf. https://www.ethnologue.com/
While working on the YouTube subs for the TE chef, @divad12 noticed inconsistencies in the short language codes defined in `le_utils/resources/languagelookup.json`. There is a mix of two-letter codes like `pt-BR` and three-letter codes like `zul`.

The conventions are consistent between chefs, cc server, and Kolibri, but loading data from external sources can be problematic. For example, a YouTube video can provide subtitles for Zulu as `zu`, but we need to upload them as `zul` to the cc server so that Kolibri will recognize them.

Should we consider standardizing on two-letter codes? Possibly with a fallback/retro-compatible mode for three-letter codes? (Side question: what is the `ka_name` used for?)

A change to two-letter codes will require revisiting: