Define scope

westnordost commented 5 months ago

You put some extra information into the Language and Country enums, such as the dialling code and which languages are official in which countries.

All kinds of extra information could be added per country or per language. But the more is added, the greater the maintenance effort will be for you if this library gains popularity, usage (and feature requests).

You should define the scope somewhere: what should be included in this library and what should be out of scope. After all, there is a whole project by the Unicode Consortium about language-specific metadata, the Common Locale Data Repository (CLDR), with frequent updates.

For country-specific metadata, I expect there must be a similar data project out there, but I didn't look.

As a matter of fact, for a Kotlin Multiplatform project of mine, I am not only looking for a multiplatform Locale library to (primarily) get the display names of countries and languages, but I also need locale-specific formatting of decimals (different decimal separators are used in different languages), dates, weekdays and times. Java includes this, just like Locale::getDisplayName(), but for Kotlin Multiplatform there is no replacement yet. I do recognize that this should probably be a separate library. But what should go into that separate library and what should go here?
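For reference, a minimal sketch of what the JVM already offers for the needs listed above, using only `java.util.Locale` and `java.text`. The class and method names (`JvmLocaleDemo`, `countryName`, `decimalSeparator`, `formatDecimal`) are made up for illustration and are not part of this library.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class JvmLocaleDemo {
    // Name of a country, rendered in the given display language.
    static String countryName(Locale country, Locale displayLanguage) {
        return country.getDisplayCountry(displayLanguage);
    }

    // Decimal separator the JVM uses for the given locale.
    static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // Locale-specific formatting of a decimal number.
    static String formatDecimal(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(countryName(Locale.GERMANY, Locale.ENGLISH)); // Germany
        System.out.println(decimalSeparator(Locale.GERMANY));            // ,
        System.out.println(formatDecimal(1.5, Locale.US));               // 1.5
        System.out.println(formatDecimal(1.5, Locale.GERMANY));          // 1,5
    }
}
```

A multiplatform library would need an equivalent of each of these calls on every target, which is where the question of scope comes in.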

By the way, which languages are official in a given country seems to be a property of the country rather than of the language. (E.g. the German language didn't change when East Germany was merged into West Germany and hence ceased to exist, but in your data structure, it would.)

I maintain a list of languages by country code too, and I can say that it is quite difficult to arrive at a useful yet well-defined data set, because what counts as a country's language is often very hard to define. E.g. the United States does not have an official language. Some countries have an official language that the majority of people in that country do not speak. Some countries have dozens of official languages for political reasons (e.g. Bolivia). Others, also for political reasons, do not recognize a language as official even if it is spoken by a majority (at least in a province).
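To make the data-modelling point concrete, here is a small sketch in which the set of official languages belongs to the country, and the reverse lookup ("in which countries is this language official?") is derived from it. All names are hypothetical and the entries are a tiny illustrative sample, not this library's data; the empty set for the United States follows the statement above.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class OfficialLanguages {
    // Hypothetical, deliberately tiny sample of languages.
    enum Language { GERMAN, FRENCH, ITALIAN, ROMANSH, ENGLISH }

    // The official languages are stored on the country.
    enum Country {
        GERMANY(EnumSet.of(Language.GERMAN)),
        SWITZERLAND(EnumSet.of(Language.GERMAN, Language.FRENCH,
                               Language.ITALIAN, Language.ROMANSH)),
        UNITED_STATES(EnumSet.noneOf(Language.class)); // no official language, per the comment above

        private final Set<Language> official;

        Country(Set<Language> official) {
            this.official = Collections.unmodifiableSet(official);
        }

        Set<Language> officialLanguages() {
            return official;
        }
    }

    // Derived view: removing a country never requires touching a Language entry.
    static Set<Country> countriesWhereOfficial(Language language) {
        Set<Country> result = EnumSet.noneOf(Country.class);
        for (Country country : Country.values()) {
            if (country.officialLanguages().contains(language)) {
                result.add(country);
            }
        }
        return result;
    }
}
```

With this layout, a country that ceases to exist is a single deleted entry, and the language definitions stay untouched.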

vanniktech commented 5 months ago

> You should define the scope somewhere: what should be included in this library and what should be out of scope.

I maintain a list of the languages supported in the app (Settings -> Languages), with the ability to switch between them. Since I know which languages I have, this is hardcoded (regarding #182). Later, in one of my apps, I wanted a list of countries on both Android and iOS, and from there it gradually grew.

> After all, there is a whole project by the Unicode Consortium about language-specific metadata, the Common Locale Data Repository (CLDR), with frequent updates.

Interesting, I didn't know about this. Could we pull this XML data and generate Kotlin code? Then we would not have to update things manually. I already do something similar in a different library: https://github.com/vanniktech/Emoji/tree/master/generator (There's a JavaScript catalogue of all emojis; I parse it and generate Kotlin code which can be used on JVM + Android. iOS is lacking since there's no good Unicode support within Kotlin/Native.)
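As a rough illustration of the "generate Kotlin code from data" idea, the sketch below turns a small in-memory table into the source text of a Kotlin enum. It does not parse CLDR; the input map stands in for values that would have been extracted from it, and `KotlinEnumGenerator`, `generate` and the emitted `decimalSeparator` property are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class KotlinEnumGenerator {
    // Emits Kotlin source for an enum with one entry per language code.
    // Assumption: keys are plain ASCII language codes and values need no escaping.
    static String generate(String enumName, Map<String, Character> separators) {
        StringBuilder source = new StringBuilder();
        source.append("// GENERATED FILE, do not edit by hand.\n");
        source.append("enum class ").append(enumName)
              .append("(val decimalSeparator: Char) {\n");
        for (Map.Entry<String, Character> entry : separators.entrySet()) {
            source.append("  ")
                  .append(entry.getKey().toUpperCase(Locale.ROOT))
                  .append("('").append(entry.getValue()).append("'),\n");
        }
        source.append("}\n");
        return source.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; a real generator would read this from the data files.
        Map<String, Character> separators = new LinkedHashMap<>();
        separators.put("en", '.');
        separators.put("de", ',');
        System.out.print(generate("LanguageFormat", separators));
    }
}
```

Running such a generator as a build step would keep the checked-in Kotlin file in sync with the upstream data without manual edits.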

> but I also need locale-specific formatting of decimals (different decimal separators are used in different languages), dates, weekdays and times

I am not really using this yet. I have only one case, where I format kilometers, but I expect/actual that outside of this library. In general, though, I'd be open to supporting this as well.

> By the way, which languages are official in a given country seems to be a property of the country rather than of the language.

That's right. Since the list of languages isn't exhaustive, I chose to attach it to Language, so that you are at least required to specify it when you add a language and I won't forget it. We could change this, though.

> I maintain a list of languages by country code too, and I can say that it is quite difficult to arrive at a useful yet well-defined data set, because what counts as a country's language is often very hard to define. E.g. the United States does not have an official language. Some countries have an official language that the majority of people in that country do not speak. Some countries have dozens of official languages for political reasons (e.g. Bolivia). Others, also for political reasons, do not recognize a language as official even if it is spoken by a majority (at least in a province).

Yeah, for most of it I just used common sense. I wanted this feature in my Flashcards app (also Android & iOS): when you create a deck, you can specify the locale of the front / back side, so that when you use TTS APIs the text gets pronounced correctly. Since you specify the language first, that is probably also why I attached the country list to Language, but I don't remember.

Also, up until now there hasn't been much activity around this library, and you are the first to really show an interest. I don't know how many people are actually using it, either.

westnordost commented 5 months ago

> Interesting, I didn't know about this. Could we pull this XML data and generate Kotlin code?

I would say so, yes. Thinking aloud:

But well, I had a short look at it. The data is massive. They also have a JSON variant of the data, and that is 66MB, zipped. Sure, one could take just parts of it. What I am saying is that, due to the volume and generality of this data, it might be somewhat complex to parse.

Also due to the volume of this data, maybe the better approach would be for the environment to supply this kind of locale-specific information. By environment, I mean the JVM, iOS, the JavaScript runtime, ..., as it is not feasible to include this data in a single library. Then again, even for the Kotlin folks it might not be feasible to include it (at some point in the future). So maybe, instead of using CLDR directly, it might be possible to use this data indirectly, through whatever the runtimes expose. E.g. the display name of a country or language in a given language is exposed, as you already know.

Maybe there is more... On the other hand, much of the data will not be directly accessible, e.g. which decimal separator to use in which locale. In conclusion, Kotlin Multiplatform (KMP) developers are in a pickle... Maybe pull some data from CLDR (decimal separator, date format, ...) while getting other data, especially translation data (language names, country names, weekday names, month names, ...), from the runtimes, if possible...
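The split suggested here could look roughly like the following sketch, shown for the JVM only: formatting data comes from a small bundled table (standing in for values extracted from CLDR at build time), while translated names are delegated to the runtime. The class `HybridLocaleData`, its methods, the table contents and the `'.'` fallback are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HybridLocaleData {
    // Hypothetical bundled table; a real one would be generated from CLDR.
    private static final Map<String, Character> DECIMAL_SEPARATORS = new HashMap<>();
    static {
        DECIMAL_SEPARATORS.put("en", '.');
        DECIMAL_SEPARATORS.put("de", ',');
    }

    // Formatting data: looked up in the bundled table.
    // Assumption: '.' is used when a language is missing from the table.
    static char decimalSeparator(String languageCode) {
        return DECIMAL_SEPARATORS.getOrDefault(languageCode, '.');
    }

    // Translation data: delegated to the runtime (here the JVM).
    static String languageName(String languageCode, String displayLanguageCode) {
        Locale language = Locale.forLanguageTag(languageCode);
        Locale displayLanguage = Locale.forLanguageTag(displayLanguageCode);
        return language.getDisplayLanguage(displayLanguage);
    }

    public static void main(String[] args) {
        System.out.println(decimalSeparator("de"));   // ,
        System.out.println(languageName("de", "en")); // German
    }
}
```

On other targets, only `languageName` would need a platform-specific implementation; the bundled table would be shared across all of them.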

vanniktech commented 5 months ago

Whoops. Yes, that's massive. We should be careful about what to include. This is also why, for now, I defer to the system to translate the country / language name, as you would otherwise end up with an infeasible number of permutations.

vanniktech / multiplatform-locale

Define scope #183