Other/custom locales - GitHub issues

hilari0n commented 3 years ago

I know that the only built-in locale is currently "en". Is there a guide on how to implement and use "custom" locales?

I would imagine distributing those as individual NuGet packages for each locale (e.g. "MessageFormat.jp", "MessageFormat.sp", "MessageFormat.pl"), but I would first need to understand how one would create those without altering the core library. Is that possible, or can a new locale only be added within this project (by a PR submission or a fork)? If it's the latter, having an example/guideline would also help.

For now, I have only found how to provide a pluralizer specific to a culture/locale. Is that because nothing else is supported? E.g. I have found nothing on support for "number" and "date" with "standard" styles ("long", "short", "medium", "full") and/or skeletons, even for "en" (although it should be possible to do it in a relatively general way), so how would I go about providing it for other locales?

jeffijoe commented 3 years ago

Currently, we only support pluralization, which is the only locale-specific thing, so you just need to add the locale to the pluralizers collection.
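For readers following along, here is a rough sketch of what "adding a locale to the pluralizers collection" amounts to conceptually: a mapping from locale code to a function that turns a number into a plural category. It is written in Python for brevity and is not the library's C# API. The names `PLURALIZERS`, `polish_pluralizer`, and `pluralize` are hypothetical, and the Polish rule is a simplified take on the CLDR cardinal rule.

```python
from typing import Callable, Dict

# A pluralizer maps a number to a plural category name ("one", "few", ...).
Pluralizer = Callable[[float], str]


def english_pluralizer(n: float) -> str:
    # English only distinguishes exactly 1 from everything else.
    return "one" if n == 1 else "other"


def polish_pluralizer(n: float) -> str:
    # Simplified Polish cardinal rule; fractions always fall into "other".
    if n != int(n):
        return "other"
    i = abs(int(n))
    if i == 1:
        return "one"
    if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
        return "few"
    return "many"


# Hypothetical registry keyed by locale code; "en" plays the built-in role.
PLURALIZERS: Dict[str, Pluralizer] = {"en": english_pluralizer}

# Adding a custom locale is just registering one more function.
PLURALIZERS["pl"] = polish_pluralizer


def pluralize(locale: str, n: float) -> str:
    # Fall back to the built-in locale when none is registered.
    pluralizer = PLURALIZERS.get(locale, PLURALIZERS["en"])
    return pluralizer(n)
```

The point of the sketch is that nothing in the core needs to change to support a new locale, as long as pluralization really is the only locale-specific part.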

If we are to support more formats that depend on localization then some big changes need to happen.

NightOwl888 commented 3 years ago

@hilari0n

This is a very big can of worms when you consider formatting of dates and numbers. The place to start is at ICU's MessageFormat support.

The only support for it that I am aware of in .NET is icu-dotnet's MessageFormatter class. icu-dotnet is a .NET wrapper (a set of bindings) for the ICU4C library.

hilari0n commented 3 years ago

@jeffijoe: It may well be true that pluralization is the only locale-specific thing. In that case, I will probably manage to find a way to implement it for "custom" locales.

As for supporting more formats, I was hoping that I had missed something, as "number" and "date" are basically part of the ICU standard and are supported by MessageFormat.js, and those two are mentioned on this project's page as basically defining the functionality here. In both cases the support for styles is there, and getting something working for things like the "long" or "short" formats should not be so painful, as .NET date formatting has similar concepts. All in all, for a simple pattern "Date: { value, date }", which is perfectly valid from the ICU or MessageFormat.js perspective, I get an exception instead of some kind of minimum support (where e.g. "date" and "number" would be recognized by the VariableFormatter as supported, but ignored during actual formatting).
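To illustrate what "minimum support" for a pattern like "Date: { value, date }" could look like, here is a small sketch in Python (not the library's code). It assumes a hypothetical `DATE_STYLES` table that maps the ICU style names to fixed, English-like `strftime` patterns, and a hypothetical `format_message` helper. It handles only flat placeholders, with no nesting, plurals, or skeletons.

```python
import re
from datetime import date

# Hypothetical mapping from ICU style names to strftime patterns.
# The patterns follow English-like conventions and are an assumption.
DATE_STYLES = {
    "short": "%m/%d/%y",
    "medium": "%b %d, %Y",
    "long": "%B %d, %Y",
    "full": "%A, %B %d, %Y",
}

# Matches "{name}", "{name, type}" and "{name, type, style}".
PLACEHOLDER = re.compile(
    r"\{\s*(\w+)\s*(?:,\s*(\w+)\s*(?:,\s*(\w+)\s*)?)?\}"
)


def format_message(pattern: str, args: dict) -> str:
    def replace(match: re.Match) -> str:
        name, kind, style = match.groups()
        value = args[name]
        if kind is None:
            # Plain variable: no formatter requested.
            return str(value)
        if kind == "date":
            # No explicit style falls back to "medium";
            # an unknown style name raises KeyError.
            return value.strftime(DATE_STYLES[style or "medium"])
        raise ValueError(f"unsupported format type: {kind}")

    return PLACEHOLDER.sub(replace, pattern)


if __name__ == "__main__":
    print(format_message("Date: { value, date, short }",
                         {"value": date(2024, 3, 5)}))
```

Real locale support would need per-locale patterns and month/day names instead of one fixed table, which is where the "big changes" mentioned earlier come in.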

There are some other minor things which suggest that making it work for other cultures/locales may be problematic. E.g. the Pluralizer uses Convert.ToDouble without any culture information (while it should probably always use the invariant one), which means that the operation depends on the current locale (making the format parsing "unstable"). I found this when considering how to add support for another standard formatter: "selectordinal".
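A toy simulation of why culture-sensitive number parsing is "unstable". This is Python, not .NET: the hypothetical `parse_number` helper takes the separators explicitly to stand in for the ambient culture. In .NET itself, the usual remedy would be the `Convert.ToDouble(string, IFormatProvider)` overload with `CultureInfo.InvariantCulture`.

```python
def parse_number(text: str,
                 decimal_separator: str = ".",
                 group_separator: str = ",") -> float:
    """Parse text using the given separators.

    The defaults mimic an invariant/English convention; passing
    (",", ".") mimics a German-like convention.
    """
    # Drop grouping characters first, then normalise the decimal mark.
    normalized = text.replace(group_separator, "")
    normalized = normalized.replace(decimal_separator, ".")
    return float(normalized)


if __name__ == "__main__":
    # The same input string yields different numbers depending on
    # which convention the parser assumes.
    print(parse_number("1.5"))            # invariant-style separators
    print(parse_number("1.5", ",", "."))  # German-like separators
```

Under the invariant-style separators "1.5" is one and a half, while under the German-like ones the dot is treated as a grouping character and the result is fifteen, so a plural rule fed by such parsing could pick a different category on different machines.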

jeffijoe commented 3 years ago

We can add the date and number formatters. I’m open to PRs for them. I don’t know exactly what it entails; I would always use the platform-native formatting for dates and numbers myself.

jeffijoe commented 3 years ago

@hilari0n as for having a NuGet package per locale, I think that would be a versioning nightmare. If we modify the abstraction in any way, all of them break and would need to be updated immediately.

On the other hand, having them in core, I don't know if the linker is smart enough to not include locales that aren't used? Maybe that won't be a problem since locales here won't contain too much data, I don't know.

Thoughts?

kostya9 commented 3 years ago

I am thinking of writing the rules for the locales in some form of DSL in an XML/JSON/plain-text file. Then the library could load these locales at runtime (or via some kind of source generator). If assembly size is a concern, the source generator could accept options to include only a subset of locales.

The file-based approach was inspired by https://github.com/google/libphonenumber; for an example of how easy it is to change the configuration, see https://github.com/google/libphonenumber/pull/2567/files

kostya9 commented 3 years ago

Maybe there is no need to actually type in the data ourselves. One way to do this is for the library to learn how to parse the ICU data files. For example, the data for plural rules seems to be here https://github.com/unicode-org/icu/blob/main/icu4c/source/data/misc/plurals.txt

For optimal performance, the approach with source generators may still apply, and the source generator may be customizable to include only the locales the target user needs for size-critical environments.

jeffijoe commented 3 years ago

Yeah, that sounds like a great idea! This is the first time I've seen that format though, so I would need to look at how to parse it.

NightOwl888 commented 3 years ago

FYI - ICU gets that data from the CLDR and converts it to a custom format (primarily to filter out the data they don't want to ship with their product). CLDR supplies the data in XML as the "official" format, but they also maintain a JSON format, both of which would be easier to parse than ICU's proprietary format.
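As a rough idea of how small a data-driven approach could be, here is a Python sketch that reads a fragment shaped like the CLDR JSON plural data and evaluates a subset of the rule syntax (operands `n`, `i`, `v`; `=`, `!=`, `%`; ranges and comma lists; `and`/`or`). The key names (`plurals-type-cardinal`, `pluralRule-count-*`) reflect my understanding of the CLDR JSON layout and should be checked against the real files. The sample lists that normally follow `@integer`/`@decimal` are mostly omitted, and `select_category` is a hypothetical name.

```python
import json
import re

# Fragment shaped like CLDR's JSON plural data (assumed layout).
CLDR_FRAGMENT = """
{"supplemental": {"plurals-type-cardinal": {
  "en": {
    "pluralRule-count-one": "i = 1 and v = 0 @integer 1",
    "pluralRule-count-other": ""
  },
  "pl": {
    "pluralRule-count-one": "i = 1 and v = 0",
    "pluralRule-count-few": "v = 0 and i % 10 = 2..4 and i % 100 != 12..14",
    "pluralRule-count-many": "v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14",
    "pluralRule-count-other": ""
  }
}}}
"""
DATA = json.loads(CLDR_FRAGMENT)

# operand, optional "% modulus", operator, value list
RELATION = re.compile(r"^(\w+)(?:\s*%\s*(\d+))?\s*(!?=)\s*(.+)$")


def operands(number_text: str) -> dict:
    # n: absolute value, i: integer digits, v: count of fraction digits.
    integer_part, _, fraction = number_text.partition(".")
    return {"n": abs(float(number_text)),
            "i": abs(int(integer_part)),
            "v": len(fraction)}


def matches(relation: str, ops: dict) -> bool:
    operand, modulus, operator, values = RELATION.match(relation.strip()).groups()
    value = ops[operand]
    if modulus:
        value = value % int(modulus)
    hit = False
    for part in values.split(","):
        low, _, high = part.strip().partition("..")
        if high:
            # Ranges only match integral values.
            hit = hit or (value == int(value) and int(low) <= value <= int(high))
        else:
            hit = hit or value == int(low)
    return hit if operator == "=" else not hit


def select_category(locale: str, number_text: str, data: dict = DATA) -> str:
    rules = data["supplemental"]["plurals-type-cardinal"][locale]
    ops = operands(number_text)
    for key, rule in rules.items():
        # Strip trailing "@integer ..." / "@decimal ..." samples.
        condition = rule.split("@")[0].strip()
        if not condition:
            continue  # "other" carries no condition
        # "or" separates clauses; "and" binds tighter within a clause.
        if any(all(matches(r, ops) for r in clause.split(" and "))
               for clause in condition.split(" or ")):
            return key.replace("pluralRule-count-", "")
    return "other"
```

The number is passed as text so that the count of fraction digits (`v`) is preserved. The same evaluator could run at build time inside a source generator instead of at runtime, in line with the size concerns raised above.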

jeffijoe commented 3 years ago

Closing since #24 was shipped 🚀

jeffijoe / messageformat.net

Other/custom locales #22