schourode / iso3166

ISO 3166-1 country list for .NET
MIT License

Localization #15

Closed. Techek closed this issue 2 years ago

Techek commented 4 years ago

Would it be possible to join forces with https://github.com/srollinet/Countries which is working on localizing ISO-3166?

andersnm commented 4 years ago

Hi!

Having localized country names sounds useful, although it would be better with a more complete data source to start with.

After a bit of googling, this project "ICU" appears to be pretty popular on many platforms/languages for internationalization purposes: http://site.icu-project.org/home

And there is a managed .NET wrapper for it on Nuget: https://www.nuget.org/packages/icu.net/

Would be interesting if we could use their data somehow to provide a fully managed .NET alternative. Perhaps write a custom tool to generate code with localized country names before or as part of the build. Happy to accept a PR :-)

(btw found ICU via https://github.com/umpirsky/country-list which uses https://github.com/symfony/intl in PHP, which uses ICU)

Techek commented 4 years ago

Repository https://github.com/umpirsky/country-list actually looks more complete than https://github.com/srollinet/Countries.

I have no idea how to access the ICU project's data, apart from doing a primitive scrape of specific pages and somehow integrating the scraped result into your repository.

What can I do?

andersnm commented 4 years ago

Hi,

FYI, I'm busy with personal stuff and other projects for a while, so I'm not able to suggest anything well thought out yet. Some random thoughts however:

We could automate importing most of the country names by creating a .NET console app that depends on the icu.net and Icu4c.Win.Min NuGet packages and dumps the country names into one or more .cs files.
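As a rough sketch of the "dump into .cs files" step, the following assumes the names have already been collected into a nested dictionary keyed by locale and then by country code; how they are fetched from ICU is deliberately left out. The names `CountryNameGenerator`, `LocalizedCountryNames` and `Names` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CountryNameGenerator
{
    // Emits C# source for a static class holding locale -> (country code -> name).
    // The input is assumed to be collected elsewhere; fetching it from ICU is not shown.
    public static string Generate(Dictionary<string, Dictionary<string, string>> namesByLocale)
    {
        var sb = new StringBuilder();
        sb.AppendLine("using System.Collections.Generic;");
        sb.AppendLine();
        sb.AppendLine("public static class LocalizedCountryNames");
        sb.AppendLine("{");
        sb.AppendLine("    public static readonly Dictionary<string, Dictionary<string, string>> Names =");
        sb.AppendLine("        new Dictionary<string, Dictionary<string, string>>");
        sb.AppendLine("        {");
        foreach (var locale in namesByLocale)
        {
            sb.AppendLine($"            [{Quote(locale.Key)}] = new Dictionary<string, string>");
            sb.AppendLine("            {");
            foreach (var country in locale.Value)
            {
                sb.AppendLine($"                [{Quote(country.Key)}] = {Quote(country.Value)},");
            }
            sb.AppendLine("            },");
        }
        sb.AppendLine("        };");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Escapes backslashes and double quotes so the value is a valid C# string literal.
    private static string Quote(string value) =>
        "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
}
```

The console app would then write the returned string to a file in the library project, for example with `File.WriteAllText`, before or as part of the build.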

Then, the next step is to design the public API for returning the localized country names: either just exposing plain List<>s or Dictionary<>s, or adding some kind of GetCountryName() helper method taking a CultureInfo parameter.
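One possible shape for the helper-method variant is sketched below. It assumes the localized names are available as a nested dictionary keyed by culture name and then by alpha-2 code, and it walks up `CultureInfo.Parent` (for example from `da-DK` to `da`) until it finds a match. The type `CountryNameLookup` and its members are hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class CountryNameLookup
{
    private readonly Dictionary<string, Dictionary<string, string>> _names;

    public CountryNameLookup(Dictionary<string, Dictionary<string, string>> names)
    {
        // Copy into a case-insensitive outer dictionary so "da-DK" and "da-dk" both match.
        _names = new Dictionary<string, Dictionary<string, string>>(
            names, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the localized name, or null if neither the culture nor any of its
    // parent cultures has an entry for the given country code.
    public string GetCountryName(string alpha2, CultureInfo culture)
    {
        // The invariant culture has an empty Name and ends the parent chain.
        for (var c = culture; !string.IsNullOrEmpty(c.Name); c = c.Parent)
        {
            if (_names.TryGetValue(c.Name, out var byCode) &&
                byCode.TryGetValue(alpha2, out var name))
            {
                return name;
            }
        }
        return null;
    }
}
```

With data containing only a `da` entry for `DK`, a lookup for `new CultureInfo("da-DK")` falls back to `da`, while a lookup for `fr-FR` returns null. Whether a missing name should yield null, the English name, or an exception is a design choice left open here.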

We can then add a note in the readme "Localized country names from ICU vX.Y". Updating the country names would be a matter of bumping the version of the Icu4c.Win.Min binaries.

If you can help with any of the above, then great, otherwise it'll remain in the backlog for an undetermined amount of time.

andersnm commented 4 years ago

So I found this quite interesting and did some more digging. Turns out ICU is included in recent versions of Windows 10: https://docs.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-

Thus the following can be used to get a list of localized country names without any external dependencies:


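        // Needs: using System; using System.Collections.Generic; using System.Runtime.InteropServices; using System.Text;
        // UErrorCode is a 32-bit enum that ICU reads as well as writes, so it is passed by ref
        // and must be U_ZERO_ERROR (0) on entry. Values greater than 0 indicate failure.
        [DllImport("icuuc", CallingConvention = CallingConvention.Cdecl)]
        private static extern IntPtr uldn_open(string locale, int dialectHandling, ref int errorCode);

        // Returns the length of the display name in UTF-16 code units.
        [DllImport("icuuc", CallingConvention = CallingConvention.Cdecl)]
        private static extern int uldn_regionDisplayName(IntPtr handle, string region, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder result, int maxResultSize, ref int errorCode);

        [DllImport("icuuc", CallingConvention = CallingConvention.Cdecl)]
        private static extern void uldn_close(IntPtr handle);

        public static Dictionary<string, string> GetCountryNames(string locale, List<string> countryCodes)
        {
            var errorCode = 0; // U_ZERO_ERROR
            var handle = uldn_open(locale, 0, ref errorCode);
            if (errorCode > 0)
                throw new InvalidOperationException($"uldn_open failed with ICU error code {errorCode}");

            try
            {
                var countryName = new StringBuilder(255);
                var result = new Dictionary<string, string>();
                foreach (var countryCode in countryCodes)
                {
                    errorCode = 0; // reset: ICU functions do nothing if a failure code is passed in
                    countryName.Clear();
                    uldn_regionDisplayName(handle, countryCode, countryName, countryName.Capacity, ref errorCode);
                    if (errorCode > 0)
                        throw new InvalidOperationException($"uldn_regionDisplayName failed for {countryCode} with ICU error code {errorCode}");
                    result.Add(countryCode, countryName.ToString());
                }
                return result;
            }
            finally
            {
                uldn_close(handle); // always release the native handle, even if a lookup throws
            }
        }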

Based on the above (plus 3 more APIs), I went ahead and implemented a code generator which produces a 1.8 MB .cs file with all country names in all locales. When included in the ISO3166 library, the resulting binary grows from about 25 KB to about 2 MB. Looks like a viable approach, although the jump in file size is a bit concerning. There are still a few locale normalization issues to sort out.

Another approach entirely is the ICU4N project: https://github.com/NightOwl888/ICU4N which appears to effectively replace the ISO3166 library with added localization features. It's a partial port of ICU4J; I didn't check whether the localized country lists have been ported yet.