python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Allow overriding of CLDR #344

Open jtwang opened 8 years ago

jtwang commented 8 years ago

This encompasses a few different ideas:

I don't know how common this is for the world at large, but we need the ability to tweak things such as dates/times and currencies. We don't necessarily want to wait for the Unicode -> CLDR -> Babel chain to catch up, and we don't necessarily want to submit our change requests either. E.g. we want to show 'US' instead of 'U.S.' (CLDR 23.1).

Having the ability to define new elements would be extremely useful, e.g. new datetime skeletons, or completely new functionality not supported by CLDR (we added single-character currency symbols, which is admittedly kind of sketchy).

Adding entirely new locale XMLs could also come in handy: CLDR 23.1 did not include XMLs for en_CH, en_MY, and en_PH, which we needed.

jtwang commented 8 years ago

The way we implemented this feature is as follows:

Interesting design issues:

Here's a snippet of how our setup looks (keep in mind we're still on Babel 1.3):

babel \
    cldr \
        common \
            <the usual>
        custom \
            main \
                cs.xml
                de.xml
                root.xml      
        merged \
            <the usual>

Example custom file contents (our script requires every override be linked to our ticket tracking system):

=== ROOT.XML
<?xml version='1.0' encoding='UTF-8'?>
<ldml>
    <dates>
        <calendars>
            <calendar type="gregorian">
                <dateTimeFormats>
                    <availableFormats>
                        <dateFormatItem id="MMMEEEd" ticket="INTL-2668">EEE, d MMM</dateFormatItem>
                    </availableFormats>
                </dateTimeFormats>
            </calendar>
        </calendars>
    </dates>
    <numbers>
        <currencies>
            <currency type="PHP">
                <symbol ticket="INTL-2668">₱</symbol>
            </currency>
        </currencies>
    </numbers>
</ldml>
=== DE.XML
<?xml version='1.0' encoding='UTF-8'?>
<ldml>
    <localeDisplayNames>
        <languages>
            <language ticket="INTL-2947" type="nb">Norwegisch</language>
        </languages>
        <territories>
            <territory ticket="INTL-1441" type="HK">Hongkong</territory>
        </territories>
    </localeDisplayNames>
    <dates>
        <calendars>
            <calendar type="gregorian">
                <months>
                    <monthContext type="format">
                        <monthWidth type="abbreviated">
                            <month ticket="INTL-3218" type="5">Mai.</month>
                            <month ticket="INTL-3218" type="6">Juni.</month>
                            <month ticket="INTL-3218" type="7">Juli.</month>
                        </monthWidth>
                    </monthContext>
                </months>
                <dateTimeFormats>
                    <availableFormats>
                        <dateFormatItem id="MMMEEEd" ticket="INTL-2668">EEE, d. MMM</dateFormatItem>
                    </availableFormats>
                </dateTimeFormats>
            </calendar>
        </calendars>
    </dates>
</ldml>
=== DE_AT.XML
<?xml version='1.0' encoding='UTF-8'?>
<ldml>
    <dates>
        <calendars>
            <calendar type="gregorian">
                <months>
                    <monthContext type="format">
                        <monthWidth type="abbreviated">
                            <month ticket="INTL-3218" type="1">Jän.</month>
                        </monthWidth>
                    </monthContext>
                </months>
                <dateTimeFormats>
                    <availableFormats>
                        <dateFormatItem id="MMMEEEd" ticket="INTL-2668">EEE, d. MMM</dateFormatItem>
                    </availableFormats>
                </dateTimeFormats>
            </calendar>
        </calendars>
    </dates>
</ldml>
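The merge script itself isn't shown in the thread. As a rough illustration only, here is a minimal sketch of how a file under custom/main could be overlaid onto the matching common/main file, assuming elements are matched by tag plus their `type`/`id` attributes and that every leaf override must carry a `ticket` attribute. Real LDML has more distinguishing attributes, plus aliases and inheritance, none of which this handles; `merge_override` and its helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: an element is identified by its tag plus these
# attributes. Real LDML has more distinguishing attributes (alt, count, ...).
KEY_ATTRS = ("type", "id")


def _key(elem):
    """Tag plus distinguishing attributes; `ticket` is deliberately ignored."""
    return (elem.tag,) + tuple(elem.get(attr) for attr in KEY_ATTRS)


def _find_or_create(parent, template):
    """Return the child of `parent` matching `template`, creating it if absent."""
    for child in parent:
        if _key(child) == _key(template):
            return child
    created = ET.SubElement(parent, template.tag)
    for attr in KEY_ATTRS:
        if template.get(attr) is not None:
            created.set(attr, template.get(attr))
    return created


def merge_override(base, custom, path=""):
    """Overlay the `custom` tree onto the `base` tree, in place."""
    for child in custom:
        child_path = f"{path}/{child.tag}"
        if len(child) == 0:
            # Leaf element: an actual override, which must cite a ticket.
            if child.get("ticket") is None:
                raise ValueError(f"override at {child_path} has no ticket")
            target = _find_or_create(base, child)
            target.text = child.text  # the ticket attribute is not copied
        else:
            # Container element: descend, creating it in `base` if missing.
            merge_override(_find_or_create(base, child), child, child_path)
```

A driver could `ET.parse` both files, call `merge_override(base_root, custom_root)`, and write the result into merged/. Dropping the `ticket` attribute keeps the merged output looking like plain CLDR.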
akx commented 8 years ago

Thanks for the extensive overview of how you guys do things!

My first gut feeling is that requiring LDML XMLen for overrides is Not A Good Idea, mostly owing to the overhead of parsing XML and so on: for better or for worse, it'd mean moving the CLDR importer into the core library.

Even if our Python-native format is a moving target, I don't think it has had any significant overhauls in a long while (only additions, pretty much), so I think adding an overlay on top of that would be a lighter, neater approach.

That said, perhaps one way to go about this would be to add a patch hook that would allow clients to modify the locale data as it is loaded:

@babel.register_locale_patch  # proposed hook, not an existing Babel API
def patch_php_currency(locale, data):
    # (Could check for the locale's properties here)
    data["currencies"]["PHP"]["symbol"] = "₱"

or similar. This feels like a very non-invasive hook to me, though the onus of keeping up with possible changes to the Pythonic locale data format would be on those using the patch system. This would also allow "advanced" users like your org to load the actual override/overlay data from XML or MongoDB or whatever, while still letting less enterprise-y users patch ad hoc as required. (As a con, this adds a layer of process-global context, which feels a little unclean... though I think that might be acceptable.)
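Babel has no such hook; the following is only a minimal sketch of what the decorator-based registry proposed above might look like. The names `register_locale_patch` and `apply_locale_patches` are hypothetical, the locale is a plain identifier string, and the `currencies`/`PHP`/`symbol` key layout mirrors the example rather than Babel's actual data format.

```python
from typing import Any, Callable, Dict, List

# Hypothetical patch signature: (locale identifier, locale data) -> None.
PatchFunc = Callable[[str, Dict[str, Any]], None]

# Process-global registry: the "layer of process-global context" noted above.
_LOCALE_PATCHES: List[PatchFunc] = []


def register_locale_patch(func: PatchFunc) -> PatchFunc:
    """Decorator that records `func` to run whenever locale data is loaded."""
    _LOCALE_PATCHES.append(func)
    return func


def apply_locale_patches(locale: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Run every registered patch on `data`, in registration order."""
    for patch in _LOCALE_PATCHES:
        patch(locale, data)
    return data


@register_locale_patch
def patch_php_currency(locale: str, data: Dict[str, Any]) -> None:
    # Only touch locales whose data actually has a PHP currency entry.
    php = data.get("currencies", {}).get("PHP")
    if php is not None:
        php["symbol"] = "₱"
```

In this design the loader would call `apply_locale_patches` once per locale, right after reading the shipped data and before caching it, so patches cost nothing on later lookups.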

Just my first 5 EUR cents here. What do you think?

EDIT: For the use case of "Allow addition of xml files for new locales", the data parameter passed to a patch function might be None or {} (whichever feels like the nicer protocol), and an "unknown locale" error would only be raised for a non-fuzzy locale if the final data is falsy.
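A minimal sketch of that rule, assuming the {} variant: an unshipped locale starts from an empty dict, patches may fill it in, and the error is raised only if the result is still empty. `load_locale_data` and `LocaleDataNotFound` are hypothetical names, and locale inheritance and fuzzy matching are not modeled.

```python
import copy
from typing import Any, Callable, Dict, Iterable

Patch = Callable[[str, Dict[str, Any]], None]


class LocaleDataNotFound(Exception):
    """Hypothetical stand-in for an "unknown locale" error."""


def load_locale_data(name: str, shipped: Dict[str, Dict[str, Any]],
                     patches: Iterable[Patch]) -> Dict[str, Any]:
    # Unknown locales start from {} instead of failing immediately; the deep
    # copy keeps patches from mutating the shipped data.
    data = copy.deepcopy(shipped.get(name, {}))
    for patch in patches:
        patch(name, data)
    # Raise only if neither the shipped data nor any patch produced anything.
    if not data:
        raise LocaleDataNotFound(name)
    return data
```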

jtwang commented 8 years ago

That was actually more-or-less our approach before we forked Babel and ended up with almost 1000 lines of overrides. D:

Mostly due to new datetime skeletons and currency symbol overrides. Protip: heavily encourage your design team to stick to a small set of (supported) date time display formats.

Anyhow, we decided to take the XML approach for a couple of reasons:

That being said, requiring users to build their own package is a huge disadvantage. As you mentioned, the patch approach would still allow us to override everything, so we could keep our XML override approach on top of it. Most users would probably only need a few overrides at the locale level.

In a nutshell, I'm OK with the patch approach. :)

akx commented 8 years ago

Writing a utility to convert your patch XMLen into Python patch statements doesn't sound like an impossible approach, really. (As a matter of fact, I was thinking of a "generic" patcher using YAML/TOML: currencies.PHP.symbol = "₱" or something...)

Maybe now that we have actual support for datetime skellingtons, you wouldn't need 1000 lines of overrides? :)

Also, re the fallbacks: if the signature of the patch function is (locale, data), you can easily inspect the locale's language/territory/whatever to see what you need to patch. (And if we incorporate @etanol's patch to fold overrides into normal dicts at load time, we won't need the LocaleDataDict funny business either?)
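To illustrate, here is a hypothetical patch that reproduces the de.xml and de_AT.xml month overrides from earlier in the thread by inspecting the locale instead of shipping one file per locale. `LocaleId` is a stand-in for a locale object (Babel's `Locale` also exposes `language` and `territory`). The `months["format"]["abbreviated"]` layout is an assumption meant to resemble Babel's month data.

```python
from typing import Any, Dict, NamedTuple, Optional


class LocaleId(NamedTuple):
    """Stand-in for a locale object; Babel's Locale also has these attributes."""
    language: str
    territory: Optional[str] = None


def patch_german_months(locale: LocaleId, data: Dict[str, Any]) -> None:
    """Hypothetical patch mirroring the de.xml / de_AT.xml overrides."""
    if locale.language != "de":
        return  # not German: leave the data untouched
    months = data.setdefault("months", {}).setdefault("format", {})
    abbreviated = months.setdefault("abbreviated", {})
    # Applies to every German locale (de, de_AT, de_CH, ...).
    abbreviated.update({5: "Mai.", 6: "Juni.", 7: "Juli."})
    if locale.territory == "AT":
        # Austria-specific override layered on top of the shared German one.
        abbreviated[1] = "Jän."
```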

olejorgenb commented 1 year ago

Any news on this?

I noticed that it is possible to modify a Locale instance to some degree. Is this discouraged? E.g.:

from babel import Locale
locale = Locale("no")
locale.date_formats["short"] = 'yyyy-MM-dd'  # pattern used by format_date(..., format="short")

Reading the code, it seems each instance gets a copy of the data, so I assume this will work OK for now as long as I ensure all parts of the application use this modified locale instance. But the API does not exactly invite such overrides.
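One alternative that doesn't rely on how `Locale` instances hold their data is to keep overrides in the application and pass an explicit pattern, since `babel.dates.format_date` accepts either a width name ("short", "medium", ...) or a raw CLDR date pattern as `format`. This is a minimal sketch. `DATE_FORMAT_OVERRIDES` and `app_format_date` are hypothetical names, and the table is keyed by the locale string exactly as passed in.

```python
from datetime import date

from babel.dates import format_date

# Hypothetical app-level overrides, keyed by (locale identifier, width).
DATE_FORMAT_OVERRIDES = {
    ("de", "short"): "yyyy-MM-dd",
}


def app_format_date(value: date, width: str = "medium", locale: str = "en_US") -> str:
    """Format `value`, preferring an app-level pattern over the CLDR one."""
    # Fall back to the width name itself, which Babel resolves via CLDR data.
    pattern = DATE_FORMAT_OVERRIDES.get((locale, width), width)
    return format_date(value, format=pattern, locale=locale)
```

For example, `app_format_date(date(2024, 5, 1), "short", "de")` uses the override and gives "2024-05-01", while any locale without an entry keeps its normal CLDR short format.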