python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Upgrade to CLDR 43 #1043

Closed rix0rrr closed 7 months ago

rix0rrr commented 7 months ago

This upgrades the CLDR database to release 43.

I tried upgrading to 44 initially, but there seemed to be a lot of breaking changes in there that I didn't know how to deal with, and the locales I'm personally interested in are already supported in 43.

Besides updating the import, I had to make two further changes:

In one doctest, regular spaces have been replaced by thin spaces.

In the import of parentLocales: the new supplementalData.xml contains a new type of declaration:


    <parentLocales component="collations">
         ...
        <parentLocale parent="sr_ME" locales="sr_Cyrl_ME"/>
        ....
    </parentLocales>

This refers to a locale called sr_ME, but there is no .xml file for it, so no .dat file gets generated either. At runtime, when the locale data for sr_Cyrl_ME is looked up, we try to merge the data from sr_ME into it and get a FileNotFoundError because sr_ME.dat doesn't exist.
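To make the failure mode concrete, here is a minimal sketch of a loader that merges parent data before the locale's own data. It is not Babel's actual loader: `PARENT_OVERRIDES`, `load`, and the pickled `.dat` layout are simplified stand-ins assumed for illustration only.

```python
import os
import pickle
import tempfile

# Hypothetical parent override, mirroring the new collations declaration.
PARENT_OVERRIDES = {"sr_Cyrl_ME": "sr_ME"}


def load(name, dirname):
    """Load <name>.dat and merge it over its parent's data, recursively."""
    parent = PARENT_OVERRIDES.get(name)
    if parent is None:
        # Default inheritance: drop the last subtag (sr_Cyrl -> sr).
        parts = name.split("_")
        parent = "_".join(parts[:-1]) if len(parts) > 1 else None
    data = load(parent, dirname) if parent else {}
    with open(os.path.join(dirname, f"{name}.dat"), "rb") as f:
        data.update(pickle.load(f))
    return data


with tempfile.TemporaryDirectory() as d:
    # Only these locales have data files; sr_ME.dat is deliberately absent.
    for name in ("sr", "sr_Cyrl", "sr_Cyrl_ME"):
        with open(os.path.join(d, f"{name}.dat"), "wb") as f:
            pickle.dump({"name": name}, f)
    try:
        load("sr_Cyrl_ME", d)
        error = None
    except FileNotFoundError as exc:
        error = exc

print(type(error).__name__)
```

The override sends the lookup to sr_ME, which has no data file, so the merge fails even though sr_Cyrl_ME.dat itself exists.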

From the description of this new feature:

parentLocales.json now has new keys for collations and segmentations parent information (CLDR-16425) (https://cldr.unicode.org/index/downloads/cldr-43)

I figured that since this type of information is new and CLDR 42 didn't have it yet, it wouldn't hurt to ignore it for now. We don't get the benefit of the new inheritable information, but we don't break either, and at least we'll be able to consume the new core data for new locales.
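As a rough illustration of "ignoring it", the sketch below skips any `<parentLocales>` block that carries a `component` attribute while collecting parent overrides. The `SAMPLE` document and the `collect_parent_exceptions` helper are hypothetical and heavily trimmed; the actual import script in this PR may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for supplementalData.xml.
SAMPLE = """
<supplementalData>
    <parentLocales>
        <parentLocale parent="root" locales="az_Arab az_Cyrl"/>
        <parentLocale parent="en_001" locales="en_150 en_AU"/>
    </parentLocales>
    <parentLocales component="collations">
        <parentLocale parent="sr_ME" locales="sr_Cyrl_ME"/>
    </parentLocales>
</supplementalData>
"""


def collect_parent_exceptions(root):
    """Map child locale -> parent, skipping component-specific blocks."""
    parents = {}
    for block in root.findall("parentLocales"):
        # Blocks such as component="collations" are new in CLDR 43 and may
        # point at locales that have no data file, so leave them out.
        if block.attrib.get("component"):
            continue
        for elem in block.findall("parentLocale"):
            parent = elem.attrib["parent"]
            for child in elem.attrib["locales"].split():
                parents[child] = parent
    return parents


parent_exceptions = collect_parent_exceptions(ET.fromstring(SAMPLE))
print(parent_exceptions)
```

With this filter, sr_Cyrl_ME keeps its default parent chain, and the general (component-less) overrides are imported as before.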


This change adds support for the following new locales:

aa, aa_DJ, aa_ER, aa_ET, ab, ab_GE, an, an_ES, apc, apc_SY, arn, arn_CL,
az_Arab, az_Arab_IQ, az_Arab_IR, az_Arab_TR, ba, ba_RU, bal, bal_Arab,
bal_Arab_PK, bal_Latn, bal_Latn_PK, bgn, bgn_AE, bgn_AF, bgn_IR, bgn_OM, bgn_PK,
blt, blt_VN, bm_Nkoo, bm_Nkoo_ML, bss, bss_CM, byn, byn_ER, cad, cad_US, cch,
cch_NG, cho, cho_US, cic, cic_US, co, co_FR, cu, cu_RU, dv, dv_MV, el_POLYTON,
en_Dsrt, en_Dsrt_US, en_Shaw, en_Shaw_GB, gaa, gaa_GH, gez, gez_ER, gez_ET, gn,
gn_PY, ha_Arab, ha_Arab_NG, ha_Arab_SD, hnj, hnj_Hmnp, hnj_Hmnp_US, io, io_001,
iu, iu_CA, iu_Latn, iu_Latn_CA, jbo, jbo_001, kaj, kaj_NG, kcg, kcg_NG, ken,
ken_CM, kpe, kpe_GN, kpe_LR, la, la_VA, lij, lij_IT, lmo, lmo_IT, mn_Mong,
mn_Mong_CN, mn_Mong_MN, mni_Mtei, mni_Mtei_IN, moh, moh_CA, ms_Arab, ms_Arab_BN,
ms_Arab_MY, mus, mus_US, myv, myv_RU, nqo, nqo_GN, nr, nr_ZA, nso, nso_ZA, nv,
nv_US, ny, ny_MW, osa, osa_US, pap, pap_AW, pap_CW, prg, prg_001, quc, quc_GT,
rhg, rhg_Rohg, rhg_Rohg_BD, rhg_Rohg_MM, rif, rif_MA, sat_Deva, sat_Deva_IN,
scn, scn_IT, sdh, sdh_IQ, sdh_IR, shn, shn_MM, shn_TH, sid, sid_ET, sma, sma_NO,
sma_SE, smj, smj_NO, smj_SE, ss, ss_SZ, ss_ZA, ssy, ssy_ER, st, st_LS, st_ZA,
syr, syr_IQ, syr_SY, szl, szl_PL, tig, tig_ER, tn, tn_BW, tn_ZA, tpi, tpi_PG,
trv, trv_TW, trw, trw_PK, ts, ts_ZA, ve, ve_ZA, vec, vec_IT, vo, vo_001, wa,
wa_BE, wal, wal_ET, wbp, wbp_AU

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (65de3dc) 89.82% compared to head (f337ffd) 90.98%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
+ Coverage   89.82%   90.98%   +1.16%
==========================================
  Files          25       25
  Lines        4391     4393       +2
==========================================
+ Hits         3944     3997      +53
+ Misses        447      396      -51
```


rix0rrr commented 7 months ago

Thanks! Once released, this will be very helpful for https://github.com/hedyorg/hedy!

(Speaking of, any idea when that release might be? 😇🙏)