python-babel / babel

The official repository for Babel, the Python Internationalization Library
BSD 3-Clause "New" or "Revised" License

Test failures on babel version 2.14.0 #1059

Closed nileshpatra closed 1 month ago

nileshpatra commented 5 months ago

Overview Description

While upgrading the Debian package to the latest version, I am observing a number of test failures in some locales due to minor changes in the output. I'm not sure whether the expected output should be changed for these assertions, or whether it makes sense to simply skip or patch them.

Steps to Reproduce

Run the test suite with: LC_ALL=C py.test-3

Actual Results

=================================== FAILURES ===================================
______________ [doctest] babel.dates.DateTimeFormat.format_period ______________
1505         u'iltapäivä'
1506         >>> format.format_period('B', 4)
1507         u'iltapäivällä'
1508         >>> format.format_period('B', 5)
1509         u'ip.'
1511         >>> format = DateTimeFormat(datetime(2022, 4, 28, 6, 27), 'zh_Hant')
1512         >>> format.format_period('a', 1)
1513         u'上午'
1514         >>> format.format_period('b', 1)
UNEXPECTED EXCEPTION: ValueError('Could not format period morning1 in zh_Hant')
Traceback (most recent call last):
  File "/usr/lib/python3.11/", line 1353, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest babel.dates.DateTimeFormat.format_period[9]>", line 1, in <module>
  File "/<<PKGBUILDDIR>>/babel/", line 1535, in format_period
    raise ValueError(f"Could not format period {period} in {self.locale}")
ValueError: Could not format period morning1 in zh_Hant
/<<PKGBUILDDIR>>/babel/ UnexpectedException
__________________ [doctest] babel.numbers.format_scientific ___________________
954 Return value formatted in scientific notation for a specific locale.
956     >>> format_scientific(10000, locale='en_US')
957     u'1E4'
958     >>> format_scientific(10000, locale='ar_EG', numbering_system='default')

/<<PKGBUILDDIR>>/babel/ DocTestFailure
________________ [doctest] babel.numbers.get_exponential_symbol ________________
416 Return the symbol used by the locale to separate mantissa and exponent.
418     >>> get_exponential_symbol('en_US')
419     u'E'
420     >>> get_exponential_symbol('ar_EG', numbering_system='default')

/<<PKGBUILDDIR>>/babel/ DocTestFailure
__________________ [doctest] babel.units.format_compound_unit __________________
242     >>> format_compound_unit(1234.5, "ton", 15, denominator_unit="hour", locale="ar_EG", numbering_system="arab")
243     '1٬234٫5 طن لكل 15 ساعة'
245     >>> format_compound_unit(160, denominator_unit="square-meter", locale="fr")
246     '160 par m\xe8tre carr\xe9'
248     >>> format_compound_unit(4, "meter", "ratakisko", length="short", locale="fi")
249     '4 m/ratakisko'
251     >>> format_compound_unit(35, "minute", denominator_unit="fathom", locale="sv")
    '35 minuter per famn'
    '35 minuter per length-fathom'

/<<PKGBUILDDIR>>/babel/ DocTestFailure
______________________ FormatDecimalTestCase.test_compact ______________________

self = <tests.test_numbers.FormatDecimalTestCase testMethod=test_compact>

    def test_compact(self):
        assert numbers.format_compact_decimal(1, locale='en_US', format_type="short") == '1'
        assert numbers.format_compact_decimal(999, locale='en_US', format_type="short") == '999'
        assert numbers.format_compact_decimal(1000, locale='en_US', format_type="short") == '1K'
        assert numbers.format_compact_decimal(9000, locale='en_US', format_type="short") == '9K'
        assert numbers.format_compact_decimal(9123, locale='en_US', format_type="short", fraction_digits=2) == '9.12K'
        assert numbers.format_compact_decimal(10000, locale='en_US', format_type="short") == '10K'
        assert numbers.format_compact_decimal(10000, locale='en_US', format_type="short", fraction_digits=2) == '10K'
        assert numbers.format_compact_decimal(1000000, locale='en_US', format_type="short") == '1M'
        assert numbers.format_compact_decimal(9000999, locale='en_US', format_type="short") == '9M'
        assert numbers.format_compact_decimal(9000900099, locale='en_US', format_type="short", fraction_digits=5) == '9.0009B'
        assert numbers.format_compact_decimal(1, locale='en_US', format_type="long") == '1'
        assert numbers.format_compact_decimal(999, locale='en_US', format_type="long") == '999'
        assert numbers.format_compact_decimal(1000, locale='en_US', format_type="long") == '1 thousand'
        assert numbers.format_compact_decimal(9000, locale='en_US', format_type="long") == '9 thousand'
        assert numbers.format_compact_decimal(9000, locale='en_US', format_type="long", fraction_digits=2) == '9 thousand'
        assert numbers.format_compact_decimal(10000, locale='en_US', format_type="long") == '10 thousand'
        assert numbers.format_compact_decimal(10000, locale='en_US', format_type="long", fraction_digits=2) == '10 thousand'
        assert numbers.format_compact_decimal(1000000, locale='en_US', format_type="long") == '1 million'
        assert numbers.format_compact_decimal(9999999, locale='en_US', format_type="long") == '10 million'
        assert numbers.format_compact_decimal(9999999999, locale='en_US', format_type="long", fraction_digits=5) == '10 billion'
        assert numbers.format_compact_decimal(1, locale='ja_JP', format_type="short") == '1'
        assert numbers.format_compact_decimal(999, locale='ja_JP', format_type="short") == '999'
        assert numbers.format_compact_decimal(1000, locale='ja_JP', format_type="short") == '1000'
        assert numbers.format_compact_decimal(9123, locale='ja_JP', format_type="short") == '9123'
        assert numbers.format_compact_decimal(10000, locale='ja_JP', format_type="short") == '1万'
>       assert numbers.format_compact_decimal(1234567, locale='ja_JP', format_type="long") == '123万'

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
babel/ in format_compact_decimal
    compact_format = locale.compact_decimal_formats[format_type]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <babel.localedata.LocaleDataDict object at 0x7f4026072950>, key = 'long'

    def __getitem__(self, key: str | int | None) -> Any:
>       orig = val = self._data[key]
E       KeyError: 'long'

babel/ KeyError
_________________________ test_get_exponential_symbol __________________________

    def test_get_exponential_symbol():
        assert numbers.get_exponential_symbol('en_US') == 'E'
        assert numbers.get_exponential_symbol('en_US', numbering_system="latn") == 'E'
        assert numbers.get_exponential_symbol('en_US', numbering_system="default") == 'E'
        assert numbers.get_exponential_symbol('ja_JP') == 'E'
        assert numbers.get_exponential_symbol('ar_EG') == 'E'
>       assert numbers.get_exponential_symbol('ar_EG', numbering_system="default") == 'اس'
E       AssertionError: assert 'أس' == 'اس'
E         - اس
E         + أس

tests/ AssertionError
____________________ test_format_currency_long_display_name ____________________

    def test_format_currency_long_display_name():
        assert (numbers.format_currency(1099.98, 'USD', locale='en_US', format_type='name')
                == '1,099.98 US dollars')
        assert (numbers.format_currency(1099.98, 'USD', locale='en_US', format_type='name', numbering_system="default")
                == '1,099.98 US dollars')
        assert (numbers.format_currency(1099.98, 'USD', locale='ar_EG', format_type='name', numbering_system="default")
                == '1٬099٫98 دولار أمريكي')
        assert (numbers.format_currency(1.00, 'USD', locale='en_US', format_type='name')
                == '1.00 US dollar')
        assert (numbers.format_currency(1.00, 'EUR', locale='en_US', format_type='name')
                == '1.00 euro')
        assert (numbers.format_currency(2, 'EUR', locale='en_US', format_type='name')
                == '2.00 euros')
        # This tests that '{1} {0}' unitPatterns are found:
>       assert (numbers.format_currency(1, 'USD', locale='sw', format_type='name')
                == 'dola ya Marekani 1.00')
E       AssertionError: assert '1.00 dola ya Marekani' == 'dola ya Marekani 1.00'
E         - dola ya Marekani 1.00
E         ?                 -----
E         + 1.00 dola ya Marekani
E         ? +++++

tests/ AssertionError
____________________________ test_format_scientific ____________________________

    def test_format_scientific():
        assert numbers.format_scientific(10000, locale='en_US') == '1E4'
        assert numbers.format_scientific(10000, locale='en_US', numbering_system="default") == '1E4'
        assert numbers.format_scientific(4234567, '#.#E0', locale='en_US') == '4.2E6'
        assert numbers.format_scientific(4234567, '0E0000', locale='en_US') == '4.234567E0006'
        assert numbers.format_scientific(4234567, '##0E00', locale='en_US') == '4.234567E06'
        assert numbers.format_scientific(4234567, '##00E00', locale='en_US') == '42.34567E05'
        assert numbers.format_scientific(4234567, '0,000E00', locale='en_US') == '4,234.567E03'
        assert numbers.format_scientific(4234567, '##0.#####E00', locale='en_US') == '4.23457E06'
        assert numbers.format_scientific(4234567, '##0.##E00', locale='en_US') == '4.23E06'
        assert numbers.format_scientific(42, '00000.000000E0000', locale='en_US') == '42000.000000E-0003'
>       assert numbers.format_scientific(0.2, locale="ar_EG", numbering_system="default") == '2اس\u061c-1'
E       AssertionError: assert '2أس\u061c-1' == '2اس\u061c-1'
E         - 2اس؜-1
E         ?  ^
E         + 2أس؜-1
E         ?  ^

tests/ AssertionError
______________________ TestFormat.test_format_scientific _______________________

self = <tests.test_support.TestFormat object at 0x7f4027595650>

    def test_format_scientific(self):
        assert support.Format('en_US').scientific(10000) == '1E4'
        assert support.Format('en_US').scientific(Decimal("10000")) == '1E4'
>       assert support.Format('ar_EG', numbering_system="default").scientific(10000) == '1اس4'
E       AssertionError: assert '1أس4' == '1اس4'
E         - 1اس4
E         ?  ^
E         + 1أس4
E         ?  ^

tests/ AssertionError

Expected Results

All tests should pass

Additional Information

Version info:

python3: 3.12.1
pytest: 7.4.4
tz: 2023.3.post1-2
freezegun: 1.2.1
unicode-cldr-core: 44-0.1
tzdata: 2023d-1

Alex-ley-scrub commented 4 months ago

I noticed some similar issues with ssf:

stemming from:

Locale.number_symbols will now have first-level keys for each numbering system. Since the implicit default numbering system still is "latn", what had previously been e.g. Locale.number_symbols['decimal'] is now Locale.number_symbols['latn']['decimal'].

akx commented 4 months ago

@nileshpatra Considering all of our tests are green on master here, sounds like the Debian build is doing something differently. Can you share some verbose logs or such?

@Alex-ley-scrub That's unrelated – but as mentioned in the changelog, the format of .number_symbols has changed from 2.13 to 2.14 to allow for other numbering systems than Latin. number_symbols's documentation has had an admonition that the format may change between Babel versions since 2016, and now it did 😄
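
For code that has to run against both layouts, a small lookup helper can hide the difference. The following is only a sketch: `get_number_symbol` is a hypothetical name, not part of Babel, and it assumes that the mapping passed in behaves like `Locale.number_symbols` as described above (flat in 2.13, keyed by numbering system in 2.14).

```python
from collections.abc import Mapping


def get_number_symbol(number_symbols: Mapping, name: str,
                      numbering_system: str = "latn") -> str:
    """Hypothetical helper: read a symbol from either layout.

    Nested layout (described for 2.14): {"latn": {"decimal": "."}}
    Flat layout (described for 2.13):   {"decimal": "."}
    """
    nested = number_symbols.get(numbering_system)
    if isinstance(nested, Mapping):
        # Nested layout: the first-level key is the numbering system.
        return nested[name]
    # Flat layout: the symbol name is the first-level key.
    return number_symbols[name]


# Plain dicts stand in for Locale.number_symbols here.
flat_layout = {"decimal": ".", "group": ","}
nested_layout = {"latn": {"decimal": ".", "group": ","}}

print(get_number_symbol(flat_layout, "decimal"))
print(get_number_symbol(nested_layout, "decimal"))
```

With a real locale, the first argument would be `Locale.parse('en_US').number_symbols`. Since the documentation warns that the format may change again, pinning the Babel version is the more robust option.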

nileshpatra commented 4 months ago

Hi @akx

Considering all of our tests are green on master here, sounds like the Debian build is doing something differently.

I suspect this has something to do with the tzdata version and the changes in it. Is it possible to know which version of tzdata the CI pulls in? In Debian it is 2023d-1 for the log that I linked to.

Can you share some verbose logs or such?

Will py.test-3 --verbose help you here?

akx commented 4 months ago

As far as I can see, none of the errors above should be related to tzdata, but the CLDR data. Are you sure you're pulling and converting the correct CLDR data (make import-cldr)?
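
To see why the `ar_EG` assertions above fail even though the strings look almost identical, it can help to print the code points that differ. This is a minimal sketch using only the standard library. `describe_difference` is a hypothetical helper, and the escapes below assume the strings in the log use the precomposed characters U+0627 (expected) and U+0623 (actual).

```python
import unicodedata


def describe_difference(expected: str, actual: str) -> list[str]:
    """Hypothetical helper: list the positions where two strings differ."""
    report = []
    for index, (exp_ch, act_ch) in enumerate(zip(expected, actual)):
        if exp_ch != act_ch:
            report.append(
                f"index {index}: expected U+{ord(exp_ch):04X} "
                f"{unicodedata.name(exp_ch, 'UNKNOWN')}, got U+{ord(act_ch):04X} "
                f"{unicodedata.name(act_ch, 'UNKNOWN')}"
            )
    if len(expected) != len(actual):
        report.append(f"length differs: expected {len(expected)}, got {len(actual)}")
    return report


# Assumed code points for the exponential symbol in the test log above.
expected_symbol = "\u0627\u0633"  # what the test suite expects
actual_symbol = "\u0623\u0633"    # what the generated locale data returns

for line in describe_difference(expected_symbol, actual_symbol):
    print(line)
```

Under that assumption, the only difference is the first letter (plain alef versus alef with hamza above), which points to a change in the locale data rather than in the formatting code.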

nileshpatra commented 4 months ago

I think so. We are using Babel's tarball directly from the GitHub releases, which has the .dat files processed already. We don't have to pull and convert on our end, do we?

akx commented 4 months ago

@nileshpatra Um... what tarball is that? The GitHub release for 2.14.0 has no sdist TAR.

nileshpatra commented 4 months ago

@akx oops, seems like I gave an incorrect response w/o properly checking - sorry for that! You're right indeed, there's no sdist.

In Debian, we generate the .dat files via: python3 scripts/ /usr/share/unicode/cldr/common and the version of unicode-cldr-core in Debian is 44.0, while Babel pulls 43.0 as per

I suppose this is the difference. Do you think Babel can be adapted to the latest CLDR data?

akx commented 3 months ago

@nileshpatra Sure, the work can be done to have Babel use CLDR 44, but that would be for Babel 2.15. Babel 2.14 uses CLDR 43.

nileshpatra commented 3 months ago

Ack, I will wait for a new release then

akx commented 1 month ago

The freshly released Babel 2.15.0 uses CLDR 44. 🎉

The next version will use CLDR 45 when #1077 gets merged.
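
For packagers who generate the .dat files from a system-wide CLDR package, a pre-build check along these lines could flag the kind of mismatch that caused this issue. It is a sketch only: the function names are hypothetical, and the table contains just the pairings stated in this thread (2.14 with CLDR 43, 2.15 with CLDR 44), not an official compatibility list.

```python
import re
from typing import Optional

# Pairings taken from this thread only; extend as needed.
EXPECTED_CLDR_MAJOR = {"2.14": 43, "2.15": 44}


def cldr_major(version: str) -> int:
    """Extract the leading major number, e.g. '44-0.1' -> 44."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"Cannot parse CLDR version: {version!r}")
    return int(match.group())


def babel_series(version: str) -> str:
    """Reduce a Babel version to its release series, e.g. '2.14.0' -> '2.14'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def check_cldr(babel_version: str, cldr_version: str) -> Optional[str]:
    """Return a warning if the CLDR data does not match, else None."""
    expected = EXPECTED_CLDR_MAJOR.get(babel_series(babel_version))
    if expected is None:
        return None  # Unknown Babel series: nothing to compare against.
    actual = cldr_major(cldr_version)
    if actual == expected:
        return None
    return (f"Babel {babel_version} expects CLDR {expected}, "
            f"but CLDR {actual} data is installed")


# The combination reported in this issue.
print(check_cldr("2.14.0", "44-0.1"))
```

Returning `None` for an unknown series keeps the check from blocking builds of releases that the table does not cover.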