python / cpython

The Python programming language
https://www.python.org

locale.nl_langinfo(locale.ERA) does not work for past eras #126727

Open serhiy-storchaka opened 1 week ago

serhiy-storchaka commented 1 week ago

Bug report

According to the POSIX specification (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02), nl_langinfo(ERA) should return a string containing semicolon-separated era description segments. But glibc uses NUL instead of a semicolon as the separator. As a result, locale.nl_langinfo(locale.ERA) in Python returns only the first segment, which corresponds to the latest (current) era. For example, in the Japanese locale the result cannot be used for dates before 2020:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'ja_JP')
'ja_JP'
>>> locale.nl_langinfo(locale.ERA)
'+:2:2020/01/01:+*:令和:%EC%Ey年'

This issue is similar to #124969, but at least the result can be used for the current date.
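To illustrate the truncation, here is a minimal sketch that does not call glibc at all. It builds a buffer laid out the way this report describes (segments separated by NUL), reads it as an ordinary C string, and then rejoins the segments with semicolons. The helper `join_segments` and its `count` parameter are hypothetical; how the real number of segments would be obtained from the C library is not covered here.

```python
import ctypes

# Two era segments taken from the ja_JP data, laid out as the report
# describes the glibc result: NUL-separated instead of ';'-separated.
segments = [
    "+:2:2020/01/01:+*:令和:%EC%Ey年",
    "+:1:2019/05/01:2019/12/31:令和:%EC元年",
]
raw = b"\0".join(s.encode("utf-8") for s in segments)
buf = ctypes.create_string_buffer(raw)  # appends a terminating NUL

# Reading the buffer as a NUL-terminated C string stops at the first NUL,
# so only the first segment survives.
truncated = buf.value.decode("utf-8")


def join_segments(data: bytes, count: int) -> str:
    """Hypothetical helper: rejoin `count` NUL-separated segments with ';'."""
    parts = data.split(b"\0")[:count]
    return ";".join(p.decode("utf-8") for p in parts)


# buf.raw exposes the whole buffer, including the embedded NUL.
joined = join_segments(buf.raw, len(segments))

print(truncated)
print(joined)
```

The first line printed contains only the current-era segment, while the second contains both segments in the semicolon-separated form that POSIX specifies.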

cc @methane, @kulikjak.


serhiy-storchaka commented 1 week ago

On my computer (Linux) the following script

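import locale
import subprocess

# List every locale installed on the system.
alllocales = subprocess.check_output(['locale', '-a']).decode().split()
for loc in alllocales:
    # Skip variants with an explicit encoding or modifier.
    if '.' in loc or '@' in loc:
        continue
    try:
        locale.setlocale(locale.LC_ALL, loc)
    except locale.Error:
        continue
    era = locale.nl_langinfo(locale.ERA)
    if era:
        # Print the locale, the number of ';' separators, and the raw value.
        print(loc, era.count(';'), era)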

now produces the following output:

cmn_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
hak_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
ja_JP 10 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年;+:1:1926/12/25:1926/12/31:昭和:%EC元年;+:2:1913/01/01:1926/12/24:大正:%EC%Ey年;+:1:1912/07/30:1912/12/31:大正:%EC元年;+:6:1873/01/01:1912/07/29:明治:%EC%Ey年;+:1:0001/01/01:1872/12/31:西暦:%EC%Ey年;+:1:-0001/12/31:-*:紀元前:%EC%Ey年
japanese 10 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年;+:1:1926/12/25:1926/12/31:昭和:%EC元年;+:2:1913/01/01:1926/12/24:大正:%EC%Ey年;+:1:1912/07/30:1912/12/31:大正:%EC元年;+:6:1873/01/01:1912/07/29:明治:%EC%Ey年;+:1:0001/01/01:1872/12/31:西暦:%EC%Ey年;+:1:-0001/12/31:-*:紀元前:%EC%Ey年
lo_LA 0 +:1:-543/01/01:+*:ພ.ສ.:%EC %Ey
lzh_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
nan_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
thai 0 +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
th_TH 0 +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
zh_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年

The ERA values are not set on FreeBSD and illumos, and I suppose not on macOS and Solaris either. It seems that currently they are only set on Linux.
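For readers who want to work with the semicolon-separated value shown in the output above, here is a rough sketch of splitting it into the six colon-separated fields that POSIX describes (direction, offset, start date, end date, era name, era format). The `EraSegment` class and `parse_era` function are hypothetical names, not part of the locale module. Dates are kept as strings because values such as `-0001/12/31`, `-*` and `+*` do not map onto `datetime.date`.

```python
from dataclasses import dataclass


@dataclass
class EraSegment:
    """One era description segment (hypothetical container)."""
    direction: str   # '+' or '-'
    offset: int      # number of the year closest to start_date
    start_date: str  # 'yyyy/mm/dd', the year may be negative
    end_date: str    # a date, '-*' (beginning of time) or '+*' (end of time)
    era_name: str
    era_format: str


def parse_era(era: str) -> list[EraSegment]:
    """Split an ERA string into segments and each segment into fields."""
    segments = []
    for chunk in era.split(";"):
        if not chunk:
            continue
        # Split on the first five colons only, leaving the format intact.
        direction, offset, start, end, name, fmt = chunk.split(":", 5)
        segments.append(EraSegment(direction, int(offset), start, end, name, fmt))
    return segments


# The zh_TW value from the output above.
sample = (
    "+:2:1913/01/01:+*:民國:%EC%Ey年;"
    "+:1:1912/01/01:1912/12/31:民國:%EC元年;"
    "+:1:1911/12/31:-*:民前:%EC%Ey年"
)
parsed = parse_era(sample)
for seg in parsed:
    print(seg.era_name, seg.start_date, seg.end_date)
```

An empty ERA value, as returned for locales that do not define eras, yields an empty list.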