Closed kajte closed 7 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
5dff607
) 91.34% compared to head (abe6bda
) 91.43%. Report is 1 commits behind head on master.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Should it also convert the digits like this
numbers.format_decimal(1.2, locale="ar", numbering_system='default') == '٢٫٣'
?
I mean, if that's a sensible formatting, I think so, yeah. I do wonder if we should do one version where the high-level formatting functions still use "latn"
and emit e.g. a FutureWarning when the default for the locale (e.g. "ar"
) would not be Latin, like
FutureWarning: the default numbering system for the `"ar"` locale is ...; to keep Latin formatting, pass `numbering_system='latn'`
? I'm just kind of worried about putting people through the trouble of their formatting suddenly changing with a "routine" Babel upgrade.
Stats about numbering systems regarding symbols I did some statistics regarding availability of number symbols (I'll paste code used for calculations to the end of message):
latn
Conclusion: There's no need to have traditional alias. Also I feel native
alias isn't that useful and we could skip it if not having it makes code simpler but it's your call.
Next steps on this PR
Could we preliminarily agree on sensible signature for functions so I can't start implementing it so there's less risk I make a lot of code we don't agree on?
One option could be to add something like numbering_system_config: Literal["default"] | str = "latn"
to all relevant functions defaulting to latn. On later versions the FutureWarning
can be added as you suggested.
It would be nicer if it don't have to call _determine_numbering_system
on every stage and for lower level functions the argument could be only numbering_system: str = "latn"
. However, as far as I understand the library structure, most of functions in numbers
module (and all on the example below) are public and documented so someone might be using them directly and then the ability to pass "default" is probably useful. Correct me if I'm wrong on this.
What do you think? Do you have more suitable suggestions?
# new function
def _determine_numbering_system(locale: Locale, numbering_system: Literal["default"] | str = "latn") -> str:
if numbering_system == "default":
return locale.default_numbering_system
else:
return numbering_system
#existing but numbering_system_config argument added
def format_decimal(
number: float | decimal.Decimal | str,
format: str | NumberPattern | None = None,
locale: Locale | str | None = LC_NUMERIC,
decimal_quantization: bool = True,
group_separator: bool = True,
*,
numbering_system_config: Literal["default"] | str = "latn"
) -> str:
locale = Locale.parse(locale)
numbering_system = _determine_numbering_system(locale, numbering_system_config)
...
NumberPattern.apply(..., numbering_system_config=numbering_system)
NumberPattern():
#existing but numbering_system_config argument added
def apply(
self,
value: float | decimal.Decimal | str,
locale: Locale | str | None,
currency: str | None = None,
currency_digits: bool = True,
decimal_quantization: bool = True,
force_frac: tuple[int, int] | None = None,
group_separator: bool = True,
*,
numbering_system_config: Literal["default"] | str = "latn"
):
numbering_system = _determine_numbering_system(locale, numbering_system_config)
...
symbol = get_decimal_symbol(locale, numbering_system_config=numbering_system)
....
#existing but numbering_system_config argument and modified to use number system for getting correct symbol
def get_decimal_symbol(
locale: Locale | str | None = LC_NUMERIC,
*,
numbering_system_config: Literal["default"] | str = "latn"
) -> str:
parsed_locale = Locale.parse(locale)
numbering_system = _determine_numbering_system(locale, numbering_system_config)
return parsed_locale.number_symbols[numbering_system].get('decimal', '.')
Other thoughts
I was chatting with my friend who is native speaker of bn
. For that language it seems eg. decimal delimiters are same for bo latn and beng numering systems so it didn't help for the question whether mixing local symbols and latn numbers looks weird. However, I learned from him that converting digits from latn to beng is just doing 1-to-1 mapping from 0123456789 to ০১২৩৪৫৬৭৮৯ (https://github.com/unicode-org/cldr/blob/main/common/supplemental/numberingSystems.xml#L18). Also, he said they use quite often also latn numbers.
I think it would be quite easy to validate digit formatting against eg. the java/kotlin number formatter from the standard library to get more confidence (or to findout there's more nuances) that converting digits is as simple as 1-to-1 mapping for all number systems having 10 digits.
Here's code that I used to get stats on the beginning of post. Basically calculating number of failures or successes depending on the case. Just for the reference.
def test_latn_available(locale):
assert Locale.parse(locale).number_symbols["latn"]
@pytest.mark.all_locales
def test_default_available(locale):
locale = Locale.parse(locale)
assert locale.number_symbols[locale.default_numbering_system]
@pytest.mark.all_locales
def test_locale_have_native(locale):
assert Locale.parse(locale).other_numbering_systems["native"]
@pytest.mark.all_locales
def test_native_locale_have_symbols(locale):
locale = Locale.parse(locale)
assert locale.number_symbols[locale.other_numbering_systems["native"]]
@pytest.mark.all_locales
def test_locale_have_traditional(locale):
assert Locale.parse(locale).other_numbering_systems["traditional"]
@pytest.mark.all_locales
def test_traditional_locale_have_symbols(locale):
locale = Locale.parse(locale)
assert locale.number_symbols[locale.other_numbering_systems["traditional"]]
@pytest.mark.all_locales
def test_native_and_default_same_if_native_isnt_latin(locale):
locale = Locale.parse(locale)
native = locale.other_numbering_systems["native"]
default = locale.default_numbering_system
if native != "latn":
assert native == default, f"{default} {native} {locale}"
Sorry for the delay, had a bunch of other things to attend to...
One option could be to add something like
numbering_system_config: Literal["default"] | str = "latn"
to all relevant functions defaulting to latn. On later versions theFutureWarning
can be added as you suggested.
Sounds good, but let's call the argument numbering_system
instead, since it's kwarg-only so numbering_system_config=
would be littering user code with an extra _config
word :)
It would be nicer if it don't have to call
_determine_numbering_system
on every stage and for lower level functions the argument could be onlynumbering_system: str = "latn"
. However, as far as I understand the library structure, most of functions innumbers
module (and all on the example below) are public and documented so someone might be using them directly and then the ability to pass "default" is probably useful.
That is the case, yep. That's somewhat analogous to how you can pass in a locale identifier to almost anything, and we call Locale.parse
on it.
The API surface looks good to me at first glance aside from that naming change :)
@kajte I'm planning on trying to get a new Babel out maybe this week or early next week, and I think we could at least include the numbering-system loading code from this PR, even if there'd be no surface changes that'd allow to use them just yet, if you have the time to crop this PR to only that? IOW, pretty much just change the get_*_symbol()
code to always use "latn"
for the time being.
@kajte I'm planning on trying to get a new Babel out maybe this week or early next week, and I think we could at least include the numbering-system loading code from this PR, even if there'd be no surface changes that'd allow to use them just yet, if you have the time to crop this PR to only that? IOW, pretty much just change the
get_*_symbol()
code to always use"latn"
for the time being.
I wrote some code last week that I didn't have time to complete yet. I can check today or tomorrow whether it's close to done. If not, your suggestions sounds good.
Okay, I got something done now.
numbering_system: Literal["default"] | str = "latn"
which is used to select from which numbering system symbols are used.
I hope I didn't miss any function or forgot to test any function where I added the numbering_system argumentHow does it look? If needed I can also change it what you suggested on Wednesday so that we only want to include that part to the next release.
@kajte Thanks! Can you rebase this on master
and take care of the lint errors? Beyond the ones currently flagged by Ruff, there will be new ones since #1045 was just merged. I'll give the rest a review too now.
@kajte Thanks! Can you rebase this on
master
and take care of the lint errors? Beyond the ones currently flagged by Ruff, there will be new ones since #1045 was just merged. I'll give the rest a review too now.
Rebased and at least pre-commit hook succeed locally now.
@kajte Thanks! 🎉
How should I run linters locally to catch these? I installed the pre-commit hook and it passed for my commit.
@kajte Thanks! 🎉
Thanks for your support! I have couple busy weeks ahead but I'll try continue with formatting actual numerals after that.
How should I run linters locally to catch these? I installed the pre-commit hook and it passed for my commit.
I think you hadn't rebased on the master
that had the augmented Ruff configuration :)
Thanks for your support! I have couple busy weeks ahead but I'll try continue with formatting actual numerals after that.
Thank you for the contribution! Same here TBH, and then it'll be kinkunpaistoaika and – – 😁
Changes:
Example: Before:
Now:
Fixes partially issue:https://github.com/python-babel/babel/issues/446 I took some inspiration from the closed PR: https://github.com/python-babel/babel/pull/470
Couple questions:
latn
to the default numbering system for locales which default system isn'tlatn
? Eg. for before the changefa_IR
usedlatn
but now it starts usingarabext
.I would eventually like to fix other
# TODO: Support other number systems
TODOs from the code and take default numbering system to use in percentage, currency etc. formatting and add a function for formatting digits to the local numbering system but I decided to create a pull request at this point to check the direction I'm taking is ok for maintainers. I feel even features added in this PR are an enhancement as such.