python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License
1.34k stars 448 forks source link

`KeyError` raised in `format_skeleton` when using fuzzy matching #1084

Open tomasr8 opened 6 months ago

tomasr8 commented 6 months ago

Overview Description

When using format_skeleton, a KeyError is raised despite using fuzzy=True.

Steps to Reproduce

from datetime import datetime

from babel.dates import format_skeleton

dt = datetime(2012, 1, 1, 14, 30, 59)
format_skeleton("G", dt, locale="cs_CZ", fuzzy=True)

From the docs:

fuzzy – If the skeleton is not found, allow choosing a skeleton that’s close enough to it.

The way I read it is that as long as I pass fuzzy=True (which is the default) a skeleton should always be found and I should not need to worry about KeyErrors in that case.

Even the example in the docs makes it seem a KeyError is only thrown with fuzzy=False:

format_skeleton('yMMd', t, fuzzy=False, locale='fi')  # yMMd is not in the Finnish locale, an error is thrown

If this is the expected behaviour, I think the docs should state that explicitly. If not, I would suggest adding allow_different_fields=True to the underlying match_skeleton() call.

Additional Information

Babel version: 2.15

tomasr8 commented 4 months ago

@akx What do you think? Should we fix this?

akx commented 4 months ago

Yeah, I think so. My original commit for fuzzy skeletons (fuzzy skeletons, yikes what a horrible mental image) https://github.com/python-babel/babel/commit/cd70395b0f138b7f60b0866f5b2b164f1a025786 refers to the ICU4J getBestSkeleton function as the reference implementation.

Apparently there's been at least one commit https://github.com/unicode-org/icu/commit/edaebfa64ee673ed580f464123d3afd4deffc693 that has improved that function since – we should maybe look at whether that improvement would in fact fix this?

tomasr8 commented 4 months ago

Looks like the only thing that changed is more format specifier replacements which we could add as well:

    if 'k' in skeleton and not any('k' in option for option in options):
        skeleton = skeleton.replace('k', 'H')
    if 'K' in skeleton and not any('K' in option for option in options):
        skeleton = skeleton.replace('K', 'h')
    if 'a' in skeleton and not any('a' in option for option in options):
        skeleton = skeleton.replace('a', '')
    if 'b' in skeleton and not any('b' in option for option in options):
        skeleton = skeleton.replace('b', '')

This doesn't fix the KeyError issue though. I'm a bit unsure whether we should fix it or just make it clear in the docs that the function can in fact raise even when fuzzy=True. If we do want to fix this, I think we can implement the recommendation from the tr35 doc for missing fields: https://www.unicode.org/reports/tr35/tr35-72/tr35-dates.html#missing-skeleton-fields

akx commented 4 months ago

For the time being, we could maybe just document that fuzzy=True may still raise, and fix that in a subsequent version using that recommendation?

tomasr8 commented 4 months ago

Yeah, that sounds like the best option