python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Fuzzy matching sometimes misses obvious candidates #969

Closed jeanas closed 1 year ago

jeanas commented 1 year ago

Overview Description

I encountered this while updating the catalogs of a Sphinx documentation project with sphinx-intl, which uses Babel. A long msgid that had changed only slightly lost its translation: the new entry came out untranslated and the old one was moved to the obsolete entries, instead of the translation being carried over and marked fuzzy.

Steps to Reproduce

Here is a reproducible example:

from babel.messages.catalog import Catalog

msgid = 'As will be later explained in [](parents-bounds), grobs come in several flavors, most importantly and spanners. The grob type normally mandates use as item or spanner. However, it happens that a grob can be used both ways. This is mainly the case for so-called "sticky grobs", which attach to another arbitrary grob, such as footnotes, balloons and parentheses. While many grobs attach to other grobs (e.g., articulations attach to note heads), sticky grobs are special because the grob they attach to, called their "host", is arbitrary and can therefore be either an item or a spanner. In turn, this necessitates creating the sticky grob either as item or spanner depending on the flavor of its host. The following function supports this common case:'

msgstr = "foobar"

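# Existing catalog that holds the translated message.
catalog = Catalog()
catalog.add(msgid, msgstr)

# Same message with only the opening clause changed (str.removeprefix needs Python 3.9+).
new_msgid = "As explained in [](grob-flavors)" + msgid.removeprefix("As will be later explained in [](parents-bounds)")

# Freshly extracted template that contains only the new msgid.
template = Catalog()
template.add(new_msgid)

# Merge the template into the catalog; fuzzy matching is on by default.
catalog.update(template)

print(list(catalog))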

Actual Results

[<Message '' (flags: ['fuzzy'])>, <Message 'As explained in [](grob-flavors), grobs come in several flavors, most importantly and spanners. The grob type normally mandates use as item or spanner. However, it happens that a grob can be used both ways. This is mainly the case for so-called "sticky grobs", which attach to another arbitrary grob, such as footnotes, balloons and parentheses. While many grobs attach to other grobs (e.g., articulations attach to note heads), sticky grobs are special because the grob they attach to, called their "host", is arbitrary and can therefore be either an item or a spanner. In turn, this necessitates creating the sticky grob either as item or spanner depending on the flavor of its host. The following function supports this common case:' (flags: [])>]

Expected Results

The merged catalog should contain the new msgid as a fuzzy entry that keeps the old translation ("foobar"), rather than an untranslated entry.

Additional Information

This appears to be caused by https://github.com/python/cpython/issues/90825.
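
To illustrate the suspected mechanism without Babel, here is a minimal sketch using only difflib. It assumes the linked CPython issue concerns the `autojunk` heuristic of `difflib.SequenceMatcher`, and that Babel's fuzzy matching goes through `difflib.get_close_matches`, which leaves `autojunk` enabled. The helper `best_match` and the two strings are hypothetical: the strings are synthetic inputs chosen so the effect is easy to see, not the msgid from the report. When the second sequence has 200 or more elements, every element that makes up more than about 1% of it is dropped from the match index, so a long string built from a few common characters can end up with no usable anchors.

```python
from difflib import SequenceMatcher


def best_match(word, candidates, cutoff=0.6, autojunk=True):
    """Hypothetical helper: return the candidate most similar to `word`,
    or None if no candidate reaches `cutoff`.

    Sequence order mirrors difflib.get_close_matches (candidate first,
    word second), but `autojunk` is exposed so it can be switched off.
    """
    best, best_ratio = None, cutoff
    for candidate in candidates:
        matcher = SequenceMatcher(None, candidate, word, autojunk=autojunk)
        ratio = matcher.ratio()
        if ratio >= best_ratio:
            best, best_ratio = candidate, ratio
    return best


# Synthetic strings: 200 characters made of only two distinct letters,
# with one extra leading character in the "old" version.
new = "ab" * 100
old = "x" + new

# With autojunk on, both 'a' and 'b' count as "popular" and are ignored,
# so no match is found although the strings differ by one character.
with_autojunk = best_match(new, [old], autojunk=True)

# With autojunk off, the shared 200-character run is found.
without_autojunk = best_match(new, [old], autojunk=False)

print(with_autojunk, without_autojunk == old)
```

Whether the msgid from the report fails for exactly this reason depends on its character frequencies, so this only shows that the heuristic can hide an obvious candidate. Running the same comparison on the real msgid with `autojunk=False` would be a quick way to confirm or rule it out.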