sphinx-doc / sphinx-doc-translations

Translations for Sphinx's documentation

Translation errors #36

Open AA-Turner opened 5 months ago

AA-Turner commented 5 months ago

I've had to blacklist Tamil, and zh_CN is newly failing, which means the translation PRs don't get updated.

https://github.com/sphinx-doc/sphinx/actions/workflows/transifex.yml

Is there a way of ensuring in transifex that these errors don't happen?

A

rffontenelle commented 5 months ago

These errors are in the UI translations, not the docs. Consider reporting them in the sphinx-doc/sphinx repository.

Important subject, though.

rffontenelle commented 5 months ago

@AA-Turner I already mentioned the Tamil issue in #28

rffontenelle commented 4 months ago

Is there a way of ensuring in transifex that these errors don't happen?

@AA-Turner Sorry for not answering this specific question before.

TL;DR: I don't think there is a straightforward way to avoid it in Transifex.

Translations that don't honor %s placeholders are reported on screen as errors (see the Tamil example), but I haven't found a straightforward way to make tx pull filter out these erroneous strings. I haven't found an API endpoint for it either.
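How Transifex implements this check isn't described here. As a rough stand-in, the following is a minimal sketch of the kind of placeholder validation involved: it flags a translation whose %-style conversions differ from the msgid, or that contains a broken % directive. The names `placeholder_errors` and `_specs` are hypothetical. Positional placeholders are compared in order, because Python's % operator consumes them in order, while named ones are compared as sets.

```python
import re

# Python %-style conversion: optional "(name)", flags, width, precision,
# length modifier, then the conversion type. "%%" is a literal percent sign.
_SPEC = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?[-#0 +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?"
    r"(?P<conv>[diouxXeEfFgGcrsa%])"
)


def _specs(text: str) -> tuple[list[str], set[tuple[str, str]]]:
    """Split the conversions in `text` into positional types and named pairs."""
    positional: list[str] = []
    named: set[tuple[str, str]] = set()
    for m in _SPEC.finditer(text):
        name, conv = m.group("name"), m.group("conv")
        if conv == "%" and name is None:
            continue  # "%%" is an escaped percent sign, not a placeholder
        if name is None:
            positional.append(conv)
        else:
            named.add((name, conv))
    return positional, named


def placeholder_errors(msgid: str, msgstr: str) -> list[str]:
    """Return human-readable problems with the placeholders in `msgstr`."""
    if not msgstr:
        return []  # untranslated entries fall back to the msgid at runtime
    errors = []
    # Any '%' left after removing valid conversions starts a broken directive.
    if "%" in _SPEC.sub("", msgstr):
        errors.append("msgstr contains an invalid or truncated % directive")
    if _specs(msgid) != _specs(msgstr):
        errors.append("msgstr placeholders do not match msgid")
    return errors
```

For example, `placeholder_errors("Hello %s", "Vanakkam %")` reports both problems, because the trailing `%` is not a complete directive and the `%s` has disappeared.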

Pulling only reviewed translations would reduce the chance of these errors, but would not rule them out (and it would put a big burden on the existing contributors). I don't think it is worth it.

A manual solution would be for me to edit and fix, or clear, the problematic translation strings. I can do that if you need, but I need to be made aware (via CI etc.) whenever it happens.

Is it possible to programmatically retrieve the language codes causing the compilation to fail? It occurred to me that the transifex.yml CI workflow could then keep going by first discarding the changes for these problematic languages with git checkout <lang>.

n-peugnet commented 3 months ago

A manual solution would be for me to edit and fix, or clear, the problematic translation strings. I can do that if you need, but I need to be made aware (via CI etc.) whenever it happens.

I just thought of this: maybe the simplest solution would be to mark all failing messages as fuzzy. Fuzzy messages are skipped when the catalog is compiled, so the rest of it compiles without errors.
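Here is a minimal sketch of that behaviour, assuming Babel is installed (the script in this thread also uses it). It builds a catalog in memory, compiles it with Babel's `write_mo`, which drops fuzzy entries unless `use_fuzzy=True`, and reads the result back with the standard `gettext` module. The two messages are made-up examples. GNU msgfmt likewise leaves fuzzy entries out of the .mo file unless it is run with `--use-fuzzy`.

```python
import gettext
from io import BytesIO

from babel.messages.catalog import Catalog
from babel.messages.mofile import write_mo

catalog = Catalog(locale="ta")
# A translation with a broken placeholder, flagged as fuzzy.
catalog.add("Hello %s", "Vanakkam %", flags=["fuzzy"])
# A healthy translation, left as it is.
catalog.add("Bye", "Poi")

buf = BytesIO()
write_mo(buf, catalog)  # use_fuzzy defaults to False: fuzzy entries are dropped
buf.seek(0)
translations = gettext.GNUTranslations(buf)

print(translations.gettext("Hello %s"))  # not compiled, so gettext returns the msgid
print(translations.gettext("Bye"))       # compiled normally
```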

Is it possible to programmatically retrieve the language codes causing the compilation to fail?

It is possible with msgfmt --check (from gettext). I made a pull request on Sphinx to do exactly this for the internal messages.
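The following sketch shows one way to collect the failing language codes. It assumes GNU gettext's msgfmt is on PATH and that the catalogs follow the `<lang>/LC_MESSAGES/sphinx.po` layout seen in the output in this thread. `failing_locales` is a hypothetical helper, not the code from the Sphinx pull request, and the `sphinx/locale` path in the usage line is also an assumption. The returned codes are what the transifex.yml workflow would need for the git checkout <lang> step suggested above.

```python
import os
import shutil
import subprocess
from pathlib import Path


def failing_locales(locale_dir: Path, domain: str = "sphinx") -> dict[str, str]:
    """Run ``msgfmt --check`` on every <lang>/LC_MESSAGES/<domain>.po under
    `locale_dir` and map each rejected language code to msgfmt's stderr."""
    if shutil.which("msgfmt") is None:
        raise RuntimeError("msgfmt (GNU gettext) is not on PATH")
    failures: dict[str, str] = {}
    for po in sorted(Path(locale_dir).glob(f"*/LC_MESSAGES/{domain}.po")):
        result = subprocess.run(
            # Compile to the null device: only the diagnostics matter here.
            ["msgfmt", "--check", "-o", os.devnull, str(po)],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # <lang>/LC_MESSAGES/<domain>.po -> <lang>
            failures[po.parent.parent.name] = result.stderr
    return failures


if __name__ == "__main__":
    # Print one failing language code per line (hypothetical path).
    print("\n".join(failing_locales(Path("sphinx/locale"))))
```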

Maybe another script could use this information to add the fuzzy tag automatically.

I made this very quick script to add the fuzzy flag to all failing strings using babel. It should probably be tuned a little bit to limit the diffs produced:

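from sys import argv

from babel.messages.pofile import read_po, write_po

# Usage: python <script>.py path/to/LC_MESSAGES/sphinx.po
with open(argv[1], "r+b") as file:
    catalog = read_po(file)
    for message in catalog:
        # Passing the catalog lets Babel run its plural-form check too,
        # in addition to the Python format-string check.
        errs = message.check(catalog)
        if errs:
            # Fuzzy messages are skipped when the catalog is compiled.
            message.flags.add("fuzzy")

    # Rewrite the file in place with the updated flags.
    file.seek(0)
    file.truncate()
    write_po(file, catalog)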

But it seems msgfmt still finds errors that Babel doesn't:

msgfmt --check -o /dev/null ta/LC_MESSAGES/sphinx.po
ta/LC_MESSAGES/sphinx.po:504: 'msgid' and 'msgstr' entries do not both end with '\n'
ta/LC_MESSAGES/sphinx.po:805: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:840: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:906: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:924: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:941: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:962: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:975: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:982: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1022: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1033: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1038: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1048: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1228: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1235: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1409: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1915: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:2745: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:2882: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:2940: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:3463: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:3489: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3494: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3499: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3506: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3694: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The string ends in the middle of a directive.
ta/LC_MESSAGES/sphinx.po:3718: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3732: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3744: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The string ends in the middle of a directive.
ta/LC_MESSAGES/sphinx.po:3751: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3801: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: In the directive number 1, the character 'S' is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3806: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3811: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: In the directive number 1, the character 'S' is not a valid conversion specifier.
msgfmt: found 36 fatal errors

So maybe the best option would be to use the stderr of msgfmt --check instead, and match each reported line to Babel's message.lineno (picking the message with the largest lineno not greater than the error's line number) to decide which messages get the fuzzy flag.
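The approach above can be sketched as follows, assuming msgfmt's diagnostics keep the `file:line: problem` shape shown in the output in this thread. `error_lines` and `mark_fuzzy_from_msgfmt` are hypothetical names. The sketch uses `bisect` to find, for each reported line, the last message that starts at or before it.

```python
import re
from bisect import bisect_right

from babel.messages.catalog import Catalog, Message

# msgfmt diagnostics look like "ta/LC_MESSAGES/sphinx.po:504: <problem>".
# The closing "msgfmt: found N fatal errors" line has no line number and
# therefore does not match.
_DIAGNOSTIC = re.compile(r"^(?P<path>.+?):(?P<line>\d+): (?P<text>.*)$")


def error_lines(stderr: str) -> list[int]:
    """Return the PO line numbers reported in msgfmt's stderr."""
    matches = (_DIAGNOSTIC.match(line) for line in stderr.splitlines())
    return [int(m.group("line")) for m in matches if m]


def mark_fuzzy_from_msgfmt(catalog: Catalog, stderr: str) -> list[Message]:
    """Flag as fuzzy every message that a msgfmt error points into.

    An error at line N is attributed to the message whose ``lineno`` is the
    largest one not greater than N, i.e. the entry the error line falls in.
    """
    messages = sorted(
        (m for m in catalog if m.id and m.lineno is not None),
        key=lambda m: m.lineno,
    )
    starts = [m.lineno for m in messages]
    marked: list[Message] = []
    for line in error_lines(stderr):
        index = bisect_right(starts, line) - 1
        if index < 0:
            continue  # error before the first message (e.g. in the header)
        message = messages[index]
        if not message.fuzzy:
            message.flags.add("fuzzy")
            marked.append(message)
    return marked
```

After marking, the catalog can be written back with write_po, as in the earlier script.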

n-peugnet commented 3 months ago

A possibly simpler option would be to add a babel_runner.py check command that only reports the errors found by Babel. This avoids a dependency on gettext, and the Python script I showed earlier could be used to implement a babel_runner.py check --fix command. The only drawback is that we would miss the errors detected only by msgfmt --check, but nothing prevents us from adding those checks to the Python script later.
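Here is a sketch of what such a command could look like. It is written as a standalone argparse script rather than as a patch to Sphinx's actual babel_runner.py, whose structure is not shown here, and `check_catalog` and `main` are hypothetical names. It relies on Babel's `Catalog.check()`, ignores messages that are already fuzzy because they are not compiled, and with `--fix` flags the remaining failures as fuzzy, as the earlier script does.

```python
import argparse
import sys
from pathlib import Path

from babel.messages.pofile import read_po, write_po


def check_catalog(po_path: Path, fix: bool = False) -> int:
    """Report non-fuzzy messages that fail Babel's checks; return their count.

    With ``fix=True`` those messages are flagged fuzzy and the file is
    rewritten, so they are skipped when the catalog is compiled.
    """
    with open(po_path, "rb") as f:
        catalog = read_po(f)
    # Catalog.check() yields (message, errors); fuzzy messages are never compiled.
    failing = [(msg, errs) for msg, errs in catalog.check() if not msg.fuzzy]
    for msg, errs in failing:
        for err in errs:
            print(f"{po_path}:{msg.lineno}: {err}", file=sys.stderr)
        if fix:
            msg.flags.add("fuzzy")
    if fix and failing:
        with open(po_path, "wb") as f:
            write_po(f, catalog)
    return len(failing)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Check .po catalogs with Babel.")
    parser.add_argument("locale_dir", type=Path)
    parser.add_argument("--fix", action="store_true", help="mark failing messages as fuzzy")
    args = parser.parse_args(argv)
    total = sum(
        check_catalog(po, fix=args.fix)
        for po in sorted(args.locale_dir.glob("*/LC_MESSAGES/*.po"))
    )
    # Fail (exit status 1) only when errors were found and left unfixed.
    return 1 if total and not args.fix else 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning a non-zero status only for unfixed errors lets a CI step fail loudly, while a run with --fix leaves catalogs that compile.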

rffontenelle commented 3 months ago

Just to mention that the Tamil team fixed the errors reported by Transifex, although I haven't checked the quality of the rest of the docs. Last time I checked, the Changelog translation was missing the issue numbers and links, so there's still room for improvement.