python / cpython

The Python programming language
https://www.python.org
Other
63.55k stars 30.45k forks source link

Add translation tests for argparse #124295

Open savannahostrowski opened 2 months ago

savannahostrowski commented 2 months ago

While #12711 was merged, it seems like there's a separate issue worth tracking around adding translation tests for argparse (see https://github.com/python/cpython/pull/12711#issuecomment-1964754284)

Linked PRs

tomasr8 commented 1 month ago

Continuing the discussion here since the other issue was closed.

Quoting Serhiy:

I thought that we should run the parser with different options to produce different error messages with installed special translator that transforms the original text in simply recognizable way. But there may be better ways.

I was originally thinking the same, but to me that seems like a lot of effort compared to the benefit we'd get. Instead, I think that using a message extractor (e.g. pygettext) we can assert both the presence of a specific message and the line number it appears at. This should give us pretty much the same confidence as directly checking the argparse output but with an order of magnitude less work. What do you think?

serhiy-storchaka commented 1 month ago

and the line number it appears at

Would not this make the tests extremely fragile?

tomasr8 commented 1 month ago

Would not this make the tests extremely fragile?

Does it matter that much if you don't need to fix the tests manually? You'd have a snapshot of the PO file and a simple command to regenerate it so all you'd need to do is double-check the diff

serhiy-storchaka commented 1 month ago

You would need to regenerate the snapshot after every change in argparse (and it was changed a lot in last few days), and the diff would be so large that it would be hard to notice a real regression in the noise.

But this is a good first step. Before #27667 was merged, Tools/i18n/pygettext.py produced a warning for _() used with non-literals. They should be treated as errors in the test. Also, line numbers should be ignored in comparison and only used to report errors. We should just collect a set of translated strings, compare it with the expected result and report about missing or additional strings.

tomasr8 commented 1 month ago

True, if the module is changing a lot, so would the snapshots. I'll try to come up with a PR based on your suggestions :)

serhiy-storchaka commented 3 weeks ago

We should also add translation tests for getopt and optparse, and maybe for other modules (this is separate issues). I am not a fan of converting many tests into packages just to add a data file. This may break history, add conflicts with existing PRs and make future work with tests less convenient. Maybe add a common directory for translation data?

tomasr8 commented 3 weeks ago

We should also add translation tests for getopt and optparse, and maybe for other modules (this is separate issues)

Agreed, I can take care of that once we're done with argparse.

I am not a fan of converting many tests into packages just to add a data file. This may break history, add conflicts with existing PRs and make future work with tests less convenient. Maybe add a common directory for translation data?

A common folder for all translation data sounds like a good idea!

tomasr8 commented 3 weeks ago

@serhiy-storchaka The PR also separates the translation tests into its own files which necessitates turning argparse tests into a package. Would you prefer that I merge the translation tests with test_argparse to avoid that?

serhiy-storchaka commented 3 weeks ago

Yes, of course.

serhiy-storchaka commented 3 weeks ago

Thank you for your contribution @tomasr8. If you wish, you can create new issues and PRs for getopt and optparse, or just PRs.

tomasr8 commented 3 weeks ago

I'll look into that :)

serhiy-storchaka commented 3 weeks ago

It does not catch ngettext strings.

tomasr8 commented 3 weeks ago

IIRC pygettext currently does not know how to handle *gettext() functions which take more than one argument. There is a way to specify additional keywords via --keyword=XXX but I don't see any way to tell it how many arguments it takes.

For example invoking

./Tools/i18n/pygettext.py -o messages.pot --keyword=ngettext Lib/argparse.py

Produces several warnings (and does not extract any ngettext strings):

*** Lib/argparse.py:1639: Seen unexpected token ","
*** Lib/argparse.py:2326: Seen unexpected token ","

We could fix pygettext to allow functions like ngettext. We could probably use the same syntax that GNU's xgettext and pybabel use to specify keywords and their arguments: https://www.gnu.org/software/gettext/manual/html_node/xgettext-Invocation.html#index-_002dk_002c-xgettext-option

serhiy-storchaka commented 3 weeks ago

For beginning, it should support all gettext functions by default. Later we can make this customizable.