This PR adds checks to the pofile parser code to validate that message strings are correctly delimited by double quotes. In keeping with the current design, an error is raised only if requested; otherwise a warning is printed, the faulty lines are corrected, and parsing continues.
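The behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `ensure_quoted`, `QuoteError` and the `strict` flag are hypothetical names, and escaped quotes at the end of a line are deliberately ignored to keep the sketch short.

```python
import warnings


class QuoteError(ValueError):
    """Hypothetical error type; not the parser's actual exception class."""


def ensure_quoted(line: str, lineno: int, strict: bool = False) -> str:
    """Return the line delimited by double quotes, repairing it if needed.

    `strict` is a hypothetical stand-in for "raise an error if requested".
    """
    stripped = line.strip()
    if len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"'):
        return stripped  # already well formed

    message = f"line {lineno}: string is not delimited by double quotes"
    if strict:
        raise QuoteError(message)

    # Lenient mode: warn, add only the missing delimiter(s), and carry on.
    warnings.warn(message)
    if not stripped.startswith('"'):
        stripped = '"' + stripped
    if len(stripped) < 2 or not stripped.endswith('"'):
        stripped += '"'
    return stripped
```

In lenient mode a line such as `"hello` is repaired and parsing can continue; with `strict=True` the same line raises instead.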
While implementing this change I found that the `_NormalizedString` class was not only used to hold message lines, but also took part in the parsing process (and hid some of the parsing as well). I therefore split my changes into three separate commits:
1. I first clarified the usage of the current `_NormalizedString` class across the codebase (see details in the commit).
2. I then added the double-quote delimitation check to the parser.
3. Now that all strings have the same form, I constrained more formally how `_NormalizedString` behaves.
Along the way I also implemented three small quality-of-life changes. They are included as the first three commits of this PR; I am happy to submit them separately if required:
- Avoid re-compiling a regular expression
- Remove a duplicate test assertion
- Perform a better assertion in a particular test, presumably what was intended in the first place
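For the first item, the general idea is to compile the pattern once rather than on every call. A minimal sketch, assuming a hypothetical pattern and helper (`_QUOTED` and `unquote` are illustrative names, not the ones touched in the commit):

```python
import re
from typing import Optional

# Hypothetical pattern, compiled once at import time instead of on each call.
_QUOTED = re.compile(r'^"(.*)"$')


def unquote(line: str) -> Optional[str]:
    """Return the text between the delimiting quotes, or None if absent."""
    match = _QUOTED.match(line.strip())
    return match.group(1) if match else None
```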
I found this issue while processing a pofile used in the Spanish translation of the CPython documentation. One of our files was incorrectly written, and of all our tooling only the `msgcat` tool of GNU's `gettext` package complained, while `babel`, `polib` and others didn't. See https://github.com/python/python-docs-es/pull/2873, https://github.com/izimobil/polib/pull/161 and https://git.afpy.org/AFPy/powrap/pulls/4 for further reference.