izimobil / polib

Pure python library to manipulate, create, modify gettext files (pot, po and mo files).
MIT License

POEntry.occurrences are split if file name contains spaces #154

Open Zsar opened 1 month ago

Zsar commented 1 month ago

polib 1.2.0 on Python 3.7 installed via pip.

I know, I know. Who in their right mind puts spaces in file names? Reporting it anyway, as gettext writes such names without issue and Poedit processes them without issue (it displays the correct line in the correct file).

Expected: One '#:' line is converted to one occurrence tuple. (An occurrence without line number should ideally raise an error on parsing, to prevent consecutive faults.)

Observed: E.g. '#: file name .cs:10' is split into three occurrences: 'file', 'name', '.cs:10', two of which then have no line number. (And perhaps more importantly, none of which - hopefully - points towards any actual file.)

$ python3.7
Python 3.7.3 (default, Mar 23 2024, 16:12:05) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polib
>>> po = polib.pofile('en.po')
>>> entry = po[0]
>>> entry.occurrences
[('file', ''), ('name', ''), ('.cs', '10')]
>>> exit()
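
To make the expected and observed behaviour concrete, here is a minimal sketch in plain Python. Both helpers (`split_naive`, `split_keeping_spaces`) are hypothetical names made up for illustration and are not polib's actual parser; the first merely reproduces the output shown above, the second assumes every reference ends in `:<digits>` and keeps spaces inside the path.

```python
import re

# Hypothetical helpers for illustration only; not taken from polib.
# Assumption: a reference is "<path>:<digits>" followed by whitespace or end of line.
_REFERENCE = re.compile(r"(\S.*?):(\d+)(?=\s|$)")


def split_naive(references):
    """Split on whitespace first, then on the last colon of each token."""
    occurrences = []
    for token in references.split():
        path, sep, lineno = token.rpartition(":")
        # A token without a colon becomes an occurrence with an empty line number.
        occurrences.append((path, lineno) if sep else (token, ""))
    return occurrences


def split_keeping_spaces(references):
    """Treat each '<path>:<digits>' run as one occurrence, spaces included."""
    occurrences = []
    end = 0
    for match in _REFERENCE.finditer(references):
        occurrences.append((match.group(1), match.group(2)))
        end = match.end()
    # Anything left over has no line number: fail loudly instead of guessing.
    leftover = references[end:].strip()
    if leftover:
        raise ValueError("reference without line number: %r" % leftover)
    return occurrences


line = "file name .cs:10"  # the text after '#: ' in the report above
print(split_naive(line))
print(split_keeping_spaces(line))
```

The second helper is only a heuristic: a reference without a line number that precedes one with a line number would be merged into the following path, so it is not a drop-in fix.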

Zsar commented 1 month ago

Oh, it is even worse: As the spaces are eaten, there is no way to correctly reconstruct the file name afterwards - e.g. ` and ` both become '', so there is loss of information and e.g. in a folder with

there is no way to disambiguate.
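
The loss of information can be shown without polib at all. The sketch below assumes the splitting is equivalent to Python's `str.split()` on whitespace; the two file names are made-up examples, not the ones from my folder.

```python
# Hypothetical file names that differ only in the number of spaces.
one_space = "a b.cs:10"
two_spaces = "a  b.cs:10"

# The two references are distinct strings...
assert one_space != two_spaces

# ...but whitespace splitting yields identical tokens for both,
# so the original spacing cannot be recovered afterwards.
tokens_one = one_space.split()
tokens_two = two_spaces.split()
print(tokens_one)
print(tokens_two)

# Re-joining with a single space can reproduce at most one of the two names.
rejoined = " ".join(tokens_one)
print(rejoined == one_space, rejoined == two_spaces)
```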

... I think it would be only fair to assert on no spaces right when parsing the pofile, if support is not present, rather than silently emit invalid, lossy data.
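
Until something like that exists, a caller could guard against the lossy data themselves. This is a rough sketch, assuming occurrences are the `(path, line number)` tuples shown in the session above; `check_occurrences` is a hypothetical helper, not part of polib, and a catalog that deliberately omits line numbers would trip it.

```python
def check_occurrences(occurrences):
    """Raise ValueError if any occurrence lacks a numeric line number.

    Hypothetical guard: assumes every reference in the catalog is expected
    to carry a line number, which may not hold for every catalog.
    """
    for path, lineno in occurrences:
        if not str(lineno).isdigit():
            raise ValueError(
                "occurrence %r has no line number; "
                "the file name may contain spaces" % (path,)
            )


# The occurrences reported above fail the check.
try:
    check_occurrences([("file", ""), ("name", ""), (".cs", "10")])
except ValueError as error:
    print(error)
```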