nexB / source-inspector

Tools to inspect source code and code symbols

xgettext: multiple starting lines for a string are not well supported #13

Open armijnhemel opened 3 months ago

armijnhemel commented 3 months ago

In the current xgettext implementation I can see at line 115 https://github.com/nexB/source-inspector/blob/9511f56b44ac7c5644b34d413146d58dd9fa7ea0/src/source_inpector/strings_xgettext.py#L115 the following:

_, _, start_line = line.rpartition(":")

This likely leads to wrong results: a single "#:" line can carry several file:line references, and rpartition(":") only captures the last one. As an example, I ran xgettext with the same parameters as you did on libbb/lineedit.c from BusyBox:

$ xgettext --omit-header --extract-all --no-wrap lineedit.c

Some of the result lines:

#: lineedit.c:834 lineedit.c:890 lineedit.c:893
msgid "."
msgstr ""

As you can see, there are multiple file/line number entries on that line. It seems that at some point the authors of xgettext decided to combine these. Your code does not process such lines correctly, because only the last line number is kept:

>>> line = '#: lineedit.c:834 lineedit.c:890 lineedit.c:893'
>>> _, _, start_line = line.rpartition(":")
>>> start_line
'893'

armijnhemel commented 3 months ago

When fixing this, please keep in mind that ":" can also appear in a file name. An easy test case: I renamed lineedit.c to lineedit:834.c and then reran xgettext:

#: lineedit:834.c:834 lineedit:834.c:890 lineedit:834.c:893
msgid "."
msgstr ""

so just splitting on : might not be the right approach.
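
One possible way to handle both cases is sketched below: split the "#:" line on whitespace first, then separate each token at its last ":" only. The function name parse_reference_comment is hypothetical and not part of source-inspector. The sketch also assumes that file names contain no whitespace, so it is a starting point rather than a complete fix.

```python
import re

# One "file:line" reference. The greedy prefix allows ":" inside the file
# name, so only the final ":<digits>" is treated as the line number.
_REFERENCE = re.compile(r"^(?P<path>.+):(?P<line>\d+)$")


def parse_reference_comment(line):
    """Return a list of (path, line_number) tuples from a '#:' comment line.

    Hypothetical helper; assumes file names contain no whitespace.
    """
    if not line.startswith("#:"):
        return []
    references = []
    # Skip the "#:" prefix, then treat each whitespace-separated token
    # as one reference.
    for token in line[2:].split():
        match = _REFERENCE.match(token)
        if match:
            references.append((match.group("path"), int(match.group("line"))))
    return references
```

With the two example lines from this issue, this yields three (path, line) pairs each, with lineedit:834.c kept intact as the file name.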

armijnhemel commented 3 months ago

Another option would be to use the --strict option, but that would require a (slight) rewrite of the code, and its use is discouraged:

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

armijnhemel commented 3 months ago

Note: if the goal is to report each string found in a source code file, and you don't necessarily need to report duplicates, then the current code is of course completely fine.