Open mathieucaroff opened 1 year ago
Thank you for reporting this! After some initial investigation any additional character after an obsolete marker (#~
) seems to lead to this error. This is unrelated to the fuzzy flag.
So currently only basic obsolete messages are supported. Obsolete messages currently cannot contain previous messages (|
), references (:
),...
It would definitely be good to support this. A PR would be very much appreciated.
@KapitanOczywisty would you be able to have a look at this?
So currently only basic obsolete messages are supported. Obsolete messages currently cannot contain previous messages (
|
), references (:
),...
@smhg Yes, they can, by using #|
and #:
. I don't know where #~|
came from, but It's not included anywhere in gettext documentation. All comments are connected to their msgid
, so if that msgid
is obsolete there is no need to mark comments as obsolete. Reversing that: why not #~, fuzzy
? You never see #~,
, likely because It's popular enough that people seen it correctly used and implemented.
@KapitanOczywisty If I understand correctly #|
means a previous translation and #~|
is an obsolete previous translation?
I can't tell whether that is actually usefull to programs, but moving from non-obsolete to obsolete just adds the ~
so it makes sense that's the result?
And obsolete message parsing currently doesn't look at the 3rd character correctly I think?
If I understand correctly
#|
means a previous translation and#~|
is an obsolete previous translation?
I only found that: https://github.com/winlibs/gettext/blob/24b0fe071cff79acf4a3904591b481328732ddc1/source/gettext-tools/src/po-lex.c#L903-L906
And obsolete message parsing currently doesn't look at the 3rd character correctly I think?
I thinks parsing ignores whitespace and searches for msgid keyword. I can make this work wit #~|
though this creates some questions:
#~|
be treated as #|
?I believe the simplest, non-surprising behavior would be to treat as a regular comment, if that makes sense with regards to the way gettext-parser currently works.
Basically #~|
should be treated as if it were #~#|
, right?
It should be listed like this one I think. These are already parsed out.
But looking a the code, this does not look straighforward.
First this @KapitanOczywisty: do you remember why you grouped the obsolete
state with the none
state? Wouldn't it be better to separate those? Because different parts of the checks under those are applicable to different states, no?
do you remember why you grouped the
obsolete
state with thenone
state?
At the time I was not aware of #~|
so only correct use would be with keywords e.g. #~ msgid
. I can fix that, just we need to decide how #~|
should be handled. I'd prefer to treat it as #|
, which means parse and compilation would yield different results than input, but all other solutions have major problems with compatibility anyway.
In my suggestion, this file:
#, fuzzy
#~| msgid "Latest version"
#~ msgid "Latest Version"
#~ msgstr "Latest version"
after parse would become:
{
// ...
"obsolete": {
"": {
"Latest Version": {
"msgid": "Latest Version",
"comments": {
"flag": "fuzzy",
"previous": "msgid \"Latest version\""
},
"msgstr": [
"Latest version"
]
}
}
}
}
and then after compilation:
#, fuzzy
#| msgid "Latest version"
#~ msgid "Latest Version"
#~ msgstr "Latest version"
I have a .po file whose translations are handled with
Poedit
and I get obsolete entries in my.po
files. The parser fails when it encounters them.Reproduction
en.po
Parsing the content of
en.po
withgettext-parser
(runninggettextParser.po.parse(content)
) produces the following error:Expected behavior
The parser should not crash. It should either ignore the entry, or properly parse it to data.
Misc
I believe this issue was introduced by the following PR: https://github.com/smhg/gettext-parser/pull/64 -- released in version 5.0.0. 4.2.0 is the last version before 5.0.0.
Have a good day !