Open awilfox opened 4 years ago
Are you using the latest commit on branch master? It seems that i can not reproduce the issue... And that output buffer is dynamically allocated indeed: https://github.com/sabotage-linux/gettext-tiny/blob/ece94b7de3f051a4a417599f1cbf1e5cb6bfbb74/src/poparser.c#L376-L382
Only the input buffer: https://github.com/sabotage-linux/gettext-tiny/blob/ece94b7de3f051a4a417599f1cbf1e5cb6bfbb74/src/msgfmt.c#L151
If it's caused by an older version, maybe we should consider a new release. If change the size of this input buffer helps, it's another thing.
Update: I noticed one thing, when i print in
variable in poparser_feed_line, i do not get a string that long, in_len is pretty small(all below 100):
9, msgid ""
34, " -bar: add the help bar\n"
77, " -refresh: refresh list of options, then whole screen (command: /window "
13, "refresh)\n"
63, " -up: move the selected line up by \"number\" lines\n"
65, " -down: move the selected line down by \"number\" lines\n"
76, " -left: scroll the fset buffer by \"percent\" of width on the left\n"
77, " -right: scroll the fset buffer by \"percent\" of width on the right\n"
76, " -go: select a line by number, first line number is 0 (\"end\" to "
Update2: OK, i got it.... a pretty annoying problem..
char line[8192]
maybe we should just allocate that dynamically to 8KB on start and double its size whenever necessary (with realloc, so contents stay available). note there's also convbuf, which must be 4x as big as line. dalias suggested to maybe use getline
function, but i didn't investigate the pros and cons of that approach yet.
getline
is convenient. And it's exactly what came to my mind first.
But later i noticed some other "bugs". Specifically, three cases(GNU handle them smoothly):
1.
msgid
"sss"
2.
msgid "xxx" "sss"
3.
msgid"ss"
That is, po files are not to be parsed line by line. Though, we have not met such po files. The third one is most terrible one, because poparser assumes only one space after an entry token, not zero, not two or more, not tab, exactly one space.
So i want it to parse po files token by token. As a result, we should be able to feed the parser arbitrarily. And that will also solves this issue. Since it does not matter how you feed the parser anymore.
While the structure of poparser_feed_line
needs to be rearranged largely, it needs little new codes(below 50 lines). I am working on it :).
@awilfox Can you try the commit just before the msgmerge commit on branch newpoparser
? (that is, try HEAD~)
Reason why msgfmt failed is that msgmerge printed some super long lines. The latest commit fixed msgmerge, previous commits made some improves on msgfmt itself. So either the latest commit or commits before it should fix this specific issue.
Though i could generate mo files for all .po in weechat2.7. But that does not mean it's correct and it will pass all other packages on all platforms. It will be nice if you could test it with other common packages.
This will not be merged into master so soon, since it's changed a lot...
@rofl0r And maybe we should release a new tarball after the merge.
Reason why msgfmt failed is that msgmerge printed some super long lines
oh, good find! was it because of
But later i noticed some other "bugs". Specifically, three cases(GNU handle them smoothly):
oh, good find! was it because of
Nope... But msgfmt should have the ability to process super long lines too, IMO.
And parsing po files token by token solves those "bugs". But also you can feed arbitrary strings to it due to its working mechanism. That makes msgfmt more robust than it ever been.
I think it's ok as it does not add much code. Most of the time, i was moving codes around.
The logic is:
i can see the benefit of making parser more robust, otoh a big rewrite like this probably introduces some new bugs...
what do you think about fixing only the msgmerge bug which mistakenly emitted long lines in master
for the time being, so we can test the newpoparser branch extensively ?
Great. That's also my concern. It needs sometime to fix bugs.
Do i need to force push so that msgmerge fix is the first new commit, then pr.
Or we just patch and push to the master?
Or we just patch and push to the master?
i guess we can just fix master directly
Here it is, and i found i missed the plural_str in newpoparser branch actually. Shall we open a new issue for the improvement? @rofl0r
EDIT: now latest commit of master should fix the problem. @awilfox
cool, thanks! so the new stragegy is to simply use a separate string for chunks of 1024 in input string (as in msgstr "foo" "bar" "baz" ) ?
(i expected there was another bug that lead to the string becoming too long)
Shall we open a new issue for the improvement?
as you see fit, thanks
Backtrace:
This seems to be caused by the line being longer than the statically-allocated buffer:
Is it possible to use dynamic allocation for reading lines?