Assertion failure while building WeeChat 2.7

awilfox commented 4 years ago

Assertion failed: *in (src/StringEscape.c: unescape: 78)
make[2]: *** [po/CMakeFiles/translations.dir/build.make:81: po/de.gmo] Aborted
make[2]: *** Deleting file 'po/de.gmo'
make[2]: Leaving directory '/usr/src/packages/user/weechat/src/weechat-2.7/build'

Backtrace:

Core was generated by `/usr/bin/msgfmt -o /usr/src/packages/user/weechat/src/weechat-2.7/build/po/de.g'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fffbb06e498 in __syscall4 (d=8, c=0, b=70368447356360, a=2, n=174) at ./arch/powerpc64/syscall_arch.h:54
54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
(gdb) bt
#0  0x00003fffbb06e498 in __syscall4 (d=8, c=0, b=70368447356360, a=2, n=174) at ./arch/powerpc64/syscall_arch.h:54
#1  __restore_sigs (set=0x3fffee4eddc8) at src/signal/block.c:43
#2  0x00003fffbb06e7e8 in raise (sig=<optimized out>) at src/signal/raise.c:11
#3  0x00003fffbb025c58 in abort () at src/exit/abort.c:13
#4  0x00003fffbb025d6c in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at src/exit/assert.c:8
#5  0x000000012aab1814 in unescape (in=<optimized out>, out=<optimized out>, outsize=<optimized out>) at src/StringEscape.c:126
#6  0x000000012aab0b54 in poparser_feed_line (p=0x3fffee4ee078, 
    in=0x3fffee4ee1a8 "msgid \"       -bar: add the help bar\\n   -refresh: refresh list of options, then whole screen (command: /window refresh)\\n        -up: move the selected line up by \\\"number\\\" lines\\n      -down: move "..., in_len=<optimized out>)
    at src/poparser.c:283
#7  0x000000012aaaf84c in process (in=0x3fffbb0e3520, out=0x3fffbb0e3020, strict=<optimized out>) at src/msgfmt.c:201
#8  0x000000012aaaeac8 in main (argc=<optimized out>, argv=<optimized out>) at src/msgfmt.c:422
(gdb) frame 6
#6  0x000000012aab0b54 in poparser_feed_line (p=0x3fffee4ee078, 
    in=0x3fffee4ee1a8 "msgid \"       -bar: add the help bar\\n   -refresh: refresh list of options, then whole screen (command: /window refresh)\\n        -up: move the selected line up by \\\"number\\\" lines\\n      -down: move "..., in_len=<optimized out>)
    at src/poparser.c:283
283     src/poparser.c: No such file or directory.

This seems to be caused by the line being longer than the statically-allocated buffer:

(gdb) printf "%s\n", in
msgid "       -bar: add the help bar\n   -refresh: refresh list of options, then whole screen (command: /window refresh)\n        -up: move the selected line up by \"number\" lines\n      -down: move the selected line down by \"number\" lines\n      -left: scroll the fset buffer by \"percent\" of width on the left\n     -right: scroll the fset buffer by \"percent\" of width on the right\n        -go: select a line by number, first line number is 0 (\"end\" to select the last line)\n    -toggle: toggle the boolean value\n       -add: add \"value\" (which can be a negative number) for integers and colors, set/append to value for other types (set for a negative value, append for a positive value)\n     -reset: reset the value of option\n     -unset: unset the option\n       -set: add the /set command in input to edit the value of option (move the cursor at the beginning of value)\n    -setnew: add the /set command in input to edit a new value for the option\n    -append: add the /set command to append something in the value of option (move the cursor at the end of value)\n      -mark: toggle mark\n    -format: switch to the next available format\n    -export: export the options and values displayed in a file (each line has format: \"/set name value\" or \"/unset name\")\n      -help: force writing of help on options in exported file (see /help fset.look.export_help_default)\n    -nohelp: do not write help on options in exported file (see /help fset.look.export_help_default)\n     filter: set a new filter to see only matching options (this filter can be used as input in fset buffer as well); allowed formats are:\n               *       show all options (no filter)\n               xxx     show only options with \"xxx\" in name\n               f:xxx   show only configuration file \"xxx\"\n               t:xxx   show only type \"xxx\" (bool/int/str/col)\n               d       show only changed options\n               d:xxx   show only changed options with \"xxx\" in name\n               d=xxx   show only changed options with \"xxx\" in value\n               d==xxx  show only changed options with exact value \"xxx\"\n               h=xxx   show only options with \"xxx\" in description (translated)\n               he=xxx  show only options with \"xxx\" in description (in English)\n               =xxx    show only options with \"xxx\" in value\n               ==xxx   show only options with exact value \"xxx\"\n               c:xxx   show only options matching the evaluated condition \"xxx\", using following variables: file, section, option, name, parent_name, type, type_en, type_short (bool/int/str/col), type_tiny (b/i/s/c), default_value, default_value_undef, value, quoted_value, value_undef, value_changed, parent_value, min, max, description, description2, description_en, description_en2, string_values\n\nThe lines with options are displayed using string evaluation (see /help eval for the format), with these options:\n  - fset.format.option1: first format for an option\n  - fset.format.option2: second format for an option\n\nThe following variables can be used in these options:\n  - option data, with color and padded by spaces on the right:\n    - ${file}: configuration file (for example \"weechat\" or \"irc\")\n    - ${section}: section\n    - ${option}: option name\n    - ${name}: full option name (file.section.option)\n    - ${parent_name}: parent option name\n    - ${type}: option type (translated)\n    - ${type_en}: option type (in English)\n    - ${type_short}: short option type (bool/int/str/col)\n    - ${type_tiny}: tiny option type (b/i/s/c)\n    - ${default_value}: option default value\n    - ${default_value_undef}: \"1\" if default value is null, otherwise \"0\"\n    - ${value}: option value\n    - ${value_undef}: \"1\" if value is null, otherwise \"0\"\n    - ${value_changed}: \"1\" if value is different from default value, otherwise \"0\"\n    - ${value2}: option value, with inherited value if null\n    - ${parent_value}: parent option value\n    - ${min}: min value\n    - ${max}: max value\n    - ${description}: option description (translated)\n    - ${description2}: option description (translated), \"(no description)\" (translated) if there's no description\n    - ${description_en}: option description (in English)\n    - ${description_en2}: option description (in English), \"(no description)\" if there's no description\n    - ${string_values}: string values allowed for set of an integer option using strings\n    - ${marked}: \"1\" if option is marked, otherwise \"0\"\n    - ${index}: index of option in list\n  - option data, with color but no spaces:\n    - same names prefixed by underscore, for example: ${_name}, ${_type}, ...\n  - option data, raw format (no colors/spaces):\n    - same names prefixed by two underscores, for example: ${__name}, ${__type}, ...\n  - option data, only spaces:\n    - same names prefixed with \"empty_\", for example: ${empty_name}, ${empty_type}\n  - other data:\n    - ${selected_line}: \"1\" if the line is selected, otherwise \"0\"\n    - ${newline}: insert a new line at point, so the option is displayed on multiple lines\n\nKeys and input to move in on fset buffer:\n  up                        move one line up\n  down                      move one line down\n  pgup                      move one page up\n  pgdn                      move one page down\n  alt-home          <<      move to first line\n  alt-end           >>      move to last line\n  F11               <       scroll horizontally on the left\n  F12               >       scroll horizontally on the right\n\nKeys and input to set options on fset buffer:\n  alt+space         t       toggle boolean value\n  alt+'-'           -       subtract 1 from value for integer/color, set value for other types\n  alt+'+'           +       add 1 to value for integer/color, append to value for other types\n  alt+f, alt+r      r       reset value\n  alt+f, alt+u      u       unset value\n  alt+enter         s       set value\n  alt+f, alt+n      n       set new value\n  alt+f, alt+a      a       append to value\n  alt+','           ,       mark/unmark option\n  shift+up                  move one line up and mark/unmark option\n  shift+down                mark/unmark option and move one line down\n                    m:xxx   mark options displayed that are matching filter \"xxx\" (any filter on option or value is allowed, see filters above)\n                    u:xxx   unmark options displayed that are matching filter \"xxx\" (any filter on option or value is allowed, see filters above)\n\nOther keys and input on fset buffer:\n  ctrl+L                    refresh options and whole screen (command: /fset -refresh)\n                    $       refresh options (keep marked options)\n                    $$      refresh options (unmark all options)\n  alt+p             p       toggle plugin description options (plugins.desc.*)\n  alt+v             v       toggle help bar\n                    s:x,y   sort options by fields x,y (see /help fset.look.sort)\n                    s:      reset sort to its default value (see /help fset.look.sort)\n                    w:xxx   export options in file \"xxx\"\n                    w-:xxx  export options in file \"xxx\" without help\n                    w+:xxx  export options in file \"xxx\" with help\n  ctrl+X            x       switch the format used to display options\n                    q       close fset buffer\n\nMouse actions on fset buffer:\n  wheel up/down                   move line up/down\n  left button                     move line here\n  right button                    toggle boolean (on/off) or edit the option value\n  right button + drag left/right  increase/decrease value for integer/color, set/append to value for other types\n  right button + drag up/down     mark/unmark multiple options\n\nNote: if input has one or more leading spaces, the following text is interpreted as a filter, without the spaces. For example \" q\" searches all options with \"q\" inside name while \"q\" closes the fset buffer.\n\nExamples:\n  show IRC options changed:\n    /fset d:irc.*\n  show all options with \"nicklist\
(gdb)

Is it possible to use dynamic allocation for reading lines?

xhebox commented 4 years ago

Are you using the latest commit on branch master? It seems that i can not reproduce the issue... And that output buffer is dynamically allocated indeed: https://github.com/sabotage-linux/gettext-tiny/blob/ece94b7de3f051a4a417599f1cbf1e5cb6bfbb74/src/poparser.c#L376-L382

Only the input buffer: https://github.com/sabotage-linux/gettext-tiny/blob/ece94b7de3f051a4a417599f1cbf1e5cb6bfbb74/src/msgfmt.c#L151

If it's caused by an older version, maybe we should consider a new release. If change the size of this input buffer helps, it's another thing.

Update: I noticed one thing, when i print in variable in poparser_feed_line, i do not get a string that long, in_len is pretty small(all below 100):

9, msgid ""
34, "       -bar: add the help bar\n"
77, "   -refresh: refresh list of options, then whole screen (command: /window "
13, "refresh)\n"
63, "        -up: move the selected line up by \"number\" lines\n"
65, "      -down: move the selected line down by \"number\" lines\n"
76, "      -left: scroll the fset buffer by \"percent\" of width on the left\n"
77, "     -right: scroll the fset buffer by \"percent\" of width on the right\n"
76, "        -go: select a line by number, first line number is 0 (\"end\" to "

Update2: OK, i got it.... a pretty annoying problem..

rofl0r commented 4 years ago

char line[8192]

maybe we should just allocate that dynamically to 8KB on start and double its size whenever necessary (with realloc, so contents stay available). note there's also convbuf, which must be 4x as big as line. dalias suggested to maybe use getline function, but i didn't investigate the pros and cons of that approach yet.

xhebox commented 4 years ago

getline is convenient. And it's exactly what came to my mind first.

But later i noticed some other "bugs". Specifically, three cases(GNU handle them smoothly):

1.

msgid
"sss"

2.

msgid "xxx" "sss"

3.

msgid"ss"

That is, po files are not to be parsed line by line. Though, we have not met such po files. The third one is most terrible one, because poparser assumes only one space after an entry token, not zero, not two or more, not tab, exactly one space.

So i want it to parse po files token by token. As a result, we should be able to feed the parser arbitrarily. And that will also solves this issue. Since it does not matter how you feed the parser anymore.

While the structure of poparser_feed_line needs to be rearranged largely, it needs little new codes(below 50 lines). I am working on it :).

xhebox commented 4 years ago

@awilfox Can you try the commit just before the msgmerge commit on branch newpoparser? (that is, try HEAD~)

Reason why msgfmt failed is that msgmerge printed some super long lines. The latest commit fixed msgmerge, previous commits made some improves on msgfmt itself. So either the latest commit or commits before it should fix this specific issue.

Though i could generate mo files for all .po in weechat2.7. But that does not mean it's correct and it will pass all other packages on all platforms. It will be nice if you could test it with other common packages.

This will not be merged into master so soon, since it's changed a lot...

@rofl0r And maybe we should release a new tarball after the merge.

rofl0r commented 4 years ago

Reason why msgfmt failed is that msgmerge printed some super long lines

oh, good find! was it because of

But later i noticed some other "bugs". Specifically, three cases(GNU handle them smoothly):

xhebox commented 4 years ago

oh, good find! was it because of

Nope... But msgfmt should have the ability to process super long lines too, IMO.

And parsing po files token by token solves those "bugs". But also you can feed arbitrary strings to it due to its working mechanism. That makes msgfmt more robust than it ever been.

I think it's ok as it does not add much code. Most of the time, i was moving codes around.

The logic is:

accumulates lines until there's at least one line.
consume buffer line by line
convert encodings if needed
parse token by token
drop consumed lines

rofl0r commented 4 years ago

i can see the benefit of making parser more robust, otoh a big rewrite like this probably introduces some new bugs... what do you think about fixing only the msgmerge bug which mistakenly emitted long lines in master for the time being, so we can test the newpoparser branch extensively ?

xhebox commented 4 years ago

Great. That's also my concern. It needs sometime to fix bugs.

Do i need to force push so that msgmerge fix is the first new commit, then pr.

Or we just patch and push to the master?

rofl0r commented 4 years ago

Or we just patch and push to the master?

i guess we can just fix master directly

xhebox commented 4 years ago

Here it is, and i found i missed the plural_str in newpoparser branch actually. Shall we open a new issue for the improvement? @rofl0r

EDIT: now latest commit of master should fix the problem. @awilfox

rofl0r commented 4 years ago

cool, thanks! so the new stragegy is to simply use a separate string for chunks of 1024 in input string (as in msgstr "foo" "bar" "baz" ) ?

(i expected there was another bug that lead to the string becoming too long)

Shall we open a new issue for the improvement?

as you see fit, thanks

sabotage-linux / gettext-tiny

Assertion failure while building WeeChat 2.7 #50