onetrueawk / awk

One true awk
Other
1.98k stars 159 forks source link

Fix regular expression clobbering in the lexer #170

Closed mpinjr closed 12 months ago

mpinjr commented 1 year ago

It is a mistake for the lexer not to make a copy of regexp literals, because the parser may need to shift more than one of them onto its stack as it searches for a matching rule.

For the following demonstration, the correct output is "1b":

    $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
    a1

The debugging output confirms that /b/ clobbered /a/, incorrectly modifying sub's first argument:

    $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
    reparse <b>
    reparse <b>

One of those should have been "reparse \".

I introduced this bug in a fix [1] for memory leaks discovered by Todd C. Miller [2]. This commit reverts my fix and applies a slightly modified version of Todd's recommendation (I omit the INDEX case because reg_expr is stored in a node).

[1] 577a67cc1f129e3ab240fcc47bf6f31cf6996b8e [2] https://github.com/onetrueawk/awk/pull/156

Testing with valgrind --leak-check=full $awk "$prog" </dev/null, the leaks produce something like this:

==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128 ==473055== at 0x484586F: malloc (vg_replace_malloc.c:381) ==473055== by 0x49E54FD: strdup (strdup.c:42) ==473055== by 0x409288: tostring (tran.c:522) ==473055== by 0x411DF5: regexpr (lex.c:557) ==473055== by 0x402E99: yyparse (awkgram.tab.c:2251) ==473055== by 0x402877: main (main.c:211)

Where $prog is any one of these one-liners:

'/abc/; /cde/' '"abc" ~ /cde/; /fgh/' 'match(/abc/, /cde/)' 'split(/abc/, a, /cde/)' 'sub(/abc/, /cde/)' 'gsub(/abc/, /cde/)' 'sub(/abc/, /cde/, a)' 'gsub(/abc/, /cde/, a)'

mpinjr commented 1 year ago

There are at least two more of these leaks to plug in the bsd branch:

$ git diff master..bsd-features awkgram.y | grep -A1 reg_expr
+   | GENSUB '(' reg_expr comma pattern comma pattern ')'
+       { $$ = op5(GENSUB, NIL, (Node*)makedfa($3, 1), $5, $7, rectonode()); }
--
+   | GENSUB '(' reg_expr comma pattern comma pattern comma pattern ')'
+       { $$ = op5(GENSUB, NIL, (Node*)makedfa($3, 1), $5, $7, $9); }

Hope everyone is well, Miguel

plan9 commented 1 year ago

thanks miguel, appreciated.