onetrueawk / awk

One true awk

Regression in \xNN esc. seqs. in strings #169

Closed: rajeevvp closed this issue 9 months ago

rajeevvp commented 1 year ago

The fix for issue #164 introduced regressions in the handling of \xNN (hex) escape sequences in string literals. A fix for that fix is attached below.

diff -urN awk-master.orig/lex.c awk-master/lex.c
--- awk-master.orig/lex.c   2022-12-15 18:34:49.000000000 +0000
+++ awk-master/lex.c    2023-01-26 22:11:06.092291000 +0000
@@ -419,8 +419,12 @@
                {
                int i;

+               if (!isxdigit(peek())) {
+                   unput(c);
+                   break;
+               }
                n = 0;
-               for (i = 1; i <= 2; i++) {
+               for (i = 0; i < 2; i++) {
                    c = input();
                    if (c == 0)
                        break;
@@ -431,13 +435,13 @@
                            n += (c - '0');
                        else
                            n += 10 + (c - 'a');
-                   } else
+                   } else {
+                       unput(c);
                        break;
+                   }
                }
-               if (n)
+               if (i)
                    *bp++ = n;
-               else
-                   unput(c);
                break;
                }

diff -urN awk-master.orig/testdir/T.misc awk-master/testdir/T.misc
--- awk-master.orig/testdir/T.misc  2022-12-15 18:34:49.000000000 +0000
+++ awk-master/testdir/T.misc   2023-01-26 22:11:06.092946000 +0000
@@ -504,3 +504,17 @@
 echo 'E 2' >foo1
 (trap '' PIPE; "$awk" 'BEGIN { print "hi"; }' 2>/dev/null; echo "E $?" >foo2) | :
 cmp -s foo1 foo2 || echo 'BAD: T.misc exit status on I/O error'
+
+# Check handling of octal (\OOO) and hex (\xHH) esc. seqs. in strings.
+echo 'hello888
+hello
+hello
+helloxGOO
+hello
+0A' > foo1
+$awk 'BEGIN { print "hello\888" }'   > foo2
+$awk 'BEGIN { print "hello\x000A" }' >> foo2
+$awk 'BEGIN { printf "hello\x0A" }'  >> foo2
+$awk 'BEGIN { print "hello\xGOO" }'  >> foo2
+$awk 'BEGIN { print "hello\x0A0A" }' >> foo2
+cmp -s foo1 foo2 || echo 'BAD: T.misc escape sequences in strings mishandled'
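The rules in the patched hex branch can be illustrated with a small standalone C function. This is only a sketch. decode_hex_escapes is a hypothetical name, not part of onetrueawk. It reads from a plain C string instead of the lexer's input()/unput()/peek() stream, and it interprets only \x escapes, copying every other byte (including other backslash escapes) unchanged. It follows the three rules the patch introduces. First, \x not followed by a hex digit leaves a literal x. Second, at most two hex digits are consumed. Third, a byte is emitted whenever at least one digit was read, even if its value is 0.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of onetrueawk): decode "\xHH" escapes in src
 * into dst, following the rules of the patched lex.c hex branch, but reading
 * from a plain C string instead of the lexer's input()/unput()/peek().
 * Only "\x" escapes are interpreted; every other byte is copied unchanged.
 * dst must have room for strlen(src) + 1 bytes. Returns the number of bytes
 * written, which may include embedded NUL bytes (e.g. from "\x00").
 */
size_t decode_hex_escapes(const char *src, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'x') {
            if (!isxdigit((unsigned char)src[2])) {
                /* Like unput(c) in the patch: drop the backslash and let
                 * the 'x' be copied as an ordinary character. */
                src++;
                continue;
            }
            src += 2;                       /* skip the "\x" prefix */
            int n = 0;
            /* Consume at most two hex digits, stopping at the first
             * non-hex character (left for the next loop iteration). */
            for (int i = 0; i < 2 && isxdigit((unsigned char)*src); i++, src++) {
                int c = tolower((unsigned char)*src);
                n = 16 * n + (isdigit(c) ? c - '0' : 10 + (c - 'a'));
            }
            /* At least one digit was read, so emit the byte even if n == 0. */
            dst[out++] = (char)n;
            continue;
        }
        dst[out++] = *src++;
    }
    dst[out] = '\0';
    return out;
}
```

With this sketch, "hello\x0A0A" decodes to eight bytes: hello, a newline, then 0A. "hello\xGOO" decodes to helloxGOO. Both match the expected lines in the T.misc test above.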

ozyigit commented 1 year ago

Thank you for spotting the issue.

arnoldrobbins commented 9 months ago

Thanks for reporting the issue and supplying the fix. I will be pushing it to the repo shortly. @plan9 notes that gawk produces different output, but that's because gawk allows embedded NUL bytes. Closing the issue.
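The gawk difference comes down to how a string containing a NUL byte is measured, and a few lines of C illustrate it. This sketch makes one assumption. It supposes that onetrueawk handles the decoded literal as an ordinary NUL-terminated C string, which is consistent with the expected "hello" line for "hello\x000A" in the T.misc test. It does not show gawk's actual string representation. The names decoded, cstring_length and counted_length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes produced by decoding the awk literal "hello\x000A": "hello", one NUL
 * byte from \x00, then the literal characters "0A". The two C literals are
 * split so that "\0" is not merged with the following '0' into one octal escape. */
const char decoded[] = "hello\0" "0A";

/* The array itself keeps all eight bytes (plus the final terminator). */
static_assert(sizeof decoded - 1 == 8, "decoded holds eight bytes");

/* Length seen through a NUL-terminated C string: stops at the first NUL. */
size_t cstring_length(const char *s)
{
    return strlen(s);
}

/* Length when it is tracked separately from the bytes (a counted string). */
size_t counted_length(void)
{
    return sizeof decoded - 1;
}
```

cstring_length(decoded) returns 5, while counted_length() returns 8. Code that stops at the first NUL prints only hello. Code that tracks the length separately can still output the trailing 0A.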