onetrueawk / awk

awk fails to replace "/ere/" with "$0 ~ /ere/" according to POSIX #122

bsdimp closed this issue 3 years ago

bsdimp commented 3 years ago

FreeBSD has a bug filed against it: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235887

Tim Chase wrote in that bug:

I've hit a case in which /ere/ doesn't expand the same as "$0 ~ /ere/" which it should do according to the POSIX spec[0].

The goal was to meet the criterion "one and only one of multiple regex matches", so I used

jot 20 | awk '/1/ + /5/ == 1'

(this can be expanded for any number of expressions, e.g. "/1/ + /5/ + /7/ == 1", but the example using jot 20 makes it easier to demonstrate the problem, looking for lines containing 1 or 5 but not 15)

This gives a parse error:

$ jot 20 | awk '/1/ + /5/ == 1'
awk: syntax error at source line 1
 context is
	/1/ + >>> / <<<
awk: bailing out at source line 1

Strangely, wrapping the expressions in parens works as expected:

$ jot 20 | awk '(/1/) + (/5/) == 1'

However manually performing the replacement documented above according to the POSIX spec:

$ jot 20 | awk '$0 ~ /1/ + $0 ~ /5/ == 1'

parses fine (instead of giving the syntax error), so awk isn't doing the "/ere/ -> $0 ~ /ere/" replacement POSIXly. However, this also doesn't give results I'd consider correct (it returns "5" and "15"). Again, wrapping those expansions in parens gives the expected/correct results:

$ jot 20 | awk '($0 ~ /1/) + ($0 ~ /5/) == 1'

As a side note, gawk parses the original notation ('/1/ + /5/ == 1') fine and it does the same as the parenthesized versions above.

-tkc

[0] """

When an ERE token appears as an expression in any context other than as the right-hand of the '~' or "!~" operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:

$0 ~ /ere/

""" http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
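The parenthesized workaround from the report can be tried on systems without jot. This is only a minimal sketch: it assumes `seq` is available as a stand-in for the BSD-specific `jot`, and the function name `exactly_one` is hypothetical, introduced purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any awk distribution): print the numbers
# 1..20 that match exactly one of the patterns /1/ and /5/.
exactly_one() {
    # Assumption: seq stands in for "jot 20" from the original report.
    # The parentheses turn each bare regex into a complete expression, so the
    # parser never sees "+ /" and cannot mistake the slash for division.
    # Each (/re/) evaluates to 1 on a match and 0 otherwise; the sum equals 1
    # only when exactly one of the two patterns matches the current line.
    seq 1 20 | awk '(/1/) + (/5/) == 1'
}

exactly_one
```

This should print 1, 5, 10 through 14, and 16 through 19, one per line. 15 is skipped because it matches both patterns, and 20 and the remaining single digits match neither.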

bsdimp commented 3 years ago

gawk gets this correct, btw.

arnoldrobbins commented 3 years ago

You reported this bug in https://github.com/onetrueawk/awk/issues/41 some time ago. The response then was:

Thanks for the report. Unfortunately, the onetrueawk grammar is very fragile, and we don't believe it could be fixed easily or without breaking anything else along the way. So, consider this as something that isn't likely to be fixed. Sorry.

Unless @plan9 wishes to dive into the grammar, the response is likely to remain the same. Closing this issue; Oz - reopen it please if you want to deal with it. Thanks.

bsdimp commented 3 years ago

Thanks Arnold. I'd forgotten I'd reported this before.