onetrueawk / awk

One true awk

Cannot build IANA tz database 2022b #149

Open deborahgoldsmith opened 1 year ago

deborahgoldsmith commented 1 year ago

The latest commit (as of this report) of onetrueawk cannot build release 2022b of the IANA tz database.

Steps to reproduce

  1. Use latest commit from this repository; build awk and install in $PATH
  2. Check out tag 2022b from https://github.com/eggert/tz
  3. make rearguard_tarballs

Result:

awk: syntax error at source line 110 source file ziguard.awk
 context is
          stdoff_column = 2 * >>>  / <<< ^Zone/ + 1
awk: illegal statement at source line 110 source file ziguard.awk
awk: illegal statement at source line 110 source file ziguard.awk
make: *** [main.zi] Error 2

Regression: Works in FreeBSD 13.0 (earlier commit of onetrueawk, version 20190529). Works with gawk, among others.

johnhawkinson commented 1 year ago

A more compact one-liner test case:

jhawk@lrr ~ % echo foo | /usr/bin/awk '{stdoff_column = 2 * /^Zone/ + 1}'  
/usr/bin/awk: syntax error at source line 1
 context is
    {stdoff_column = 2 * /^Zone/ >>>  + <<<  1}
/usr/bin/awk: illegal statement at source line 1

Versus gawk:

jhawk@lrr ~ % echo foo | gawk '{stdoff_column = 2 * /^Zone/ + 1}'          
jhawk@lrr ~ % 

guyharris commented 1 year ago

The current Single UNIX Specification page for awk says

When an ERE token appears as an expression in any context other than as the right-hand of the '~' or "!~" operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:

$0 ~ /ere/

I presume that /^Zone/ in 2 * /^Zone/ + 1 is an "ERE token".

That spec speaks of "ERE tokens", which appear to be of the form "/ere/", but I don't see any specification of what an "ERE token" is in the spec.
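
To make the quoted rule concrete, here is a minimal sketch that writes the match out explicitly, so it does not depend on how a given awk parses a bare regex inside arithmetic. The function name `stdoff_column` and the two sample input lines are illustrative assumptions, not taken from ziguard.awk.

```shell
# Hypothetical helper: spell out the match that the spec says a bare
# /^Zone/ is equivalent to. The parentheses are needed because '~'
# binds less tightly than '*' and '+'.
stdoff_column() {
  awk '{ stdoff_column = 2 * ($0 ~ /^Zone/) + 1; print stdoff_column }'
}

# Made-up sample lines in the style of tz source data.
printf 'Zone Asia/Tokyo 9:00 - JST\n' | stdoff_column   # prints 3
printf 'Rule Japan 1948 only - May Sat>=1 24:00 1:00 D\n' | stdoff_column   # prints 1
```

The match evaluates to 1 or 0, so the expression yields 3 on a line starting with Zone and 1 otherwise.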

johnhawkinson commented 1 year ago

That spec speaks of "ERE tokens", which appear to be of the form "/ere/", but I don't see any specification of what an "ERE token" is in the spec.

See Lexical Conventions in the spec:

  1. The token ERE represents an extended regular expression constant. An ERE constant shall begin with the <slash> character. Within an ERE constant, a <backslash> character shall be considered to begin an escape sequence as specified in the table in XBD File Format Notation. In addition, the escape sequences in Escape Sequences in awk shall be recognized. The application shall ensure that a <newline> does not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the <slash> character after the one that begins the ERE constant. The extended regular expression represented by the ERE constant shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting <slash> characters.
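
As a rough illustration of the termination rule quoted above, the following sketch scans a line for the first unescaped slash after the opening one. The function name `ere_body` is hypothetical, and the sketch prints the raw text between the delimiters; it does not translate escape sequences into their values and is not how any particular awk lexer is implemented.

```shell
# Hypothetical helper: print the raw text between the delimiting
# slashes of an ERE constant at the start of each input line.
ere_body() {
  awk '{
    if (substr($0, 1, 1) != "/") { print "not an ERE"; next }
    n = length($0)
    for (i = 2; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\") { i++; continue }   # a backslash escapes the next character
      if (c == "/") { print substr($0, 2, i - 2); next }   # first unescaped slash ends it
    }
    print "unterminated"
  }'
}

printf '%s\n' '/^Zone/ + 1' | ere_body   # prints ^Zone
printf '%s\n' '/a\/b/ + 1' | ere_body    # prints a\/b
```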

millert commented 1 year ago

Placing the regex in parens makes the yacc grammar happy and appears to produce the correct result.
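
A sketch of that workaround, applied to the one-liner from earlier in the thread. The wrapper name `stdoff_column_paren` and the sample input are assumptions for illustration. The thread only reports that the result "appears" correct under onetrueawk, so the values in the comments are what the spec's `$0 ~ /ere/` rule implies.

```shell
# Hypothetical wrapper: same expression as the failing one-liner,
# with the bare regex wrapped in parentheses.
stdoff_column_paren() {
  awk '{ stdoff_column = 2 * (/^Zone/) + 1; print stdoff_column }'
}

printf 'Zone Asia/Tokyo 9:00 - JST\n' | stdoff_column_paren   # expected: 3
printf 'foo\n' | stdoff_column_paren                          # expected: 1
```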

millert commented 1 year ago

The obvious fix is to simply add:

        | re

to the end of the term rule in awkgram.y but that does increase the shift/reduce and reduce/reduce conflicts. The tests still pass though ;-)

plan9 commented 1 year ago

Hi Deborah, thanks for the report. I have a FreeBSD 13 system at hand, and the 20190529 release gives the same error. I have tested earlier versions as well, so it is not a change in our release of awk that now fails the IANA tz database build.

eggert commented 1 year ago

20190529 release gives the same error.

Yes, unfortunately I tested 20190529 incorrectly and so I mistakenly told Deborah that 20190529 did not have the bug. Sorry about that. I.e., this is not a regression (though it is still a bug).

For what it's worth, Solaris 10 /usr/bin/nawk (which has a version string saying "Oct 11, 1989") has the same bug. And similarly for Solaris 10 /usr/bin/awk (aka "oawk"), which has no version string but is even older. Evidently the bug has been around for a while.

plan9 commented 1 year ago

Evidently the bug has been around for a while.

yep, I've tested all those, including sol 8/10 nawk, awk, as well as MKS awk which solaris shipped as sys5 awk.

plan9 commented 1 year ago

@millert the obvious fix gives us 225 reduce/reduce conflicts. We can remove re from the | re | term combinations, which reduces the reduce/reduce conflicts somewhat; better but not great. To be continued.

arnoldrobbins commented 1 year ago

@plan9 The grammar is definitely an area where "Here there be dragons." Tread very, very, carefully.