haskell / alex

A lexical analyser generator for Haskell
https://hackage.haskell.org/package/alex
BSD 3-Clause "New" or "Revised" License
297 stars 82 forks source link

alex 3.2.7 fails to parse definitions which can pass alex 3.2.6 #197

Closed Commelina closed 2 years ago

Commelina commented 2 years ago

I tried the latest alex 3.2.7 on a common-used library language-c-0.9.0.1. However, it failed to parse Lex.x:

src/Language/C/Parser/Lexer.x:123:22: parse error

It seems that the problem happened here:

@iec60559suffix = (32|64|128)[x]?

Then I rolled my alex back to version 3.2.6 then everything worked. I am not sure if it is a bug.

hasufell commented 2 years ago

Same here: https://github.com/haskell/ghcup-metadata/runs/4898784174?check_suite_focus=true#step:8:35

Ericson2314 commented 2 years ago

OK I deprecated 3.2.7 and made it unbuildable. We'll need some new test cases for this.

Ericson2314 commented 2 years ago

@andreasabel Assuming this isn't a crazy bootstrapping issue, it looks like you made all the parser changes between 3.2.6 and 3.2.7. Would have any time to help debug this?

andreasabel commented 2 years ago

@Ericson2314 : I started to have a look. Then I hit #195. (I have some old repo of alex which still has a copy of data/AlexTemplate-ghc, so this does not stop me now.)

andreasabel commented 2 years ago

Shrunk example: Issue197.x

{}
@iec60559suffix = (32|64|128)[x]?

tokens :-

$white+         ;
@iec60559suffix ;
{}

Phew, finally found the magic incantations to make bisection work:

$ cat > issue197.sh
cp data/AlexTemplate-ghc data/AlexTemplate.hs
cabal run alex -- Issue197.x
^D
$ chmod +x issue197.sh
$ git bisect start 3.2.7 3.2.6
$ git bisect run ./issue197.sh

Result: d6653f3ef7dffb462baa8b4b7165d1fa8f2b36a1 is the first bad commit Author: Andreas Abel Date: Fri Jan 31 19:09:42 2020 +0100

[ fixed #141 ] regex: allow arbitary repetitions

Previously, the r{n,m} and related forms were restricted to single digit numbers n and m.

We fix this by recognizing numbers of > 1 digits as NUM token and adding rules to the regex parsing that also allows NUMs for n and m. Previously, only CHAR was allowed, which subsumes single digits.

Ericson2314 commented 2 years ago

Running info on the happy grammar,

...
  58     rep -> '{' CHAR '}'                                (49)
  59     rep -> '{' CHAR ',' '}'                            (50)
  60     rep -> '{' CHAR ',' CHAR '}'                       (51)
  61     rep -> '{' NUM '}'                                 (52)
  62     rep -> '{' NUM ',' '}'                             (53)
  63     rep -> '{' NUM ',' NUM '}'                         (54)
...
  71     set0 -> CHAR                                       (62)
  72     set0 -> CHAR '-' CHAR                              (63)
...

The problem is these lower rules don't have NUM equivalents.

I think remembering to double up NUM and CHAR is hard, and all the more so when this case we would need to "unparse" the number, yet also preserve things like 001 vs 1.

This makes makes we think we should not have a NumT, and instead just have a number nonterminal, parsed from char terminals. Does that sound good?

andreasabel commented 2 years ago

Er, @Ericson2314, I am already working on this. I just lack the repo rights to assign this to me so that you would know about it.

andreasabel commented 2 years ago

This makes makes we think we should not have a NumT, and instead just have a number nonterminal, parsed from char terminals. Does that sound good?

Yes this sounds good. I just wish you would not ask me to help and then race for a fix. Wasted my time, I could have looked into something else.

Ericson2314 commented 2 years ago

Er, @Ericson2314, I am already working on this. I just lack the repo rights to assign this to me so that you would know about it.

Oh sorry! That's awkward. I should have watched the time zones more carefully, I just assumed you stopped at the bisect.

Yes, let's get you those perms.

andreasabel commented 2 years ago

I should have stated in text that I am looking into a fix...

PR #202 seems to solve the problem in a satisfactory way, by restricting the parsing of numerals to inside multiplicity expressions {n,m}.

Ericson2314 commented 2 years ago

@andreasabel BTW I earlier sent you an email taking the git email address. Is that a fine email address? I want to reach out to @simonmar more directly to get to permissions.

andreasabel commented 2 years ago

@andreasabel BTW I earlier sent you an email taking the git email address. Is that a fine email address? I want to reach out to @simonmar more directly to get to permissions.

I found the message. I am often slow with email, looking into my github notifications more often than my email.

Thanks for your efforts!

Ericson2314 commented 2 years ago

@Commelina and others, please try https://hackage.haskell.org/package/alex-3.2.7.1