Closed Commelina closed 2 years ago
OK I deprecated 3.2.7 and made it unbuildable. We'll need some new test cases for this.
@andreasabel Assuming this isn't a crazy bootstrapping issue, it looks like you made all the parser changes between 3.2.6 and 3.2.7. Would have any time to help debug this?
@Ericson2314 : I started to have a look. Then I hit #195.
(I have some old repo of alex
which still has a copy of data/AlexTemplate-ghc
, so this does not stop me now.)
Shrunk example: Issue197.x
{}
@iec60559suffix = (32|64|128)[x]?
tokens :-
$white+ ;
@iec60559suffix ;
{}
Phew, finally found the magic incantations to make bisection work:
$ cat > issue197.sh
cp data/AlexTemplate-ghc data/AlexTemplate.hs
cabal run alex -- Issue197.x
^D
$ chmod +x issue197.sh
$ git bisect start 3.2.7 3.2.6
$ git bisect run ./issue197.sh
Result: d6653f3ef7dffb462baa8b4b7165d1fa8f2b36a1 is the first bad commit Author: Andreas Abel Date: Fri Jan 31 19:09:42 2020 +0100
[ fixed #141 ] regex: allow arbitary repetitions
Previously, the
r{n,m}
and related forms were restricted to single digit numbersn
andm
.We fix this by recognizing numbers of > 1 digits as NUM token and adding rules to the regex parsing that also allows NUMs for
n
andm
. Previously, only CHAR was allowed, which subsumes single digits.
Running info on the happy grammar,
...
58 rep -> '{' CHAR '}' (49)
59 rep -> '{' CHAR ',' '}' (50)
60 rep -> '{' CHAR ',' CHAR '}' (51)
61 rep -> '{' NUM '}' (52)
62 rep -> '{' NUM ',' '}' (53)
63 rep -> '{' NUM ',' NUM '}' (54)
...
71 set0 -> CHAR (62)
72 set0 -> CHAR '-' CHAR (63)
...
The problem is these lower rules don't have NUM
equivalents.
I think remembering to double up NUM and CHAR is hard, and all the more so when this case we would need to "unparse" the number, yet also preserve things like 001
vs 1
.
This makes makes we think we should not have a NumT
, and instead just have a number nonterminal, parsed from char terminals. Does that sound good?
Er, @Ericson2314, I am already working on this. I just lack the repo rights to assign this to me so that you would know about it.
This makes makes we think we should not have a
NumT
, and instead just have a number nonterminal, parsed from char terminals. Does that sound good?
Yes this sounds good. I just wish you would not ask me to help and then race for a fix. Wasted my time, I could have looked into something else.
Er, @Ericson2314, I am already working on this. I just lack the repo rights to assign this to me so that you would know about it.
Oh sorry! That's awkward. I should have watched the time zones more carefully, I just assumed you stopped at the bisect.
Yes, let's get you those perms.
I should have stated in text that I am looking into a fix...
PR #202 seems to solve the problem in a satisfactory way, by restricting the parsing of numerals to inside multiplicity expressions {
n,m}
.
@andreasabel BTW I earlier sent you an email taking the git email address. Is that a fine email address? I want to reach out to @simonmar more directly to get to permissions.
@andreasabel BTW I earlier sent you an email taking the git email address. Is that a fine email address? I want to reach out to @simonmar more directly to get to permissions.
I found the message. I am often slow with email, looking into my github notifications more often than my email.
Thanks for your efforts!
@Commelina and others, please try https://hackage.haskell.org/package/alex-3.2.7.1
I tried the latest alex 3.2.7 on a common-used library
language-c
-0.9.0.1. However, it failed to parseLex.x
:It seems that the problem happened here:
Then I rolled my alex back to version 3.2.6 then everything worked. I am not sure if it is a bug.