Closed shlnz closed 4 years ago
Thanks for reporting this!
This should be already fixed in 0.10.0-rc3, however there might still be disambiguation issues if you produce a string that matches either Text
or Base64
, since the new disambiguation rules would give both regexes priority 1. I'll add a priority
flag that you can attach to declarations, to make it possible to bump it up.
Hi I encountered a bug using the newest (rc5) version. I am unsure if it is realted so I am posting it here for now. The following tokens show some weird bevaior
#[regex = r"[0-9][0-9_]*"]
LiteralUnsignedNumber,
#[regex = r"[0-9][0-9_]*\.[0-9][0-9_]*[TGMKkmupfa]"]
LiteralRealNumberDotScaleChar,
#[regex = r"[0-9][0-9_]*\.[0-9][0-9_]*[eE][+-]?[0-9][0-9_]*"]
LiteralRealNumberDotExp,
#[regex = r"[0-9][0-9_]*[TGMKkmupfa]"]
LiteralRealNumberScaleChar,
#[regex = r"[0-9][0-9_]*[eE][+-]?[0-9][0-9_]*"]
LiteralRealNumberExp,
#[regex = r"[0-9][0-9_]*\.[0-9][0-9_]*"]
LiteralRealNumberDot,
All strings that should match LiteralRealNumberDot
(0.1, 3.141) match LiteralUnsignedNumber
but the rest matches correctly. When I remove LiteralUnsignedNumber
. Strings that should match LiteralRealNumberDot
don't match at all. But when I removeLiteralRealNumberScaleChar
or LiteralRealNumberExp
the problems disappears and I can even add LiteralUnsignedNumber
without the previous issue reappearing.
That is definitely a bug, can reproduce, let me see if I can fix it quickly...
Well, I know where the bug in the code is, but I can't seem to fix it right now without breaking other tests. In short, the graph produced for relevant nodes looks as follows:
(...)
40: {
[0-9] ⇒ 40,
E ⇒ 12,
G ⇒ 3,
K ⇒ 3,
M ⇒ 3,
T ⇒ 3,
_ ⇒ 40,
a ⇒ 3,
e ⇒ 12,
f ⇒ 3,
k ⇒ 3,
m ⇒ 3,
p ⇒ 3,
u ⇒ 3,
},
44: [0-9] ⇒ 40,
47: {
. ⇒ 44,
[0-9] ⇒ 47,
E ⇒ 25,
G ⇒ 18,
K ⇒ 18,
M ⇒ 18,
T ⇒ 18,
_ ⇒ 47,
a ⇒ 18,
e ⇒ 25,
f ⇒ 18,
k ⇒ 18,
m ⇒ 18,
p ⇒ 18,
u ⇒ 18,
_ ⇒ 48,
},
(...)
47 is a loop for integers without a fraction, on .
it jumps to 44, expects one digit, then jumps to 40. 40 is a loop, and somewhere along the way some node merging happened that removed the escape branch from 40, which means it will loop and, given absence of a definite error, backtrack.
So there are actually two separate bugs here:
.
backtracking should be disabled, because the produced token will be incorrect, so this should be a hard error.First is easy, looking into 2.
@DSPOM2 I've narrowed it down, but still need to spend some time to figure out a proper solution to this, so will get back to this tomorrow morning.
In the mean time, if you move LiteralRealNumberDot
up so that it's first in that list of variants, the branch merging works as intended, so that can be a temporary solution till I have this worked out.
Thank you so much for the quick response! I already tried to move LiteralRealNumberDot
but apperently I missed most obvious solution
So I found the bug, my merging strategy was conservative and I was only merging forks with forks (think: match
), skipping ropes (think: if let Some(...)
for more than one byte), which caused one of the exit branches to be dropped.
The fix was to actually be more aggressive with the merges, which also produces a more compact graph, which in turn, for a reason that is beyond me right now, sped up my new strings benchmark by some 40%... I added more tests to make sure the benchmarks are actually running correctly, they are. Gonna release rc6 in a moment.
Hi,
while playing with Logos, I noticed that for similar regex's (which share the same start, for example) the longest match is not necessarily chosen. I don't know if this is the same issue as #18, because in this case two regexs create the problem.
I tried the code with version 0.9.7 and version 0.10.0-rc2, but both versions behave the same.