jcoglan / canopy

A parser compiler for Java, JavaScript, Python, Ruby
http://canopy.jcoglan.com
Mozilla Public License 2.0

Rules unable to use lookahead only #45

Closed russells-crockpot closed 2 years ago

russells-crockpot commented 4 years ago

When defining a rule such as:

end_of_statement <- ';' / eol / &']'

Then the parser (in Python and JavaScript, at least) hangs forever. It only seems to do this when it actually encounters that character, however. It also occurs with negative lookaheads.

russells-crockpot commented 4 years ago

I was able to create a toy grammar that reproduces it. When 1 or (1) is input, it works, but when 1(0) is input, it hangs.


start <- number eos*
eos <- &'('
number <- [0-9]+

jcoglan commented 3 years ago

The string 1 matches the grammar; 1 matches number, and then we attempt to match eos which fails, so eos* consumes no input.

The string (1) is rejected because the first character is ( which does not match number.

1(0) matches the char 1 using number, and then attempts to match eos. The next char is (, so the rule &'(' matches, but then the eos rule does not consume any further input. So eos* loops forever creating an empty node and not advancing the cursor.
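The three cases above can be traced with a small hand-written simulation. This is only a sketch of the toy grammar, not the code Canopy generates: the function names and the `max_steps` cap are assumptions added so that the example terminates.

```python
def match_number(text, pos):
    """number <- [0-9]+ : return the new offset, or None on failure."""
    end = pos
    while end < len(text) and "0" <= text[end] <= "9":
        end += 1
    return end if end > pos else None


def match_eos(text, pos):
    """eos <- &'(' : succeed without consuming if the next char is '('."""
    if text.startswith("(", pos):
        return pos  # lookahead succeeded, offset unchanged
    return None


def trace_start(text, max_steps=5):
    """start <- number eos* with the loop capped at max_steps (hypothetical cap).

    Returns the offsets recorded after each successful eos match,
    or None if number fails to match at the start of the input.
    """
    pos = match_number(text, 0)
    if pos is None:
        return None
    offsets = []
    for _ in range(max_steps):  # without the cap this loop would never exit
        nxt = match_eos(text, pos)
        if nxt is None:
            break
        offsets.append(nxt)
        pos = nxt
    return offsets


print(trace_start("1"))     # []
print(trace_start("(1)"))   # None
print(trace_start("1(0)"))  # [1, 1, 1, 1, 1]
```

For 1(0), every iteration reports offset 1: eos keeps succeeding and the cursor never moves, which is the hang described above.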

Using a lookahead as the final element in a rule, or as the only element, is likely to cause this behaviour and should be avoided. You should always follow a lookahead with something that is guaranteed to consume input, especially if you're inside a loop.
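To illustrate why a consuming element after the lookahead lets the loop finish, here is a minimal sketch in Python. It models a hypothetical rewritten rule, eos <- &'(' . (lookahead, then consume one character), which is not from the original grammar and changes what eos matches; `match_eos_consuming` and `match_star` are illustrative names, not Canopy APIs.

```python
def match_eos_consuming(text, pos):
    """Hypothetical eos <- &'(' . : check for '(' and then consume one char."""
    if text.startswith("(", pos):
        return pos + 1  # the cursor advances past the character
    return None


def match_star(rule, text, pos):
    """Zero-or-more repetition: apply rule until it fails.

    Returns (final offset, number of successful matches). This terminates
    only if every successful match of rule advances the offset.
    """
    count = 0
    while True:
        nxt = rule(text, pos)
        if nxt is None:
            return pos, count
        pos, count = nxt, count + 1


print(match_star(match_eos_consuming, "1(0)", 1))   # (2, 1)
print(match_star(match_eos_consuming, "1((0)", 1))  # (3, 2)
```

Because each successful match moves the offset forward, the repetition must eventually hit a failing match or the end of the input.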

jcoglan commented 2 years ago

Can I close this issue or did you have further information about it?