igordejanovic / parglare

A pure Python LR/GLR parser - http://www.igordejanovic.net/parglare/
MIT License
135 stars 32 forks source link

`a_opt: a?;` is not equivalent to `a_opt: a | EMPTY;` and causes an error `First set empty for grammar symbol "a_opt". An infinite recursion on the grammar symbol.` when `a` is a sequence. #144

Closed KOLANICH closed 1 year ago

KOLANICH commented 1 year ago

Description / What I Did

First set empty for grammar symbol "gyro_specifier_opt". An infinite recursion on the grammar symbol.

Let we have the rules

stepper_suffix_tail: flat=flatness_marker gyro=gyro_specifier_opt;

gyro_specifier_opt: gyro_specifier | EMPTY;
flatness_marker: F?;

two_digit_number: Digit Digit;
gyro_specifier: G ratio=two_digit_number;

terminals
//characters
Digit: /[0-9]/;
F: 'F';
G: 'G';

They compile and work as intended.

If we replace

diff --git a/test.pglr b/test.pglr
index f177862..0288e48 100644
--- a/test.pglr
+++ b/test.pglr
@@ -3 +3 @@ stepper_suffix_tail: flat=flatness_marker gyro=gyro_specifier_opt;
-gyro_specifier_opt: gyro_specifier | EMPTY;
+gyro_specifier_opt: gyro_specifier?;

then it stops compiling and produces error message Error at ./test.pglr:3:0:"ier_opt;\n\n **> gyro_speci" => First set empty for grammar symbol "gyro_specifier_opt". An infinite recursion on the grammar symbol.

igordejanovic commented 1 year ago

? is syntax sugar which unrolls to rules with _opt suffix. Please see here.

Using rule gyro_specifier_opt: gyro_specifier?; you end-up with gyro_specifier_opt: gyro_specifier_opt | EMPTY;. The first alternative is invalid self-reference. A workaround is to use different name for gyro_specifier_opt, or gyro_specifier? directly in other rules RHSs (which is IMHO better for readability as it makes apparent that something is optional).

While this is not a bug, i.e. it works as designed, I think it would be nice to have better error report in this case where user rules clashes with rules produced by syntax sugar unroll.

KOLANICH commented 1 year ago

Thank you.

A workaround is to use different name for gyro_specifier_opt, or gyro_specifier? directly in other rules RHSs (which is IMHO better for readability as it makes apparent that something is optional).

I just resorted to doing desugaring in this case myself. It may be better for readability to have a? than a_opt: a | EMPTY; ... a_opt, but for me it is not an option (for now) because for some parser generators one has to have each thing being captured in a separate rule and having the same tree structure for all the grammars for all the backends allows me use the same code for postprocessing.

While this is not a bug, i.e. it works as designed, I think it would be nice to have better error report in this case where user rules clashes with rules produced by syntax sugar unroll.

Could we detect the cases when a: b; and optimize them out by renaming b everywhere into a, when b is a generated name?

igordejanovic commented 1 year ago

Manual desugaring is perfectly fine as the desugaring process will introduce new rules only if they don't already exist.

Could we detect the cases when a: b; and optimize them out by renaming b everywhere into a, when b is a generated name?

I don't think that such transformation should be implemented in parglare.

KOLANICH commented 1 year ago

Thanks for making it clear.