Grammar seems sensitive to characters in comments

openfga / language

Grammar for the OpenFGA modeling language

https://openfga.dev

Apache License 2.0

Grammar seems sensitive to characters in comments #115

Closed evankanderson closed 7 months ago

evankanderson commented 9 months ago

The following model will fail to parse in fga model test --tests testfile.yaml

model.fga:

# I can't believe it's not yaml
model
  schema 1.1

type user

testfile.yaml:

model_file: ./model.fga

Error:

Error: error running tests due to failed to transform due to 2 errors occurred:
    * syntax error at line=1, column=7: token recognition error at: '''
    * syntax error at line=1, column=20: token recognition error at: '''

If you remove the comment from model.fga, the test passes.
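
The two reported columns appear to line up with the two apostrophes in the comment, assuming the columns are zero-based (as ANTLR-generated parsers typically report them). A quick check, in Python purely for illustration:

```python
# First line of model.fga from the report above.
comment = "# I can't believe it's not yaml"

# Zero-based positions of every apostrophe in that line.
apostrophe_columns = [i for i, ch in enumerate(comment) if ch == "'"]

print(apostrophe_columns)
```

If the positions match the error output, that suggests the lexer is trying to tokenize the comment text itself rather than skipping it.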

evankanderson commented 9 months ago

It also seems like you might want to allow comments before relationDeclaration, so that you can write the following:

model
  schema 1.1

type user
  relations
    # This is a bit tricky, but we want to do something fancy here.
    define member: [user#member]

This currently fails with:

Error: error running tests due to failed to transform due to 2 errors occurred:
    * syntax error at line=6, column=4: mismatched input '#' expecting 'define'

evankanderson commented 9 months ago

I am currently writing in a somewhat-formal style, so that my comments do not trip up the parser, but it is somewhat awkward to read.

s/I am/I'm, s/do not/don't/, and s/it is/it's/

evankanderson commented 8 months ago

The problem seems to be that relationDefTypeRestrictionBase uses the # character, so comment parsing is done in the grammar rather than in the lexer. Having the parsing done in the grammar means that the lexer may fail to produce a token when parsing a STRING (https://github.com/openfga/language/blob/main/OpenFGALexer.g4#L107), which prevents the parser from reaching the grammar stage.

One possible solution would be to add a COMMENT production to the lexer which triggers on ^# or WHITESPACE '#'
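
To make the proposed rule concrete, here is a minimal sketch in Python (not the actual ANTLR lexer, and not necessarily how the fix was implemented). The names COMMENT and strip_comments are hypothetical. It treats # as starting a comment only at the beginning of a line or after a space or tab, so user#member is left alone:

```python
import re

# Hypothetical stand-in for the proposed lexer rule: '#' begins a comment
# only at the start of a line or directly after a space or tab.
COMMENT = re.compile(r"(^|[ \t])#.*$")


def strip_comments(model: str) -> str:
    """Blank out comments line by line, keeping the line count unchanged."""
    cleaned = []
    for line in model.splitlines():
        # Remove the comment (if any), then drop trailing whitespace.
        cleaned.append(COMMENT.sub("", line).rstrip())
    return "\n".join(cleaned)


model = "\n".join([
    "# I can't believe it's not yaml",
    "model",
    "  schema 1.1",
    "",
    "type user",
    "  relations",
    "    # This is a bit tricky, but we want to do something fancy here.",
    "    define member: [user#member]",
])

print(strip_comments(model))
```

Comment lines are replaced with empty lines rather than removed, so line numbers in any later syntax error would still point at the original file.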

evankanderson commented 8 months ago

(This also got tripped up by ; at the end of a line in the Apache license header.)

evankanderson commented 8 months ago

It appears that # // may be a work-around comment character, but that seems awkward.

Having CEL_COMMENT also consume the trailing newline might fix that problem, as it appears that it routes the CEL comments to a separate channel and not to the main grammar parser.

rhamzeh commented 7 months ago

Thanks for raising it @evankanderson - this should be resolved now, let us know if you still encounter issues