ast: Foos cannot be a list, since it precedes Foo

mewmew commented 6 years ago

I've been unable to find a good way of representing the following grammar:

CatchSwitchTerm -> CatchSwitchTerm
    : 'catchswitch' 'within' Scope=ExceptionScope '[' Handlers=(Label separator ',')+ ']' 'unwind' UnwindTarget=UnwindTarget Metadata=(',' MetadataAttachment)+?
;

%interface UnwindTarget;

UnwindTarget -> UnwindTarget
    : 'to' 'caller' -> UnwindToCaller
    | Label
;

As running the above grammar through Textmapper results in the following error:

$ make
ll.tm,2739: `Handlers` cannot be a list, since it precedes UnwindTarget
lalr: 0.221s, text: 0.521s, parser: 2170 states, 226KB
make: *** [Makefile:7: gen] Error 1

Any suggestions would be warmly welcome.

mewmew commented 6 years ago

Below follows a minimal reproducible example which exhibits this issue issue25.tm:

language issue25(go);

lang = "issue25"
package = "github.com/mewspring/foo/issue25"
eventBased = true
eventFields = true
eventAST = true

:: lexer

id_tok : /[a-z]/
'cs' : /cs/
'delim' : /delim/

whitespace : /[ \t\r\n]+/     (space)

:: parser

%input Foo;

Foo -> Foo
   : 'cs' Bars=ID+ 'delim' Bar=ID
;

ID -> ID
   : id_tok
;

$ textmapper
issue25.tm,22: `Bars` cannot be a list, since it precedes Bar
lalr: 0.014s, text: 0.081s, parser: 10 states, 0KB

Note: Without the delimiter I realize it may be impossible to know where the Bars list ends and the Bar nonterminal begin. However, the delimiter should make it possible to make this discrimination, which is why it was included the the example grammar.

mewmew commented 6 years ago

For the specific use case above, I found the following workaround:

Essentially, make UnwindTarget a concrete type instead of a sum type (i.e. interface).

UnwindTarget -> UnwindTarget
    : UnwindToCaller
    | Label
;

UnwindToCaller -> UnwindToCaller
    : 'to' 'caller'
;

This approach would not work however, for the minimal reproducible example posted above, as it uses the same non-terminal for both the Bars list and the succeeding Bar non-terminal.

inspirer commented 6 years ago

This is working as intended and is caused by a design decision to not have "roles" in ASTs and model them with intermediate nodes instead. All children nodes within one node a distinguished by their type only, which simplifies (read speeds up) their construction during parsing.

For the Foo rule above, in the parsed tree, you will see the following structure. It is hard for the traversal code to deduce that the last ID is a Bar, and everything up to last is Bars.

Foo
    ID
    ID
    ID

Whenever you have a rule in which A follows B, Textmapper checks that the sets of nodes they produce do not overlap and complains if they do. There are certain cases when Textmapper copes with overlapping nodes - when both entries have names and the first occurrence is not optional.

expr : left=expr '+' right=expr  -> PlusExpr  // OK, right = second expr in PlusExpr
slice : '[' left=expr? ':' right=expr?  ']'    // not OK - the left expr is optional
slice : '[' left=(expr -> FromExpr)? ':' right=(expr -> ToExpr)?  ']'    // OK - both expressions got "roles"

For your case, there is a special syntax to combine multiple labels into an array:

Foo -> Foo
   : 'cs' Bars=ID+ 'delim' Bars+=ID     // Bars+= tells Textmapper that this ID is part of the Bars list.

You can create a list out of 3 elements:

F : '(' A ',' A ',' A ')' ;  // Not OK. 
F :   '(' a1=A ',' a2=A ',' a3=A ')'   //  OK, F.a1() == A, F.a2() = A, ...
F :   '(' list+=A ',' list+=A ',' list+=A ')'   //  F.list() == [A, A, A]

There are messier cases which Textmapper handles correctly:

%interface Expression, Value;
Selector :  expr=Expr '.' val=Value ;

Expr -> Expression:
   ID   -> Identifier 
 | '(' .... ')'   -> ParenthesizedExpr
 ....
;

Value -> Value:
   ID -> Identifier
 | .....
;

Identifier can be reported in both Expr and Value and so Textmapper forces you to give names to those right-hand side occurrences. In the generated val() method, it produces the following code: n.Child(selector.Expression).Next(selector.Value) correctly skipping the first identifier if both are identifiers.

mewmew commented 6 years ago

Thanks for the write-up @inspirer! It will take me some time to digest.

All children nodes within one node a distinguished by their type only, which simplifies (read speeds up) their construction during parsing.

This makes sense, but is also unfortunate for Textmapper grammar writers. Would it be possible to annotate nonterminals with more information than just their type -- thereby making them distinguishable -- specifically for the grammar rules which would otherwise cause conflicts such as the one reported above? This could be a middle ground between simplified node construction during parsing and usability for grammar writers. Of course, I realize that doing so may end up increasing the complexity of the parser quite a lot.

Edit: just to clarify, I was wondering if Textmapper would be capable of automatically annotating such nodes (to provide the needed information for disambiguation) when it identifies that there would otherwise be a conflict (as it is already capable of detecting).

inspirer commented 6 years ago

I used to derive semantic actions producing ASTs from declarative annotations. This worked okay-ish but was not easily testable and the declarative syntax was quite sophisticated to make it usable. I might revive this at some point as an option but not sooner the migration to Go is complete. See

https://github.com/inspirer/textmapper/blob/d2e9e4deeba69ec462f59664fe9ce8db47a62097/tm-tool/textmapper/src/org/textmapper/tool/parser/textmapper.tm#L395

https://github.com/inspirer/textmapper/blob/d2e9e4deeba69ec462f59664fe9ce8db47a62097/tm-tool/textmapper/src/org/textmapper/tool/parser/TMParser.java#L1671

inspirer commented 6 years ago

Automatic annotation is a double edged sword: there will be cases when it will be helpful but in many cases you'll have to disambiguate yourself. I have enough experience with parsing widely used general-purpose language to say that the current scheme strikes a good balance between being concise and not hiding important details from the user. Suggestions are welcome though ;)

mewmew commented 6 years ago

We now switched to using the workaround you proposed in https://github.com/inspirer/textmapper/issues/25#issuecomment-437503739 and it works well (implemented in rev https://github.com/llir/grammar/commit/b01e345b685dd61b51787d1255a9ddfd0b6c9baf).

This allowed us to once more have UnwindTarget be defined as an %interface so, at least from a grammar perspective the issue is resolved.

I still personally believe it should be possible to handle this in a different way. but that ties in to issue #23 essentially, of how to determine where one non-terminal ends and another starts. If we manage to resolve that issue, then the core of this issue would also be resolved without the need of using the workaround proposed in https://github.com/inspirer/textmapper/issues/25#issuecomment-437503739.

Feel free to close this issue if you like, as #23 should be enough to track both of these.

Cheers, Robin

inspirer / textmapper

ast: Foos cannot be a list, since it precedes Foo #25