kaby76 / one-parser

0 stars 2 forks source link

Direct left recursion in grammar #3

Open kaby76 opened 3 months ago

kaby76 commented 3 months ago

@XzuluX

Direct-left recursion is handled by Antlr, but I suspect that it is inefficient to use. I updated trgen/trperf to create a heat map of LT() along with a pop-up containing all the call stacks at the time of the call for that token. cover.zip What I found were problems for expressions--not surprising--involving rule name.

That's when I notice the rule name involves direct-left recursion. So, I ran the grammar over my script to find all direct-left recursion.

$ bash /c/Users/Kenne/Documents/GitHub/g4-scripts/find-direct-left-recursion.sh ONEParser.g4
Finding direct left recursive symbols in grammars...
L46: name
     ^
L805: type
      ^

I would recommend that rules name and type be rewritten using *-operators.

XzuluX commented 3 months ago

@kaby76 Thank you very much for this tool and the excellent analysis capabilities. As I mentioned in my first post in the ANTLR discussion group, I have already tried to rewrite the expression rule, which is also left-recursive. However, the performance got even worse afterwards. I followed the steps described in https://tomassetti.me/improving-the-performance-of-an-antlr-parser/ (Handling expressions). I am now trying your approach to break up the left-recursive rules using the * operator, but I am not entirely sure yet how to do this most effectively.

kaby76 commented 3 months ago

I considered the rewrite of name : identifier_name '::' simple_name | name '.' simple_name | simple_name ; to name : (identifier_name '::' simple_name | simple_name) ('.' (identifier_name '::' simple_name | simple_name))* ;. The effect was statistically significant, (N=35, p < 0.0001, t = 4.10, df 60.5) but the change resulted in a minor slowdown, not speedup.

untitled

This graph collates with what I observed for call stacks and counts for LT() calls: there are more calls to LT() with the Kleene-closure rewrite of name! Doesn't make a lot of sense.

XzuluX commented 3 months ago

@kaby76 I have verified it again and also rewritten the type grammar. I was able to simplify the name grammar a bit more. Even though both rules have been rewritten, I cannot observe any significant improvement in performance. Do you have any other ideas?

// left-recursive name grammar
// name
//   :  identifier_name '::' simple_name #aliasQualifiedName
//   | name '.' simple_name #qualifiedName
//   | simple_name          #simpleName 
//   ;

// rewritten name grammar
name : (identifier_name '::' simple_name | simple_name) ('.' simple_name)* ;

// left-recursive type grammar
// type
//   : array_type                  #arrayTypeExpression
//   | type post=(INTERR | STAR)   #nullableOrPointerTypeExpression
//   | re=REF ro=READONLY? type    #refReadonlyExpression
//   | basic_type                  #basicType
//   ;

// rewritten type grammar
type : (re=REF ro=READONLY?)? (array_type | basic_type) post=(INTERR | STAR)? ;
kaby76 commented 3 months ago

Yes, I am still working on this. The main issue of why the grammar is exceedingly slow is because of fallbacks. See https://github.com/kaby76/one-parser/issues/4