Open kaby76 opened 3 months ago
@kaby76 Thank you very much for this tool and the excellent analysis capabilities. As I mentioned in my first post in the ANTLR discussion group, I have already tried to rewrite the expression rule, which is also left-recursive. However, the performance got even worse afterwards. I followed the steps described in https://tomassetti.me/improving-the-performance-of-an-antlr-parser/ (Handling expressions). I am now trying your approach to break up the left-recursive rules using the * operator, but I am not entirely sure yet how to do this most effectively.
I considered the rewrite of name : identifier_name '::' simple_name | name '.' simple_name | simple_name ;
to name : (identifier_name '::' simple_name | simple_name) ('.' (identifier_name '::' simple_name | simple_name))* ;
. The effect was statistically significant, (N=35, p < 0.0001, t = 4.10, df 60.5) but the change resulted in a minor slowdown, not speedup.
This graph collates with what I observed for call stacks and counts for LT() calls: there are more calls to LT() with the Kleene-closure rewrite of name
! Doesn't make a lot of sense.
@kaby76 I have verified it again and also rewritten the type grammar. I was able to simplify the name grammar a bit more. Even though both rules have been rewritten, I cannot observe any significant improvement in performance. Do you have any other ideas?
// left-recursive name grammar
// name
// : identifier_name '::' simple_name #aliasQualifiedName
// | name '.' simple_name #qualifiedName
// | simple_name #simpleName
// ;
// rewritten name grammar
name : (identifier_name '::' simple_name | simple_name) ('.' simple_name)* ;
// left-recursive type grammar
// type
// : array_type #arrayTypeExpression
// | type post=(INTERR | STAR) #nullableOrPointerTypeExpression
// | re=REF ro=READONLY? type #refReadonlyExpression
// | basic_type #basicType
// ;
// rewritten type grammar
type : (re=REF ro=READONLY?)? (array_type | basic_type) post=(INTERR | STAR)? ;
Yes, I am still working on this. The main issue of why the grammar is exceedingly slow is because of fallbacks. See https://github.com/kaby76/one-parser/issues/4
@XzuluX
Direct-left recursion is handled by Antlr, but I suspect that it is inefficient to use. I updated trgen/trperf to create a heat map of LT() along with a pop-up containing all the call stacks at the time of the call for that token. cover.zip What I found were problems for expressions--not surprising--involving rule
name
.That's when I notice the rule
name
involves direct-left recursion. So, I ran the grammar over my script to find all direct-left recursion.I would recommend that rules
name
andtype
be rewritten using*
-operators.