pegex-parser / pegex-pm

Pegex Parser for Perl
62 stars 22 forks source link

Reduce and limit the recursion levels used by parsers #43

Closed agentzh closed 7 years ago

agentzh commented 9 years ago

After porting my LZSQL parser based on Parse::RecDescent over to Pegex, I've noticed the much deeper recursion levels in the Pegex-based parser, as evidenced by the following Perl-land on-CPU flame graph for my LZSQL parser:

http://agentzh.org/misc/flamegraph/perl-land-pegex-lzsql-parser.svg

Also, perl sometimes complaining about it too, which never happened with the old Parser::RecDescent parser:

Processing lightface/p4pv2/effect_sum.lzsql ...
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /opt/perl520/lib/site_perl/5.20.2/Pegex/Parser.pm line 206.

As you can see, the lightface/p4pv2/effect_sum.lzsql file triggers this.

agentzh commented 9 years ago

Seeing it again in another Pegex-based parser of mine, for the Edge language. This has been bothering me ;)

ingydotnet commented 9 years ago

@agentzh do these parses eventually complete? How long do they take to complete (vs number of warnings).

I've been thinking lately on how to minimize the recursion.

Can you write me a minimal test case that triggers the warnings?

agentzh commented 9 years ago

@ingydotnet Yes it completes successfully despite those warnings and it does not seem to take unusually long to complete. There are in fact many other scripts taking much longer but complete without any warnings.

Below is a smallest example that I could ever get:

a.pl

use strict;
use warnings;
use Pegex;

open my $in, "a.pgx" or die $!;
my $grammar = do { local $/; <$in> };
close $in;

open $in, "a.in" or die $!;
my $input = do { local $/; <$in> };
close $in;

pegex($grammar)->parse($input);

a.pgx

prog: - statement*

statement:
  | assignment terminator

terminator: /-
  ';'
-/

assignment:
  variable
  /- ':=' -/
  union-query

integer: /(
  DASH? DIGIT+ \b
)/

union-query:
  | union-all-relations
  | union-select-query

union-select-query:
  select-query

union-all-relations:
  relation+ % /- \b 'union' + 'all' \b -/

variable:
  | array-variable
  | scalar-variable

identifier: /
  \b (! 'select' \b ) ( ALPHA WORD* )
/

scalar-variable:
  '$'
  variable-name
  trait*

trait:
  /- ':local' -/

array-variable:
  '@'
  variable-name
  trait*

variable-name:
  identifier

table:
  <symbol>1-3 % '.'

location:
  | scalar-variable

relation:
  | array-relation
  | aliased-union-query
  | function-relation
  | aliased-table

aliased-table:
  table alias?

select-query:
  /- \b 'select' \b -/
  select-arg+ % comma
  from-clause?
  where-clause?
  group-by-clause?

symbol:
  | scalar-variable
  | identifier

array-relation:
  array-variable alias?

aliased-union-query:
  / '(' -/
  union-query
  /- ')' /
  alias?

function-relation:
  function-call
  alias?

alias:
  | as identifier
  | as scalar-variable

as:
  /- \b 'as' \b -/

select-arg:
  | aliased-arithmetic-expression
  | match-all

aliased-arithmetic-expression:
 arithmetic-expression alias?

match-all:
  '*'

from-clause:
  /- \b 'from' \b -/
  data-source

where-clause:
  /- \b 'where' \b -/
  or-expression

group-by-clause:
  /- \b 'group' + 'by' \b -/
  column+ % comma

unsigned-integer: /(
  DIGIT+
)/

function-call:
  function
  /- '(' -/
  select-arg* % comma
  /- ')' /

arithmetic-expression:
  term+ % add-op

add-op: /-
  ( [-+] )
-/

term:
  factor+ % mul-op

mul-op: /-
  ( ['*/'] )
-/

factor:
  | /- '(' -/
    arithmetic-expression
    /- ')' -/
  | atom

atom:
  | value
  | function-call
  | qualified-column
  | scalar-variable
  | column

value:
  number

number:
  | real
  | integer

real: /(
  DASH?
  DIGIT+
  DOT
  DIGIT+
  \b
)/

data-source:
  | explicit-join-query
  | cross-join-query
  | union-query
  | relation

explicit-join-query:
  relation
  /- 'join' +/
  relation
  /- 'on' -/
  and-expression

cross-join-query:
  <relation>2+ % comma

or-expression:
  and-expression+ % /+ 'or' +/

and-expression:
  relational-expression+ % /+ 'and' +/

relational-expression:
  | between-condition
  | binary-relational-condition

between-condition:
  arithmetic-expression
  /- \b 'between' \b -/
  arithmetic-expression
  /- \b 'and' \b -/
  arithmetic-expression

binary-relational-condition:
  arithmetic-expression
  comparison-op
  arithmetic-expression

comparison-op: /-
  ( '>='
  | '='
  )
-/

qualified-column:
  <symbol>2-3 % '.'

column:
  <symbol>1-3 % '.'

function:
  <symbol>1-2 % '.'

ws: /
  (: BLANK
  | EOL
  )
/

line: /
  ANY* EOL?
/

comma: /-
  ','
-/

a.in

@result :=
    select finclick, finprice, coll_num, foo_amt, foo_num, avg_finprice,
        efficiency, roc
    from (
        select $group,
            sum(finclick) as finclick, sum(finprice) as finprice,
            sum(item_coll_num) + sum(shop_coll_num) as coll_num,
            sum(foo_direct_amt) + sum(foo_indirect_amt) as foo_amt,
            sum(foo_direct_num) + sum(foo_indirect_num) as foo_num,
            ifnull(round(sum(finprice) / sum(finclick),4), 0) as avg_finprice,
            ifnull(round(sum(finclick) * sum(finclick) / sum(finprice), 4), 0) as efficiency,
            ifnull(round((sum(foo_direct_num) + sum(foo_indirect_num)) / sum(finclick), 4), 0) as roc
        from A as a
        where is_shop = $is_shop and campaign_id = $campaign_id
        group by $group
    ) as a;

And running the a.pl script above gives

$ perl a.pl 
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 206.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.
Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153.

Please let me know if you need any further information. Thanks a lot!

ingydotnet commented 9 years ago

Figuring this out should be as simple as turning on Pegex debugging and adding a print statement to each of the subs in Pegex::Parser. Print the subname and a counter. I suspect you are triggering recurses that go just over the warning limit.

I'm travelling all day today, but can look at it after.

On Wed, May 6, 2015 at 12:02 AM, Yichun Zhang notifications@github.com wrote:

@ingydotnet https://github.com/ingydotnet Yes it completes successfully despite those warnings and it does not seem to take unusually long to complete. there are other scripts taking much longer but complete without any warnings.

Below is a smallest example that I could ever get: a.pl

use strict;use warnings;use Pegex; open my $in, "a.pgx" or die $!;my $grammar = do { local $/; <$in> };close $in; open $in, "a.in" or die $!;my $input = do { local $/; <$in> };close $in;

pegex($grammar)->parse($input);

a.pgx

prog: - statement*

statement: | assignment terminator

terminator: /- ';' -/

assignment: variable /- ':=' -/ union-query

integer: /( DASH? DIGIT+ \b )/

union-query: | union-all-relations | union-select-query

union-select-query: select-query

union-all-relations: relation+ % /- \b 'union' + 'all' \b -/

variable: | array-variable | scalar-variable

identifier: / \b (! 'select' \b ) ( ALPHA WORD* ) /

scalar-variable: '$' variable-name trait*

trait: /- ':local' -/

array-variable: '@' variable-name trait*

variable-name: identifier

table:

1-3 % '.' location: | scalar-variable relation: | array-relation | aliased-union-query | function-relation | aliased-table aliased-table: table alias? select-query: /- \b 'select' \b -/ select-arg+ % comma from-clause? where-clause? group-by-clause? symbol: | scalar-variable | identifier array-relation: array-variable alias? aliased-union-query: / '(' -/ union-query /- ')' / alias? function-relation: function-call alias? alias: | as identifier | as scalar-variable as: /- \b 'as' \b -/ select-arg: | aliased-arithmetic-expression | match-all aliased-arithmetic-expression: arithmetic-expression alias? match-all: '*' from-clause: /- \b 'from' \b -/ data-source where-clause: /- \b 'where' \b -/ or-expression group-by-clause: /- \b 'group' + 'by' \b -/ column+ % comma unsigned-integer: /( DIGIT+ )/ function-call: function /- '(' -/ select-arg\* % comma /- ')' / arithmetic-expression: term+ % add-op add-op: /- ( [-+] ) -/ term: factor+ % mul-op mul-op: /- ( ['*/'] ) -/ factor: | /- '(' -/ arithmetic-expression /- ')' -/ | atom atom: | value | function-call | qualified-column | scalar-variable | column value: number number: | real | integer real: /( DASH? DIGIT+ DOT DIGIT+ \b )/ data-source: | explicit-join-query | cross-join-query | union-query | relation explicit-join-query: relation /- 'join' +/ relation /- 'on' -/ and-expression cross-join-query: 2+ % comma or-expression: and-expression+ % /+ 'or' +/ and-expression: relational-expression+ % /+ 'and' +/ relational-expression: | between-condition | binary-relational-condition between-condition: arithmetic-expression /- \b 'between' \b -/ arithmetic-expression /- \b 'and' \b -/ arithmetic-expression binary-relational-condition: arithmetic-expression comparison-op arithmetic-expression comparison-op: /- ( '>=' | '=' ) -/ qualified-column: 2-3 % '.' column: 1-3 % '.' function: 1-2 % '.' ws: / (: BLANK | EOL ) / line: / ANY\* EOL? / comma: /- ',' -/ a.in @result := select finclick, finprice, coll_num, foo_amt, foo_num, avg_finprice, efficiency, roc from ( select $group, sum(finclick) as finclick, sum(finprice) as finprice, sum(item_coll_num) + sum(shop_coll_num) as coll_num, sum(foo_direct_amt) + sum(foo_indirect_amt) as foo_amt, sum(foo_direct_num) + sum(foo_indirect_num) as foo_num, ifnull(round(sum(finprice) / sum(finclick),4), 0) as avg_finprice, ifnull(round(sum(finclick) \* sum(finclick) / sum(finprice), 4), 0) as efficiency, ifnull(round((sum(foo_direct_num) + sum(foo_indirect_num)) / sum(finclick), 4), 0) as roc from A as a where is_shop = $is_shop and campaign_id = $campaign_id group by $group ) as a; And running the a.pl script above gives $ perl a.pl Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 206. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 206. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 187. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. Deep recursion on subroutine "Pegex::Parser::match_next" at /usr/local/Cellar/perl/5.20.1/lib/site_perl/5.20.1/Pegex/Parser.pm line 153. — Reply to this email directly or view it on GitHub https://github.com/ingydotnet/pegex-pm/issues/43#issuecomment-99347977.
pdl commented 9 years ago

I am also having this problem at https://github.com/pdl/jtl/blob/7e069e96543fa24b2a23df167eacbb0e1ad20161/poc/t/Whole.t ( Grammar at https://github.com/pdl/jtl/blob/7e069e96543fa24b2a23df167eacbb0e1ad20161/poc/share/jtls.pgx ). The parse completes fine and is not slow.

I've looked at the Pegex debugging output, and indeed it looks like it's triggered when it exceeds some limit - though I suspect that going over would be a more regular occurrence with more deeply nested inputs.

Is there some way for a grammar which knows it is dealing with a reasonably deeply-nested language to ask Pegex::Parser to suppress this warning? Or if not at the moment, would you condone a PR which permitted this?

ingydotnet commented 9 years ago

PR certainly welcome!

ingydotnet commented 9 years ago

Join #pegex on freenode to discuss

perlpunk commented 7 years ago

Closing this because this seems to be fixed by #46