yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

Small difference on profile count between peg/leg == chpeg != peglib #217

Closed by mingodad 2 years ago

mingodad commented 2 years ago

Attached is a zip file with the grammars and input data I used to compare the profile output without packrat. It includes a spreadsheet with the output of each parser (peg/leg, chpeg and cpp-peglib): peg/leg and chpeg report the same profile numbers, but cpp-peglib deviates slightly in several of them. I do not yet know what this difference means.

kotlin-diff.zip

yhirose commented 2 years ago

@mingodad, thanks for the report, but there is too little information here for me to investigate. Could you try to make the smallest possible grammar and source text, and also provide the profile result you expect? Then I'll start debugging. Thanks!

mingodad commented 2 years ago

I found one reason (maybe the only one) for the differences: it happens at the end of the file.

start <- _ (one / two / three)* EOF
one <- "one" _
two <- "two" _
three <- "three" _
_ <- [ \t\n\r]*
EOF <- !.

Input:

one
two
three

After three is parsed successfully we reach EOF. At that point cpp-peglib doesn't try to parse (one / two / three) again; somehow it checks for the end of the file first and stops, whereas chpeg and peg/leg make one more failed attempt at each alternative.

cpp-peglib profile output:

duration: 0.0001s (100µs)

  id       total      %     success        fail  definition
              12                  9           3  Total counters
                              75.00       25.00  % success/fail

   0           1   8.33           1           0  start
   1           4  33.33           4           0  _
   2           3  25.00           1           2  one
   3           2  16.67           1           1  two
   4           1   8.33           1           0  three
   5           1   8.33           1           0  EOF

chpeg (peg/leg) profile output:

Definition identifier calls:
  DEF-   id       IDENT      %       ISUCC       IFAIL  name
  DEF     0           1   6.67           1           0  start
  DEF     1           4  26.67           1           3  one
  DEF     2           3  20.00           1           2  two
  DEF     3           2  13.33           1           1  three
  DEF     4           4  26.67           4           0  _
  DEF     5           1   6.67           1           0  EOF
  DEF=   --          15 100.00           9           6  --
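
The gap between the two tables above (12 calls versus 15) can be reproduced with a small hand-written model of the grammar. This is only a sketch: the function `profile` and its `guard_at_eof` flag are hypothetical names, and the idea that cpp-peglib skips a repetition once no input remains is an assumption taken from the observation in this comment, not something confirmed in the thread. The input is assumed to end with a single newline.

```python
from collections import Counter

def profile(text, guard_at_eof):
    """Parse `text` with the grammar above and count calls per rule.

    guard_at_eof=True models a repetition loop that stops as soon as no
    input remains (an assumption about cpp-peglib, not a confirmed fact).
    """
    calls, ok = Counter(), Counter()

    def rule(name, body, pos):
        # Every rule invocation is counted, successful or not.
        calls[name] += 1
        end = body(pos)
        if end is not None:
            ok[name] += 1
        return end

    def ws(pos):  # _ <- [ \t\n\r]*
        while pos < len(text) and text[pos] in " \t\n\r":
            pos += 1
        return pos

    def word(lit):  # e.g. one <- "one" _
        def body(pos):
            if text.startswith(lit, pos):
                return rule("_", ws, pos + len(lit))
            return None
        return body

    def eof(pos):  # EOF <- !.
        return pos if pos == len(text) else None

    def start(pos):  # start <- _ (one / two / three)* EOF
        pos = rule("_", ws, pos)
        while not (guard_at_eof and pos == len(text)):
            for name in ("one", "two", "three"):
                end = rule(name, word(name), pos)
                if end is not None:
                    break
            else:
                break  # every alternative failed: the repetition ends
            pos = end
        return rule("EOF", eof, pos)

    rule("start", start, 0)
    return calls, ok

for guard in (False, True):
    calls, ok = profile("one\ntwo\nthree\n", guard)
    print(guard, sum(calls.values()), sum(ok.values()))
# False 15 9
# True 12 9
```

Without the guard the model makes one extra failed attempt at one, two and three at the end of the input, which matches the chpeg totals; with the guard it matches the cpp-peglib totals. The number of successes is 9 in both cases, so the two parsers accept the same input and differ only in how many failed attempts they record.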

mingodad commented 2 years ago

With the modified example shown below and the same input as before, cpp-peglib doesn't attempt the rule four, but chpeg and peg/leg do.

start <- _ (one / two / three)* four? EOF
one <- "one" _
two <- "two" _
three <- "three" _
four <- "four" _
_ <- [ \t\n\r]*
EOF <- !.

cpp-peglib profile output:

duration: 0s (0µs)

  id       total      %     success        fail  definition
              12                  9           3  Total counters
                              75.00       25.00  % success/fail

   0           1   8.33           1           0  start
   1           4  33.33           4           0  _
   2           3  25.00           1           2  one
   3           2  16.67           1           1  two
   4           1   8.33           1           0  three
   5           1   8.33           1           0  EOF

chpeg (peg/leg) profile output:

Definition identifier calls:
  DEF-   id       IDENT      %       ISUCC       IFAIL  name
  DEF     0           1   6.25           1           0  start
  DEF     1           4  25.00           1           3  one
  DEF     2           3  18.75           1           2  two
  DEF     3           2  12.50           1           1  three
  DEF     4           1   6.25           0           1  four
  DEF     5           4  25.00           4           0  _
  DEF     6           1   6.25           1           0  EOF
  DEF=   --          16 100.00           9           7  --
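
One way to read these two tables is that four? behaves like a repetition with at most one iteration, so an end-of-input check in the repetition loop would skip it as well. The sketch below models that with tiny parser combinators. All names (`make_profile`, `repeat`, `guard_at_eof`) are hypothetical, and the guard itself is an assumption about cpp-peglib rather than a documented behaviour. The input is assumed to end with a single newline.

```python
from collections import Counter

def make_profile(text, guard_at_eof):
    calls, ok = Counter(), Counter()

    def rule(name, body):
        # Wrap a parser so that each invocation is counted under `name`.
        def run(pos):
            calls[name] += 1
            end = body(pos)
            if end is not None:
                ok[name] += 1
            return end
        return run

    def repeat(item, max_count):
        # item* is max_count=None, item? is max_count=1.
        def run(pos):
            n = 0
            while max_count is None or n < max_count:
                if guard_at_eof and pos == len(text):
                    break  # hypothetical guard: do not even try at end of input
                end = item(pos)
                if end is None:
                    break
                pos, n = end, n + 1
            return pos
        return run

    def choice(*alts):
        def run(pos):
            for alt in alts:
                end = alt(pos)
                if end is not None:
                    return end
            return None
        return run

    def ws_body(pos):  # _ <- [ \t\n\r]*
        while pos < len(text) and text[pos] in " \t\n\r":
            pos += 1
        return pos
    ws = rule("_", ws_body)

    def word(lit):  # e.g. four <- "four" _
        return rule(lit, lambda pos: ws(pos + len(lit)) if text.startswith(lit, pos) else None)

    one, two, three, four = (word(w) for w in ("one", "two", "three", "four"))
    eof = rule("EOF", lambda pos: pos if pos == len(text) else None)

    def start_body(pos):  # start <- _ (one / two / three)* four? EOF
        for step in (ws, repeat(choice(one, two, three), None), repeat(four, 1), eof):
            pos = step(pos)
            if pos is None:
                return None
        return pos

    rule("start", start_body)(0)
    return calls, ok

for guard in (False, True):
    calls, ok = make_profile("one\ntwo\nthree\n", guard)
    print(guard, sum(calls.values()), sum(ok.values()), calls["four"])
# False 16 9 1
# True 12 9 0
```

Under this model the unguarded run records 16 calls including one failed attempt at four, as chpeg does, while the guarded run never calls four and records 12 calls, as cpp-peglib does.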

mingodad commented 2 years ago

And if we make the rule four mandatory, cpp-peglib reports an error and registers the rule four in the profile output, but it still doesn't register an extra failed attempt for (one / two / three).

start <- _ (one / two / three)* four EOF
one <- "one" _
two <- "two" _
three <- "three" _
four <- "four" _
_ <- [ \t\n\r]*
EOF <- !.

cpp-peglib profile output:

duration: 0.0001s (100µs)

  id       total      %     success        fail  definition
              12                  7           5  Total counters
                              58.33       41.67  % success/fail

   0           1   8.33           0           1  start
   1           4  33.33           4           0  _
   2           3  25.00           1           2  one
   3           2  16.67           1           1  two
   4           1   8.33           1           0  three
   5           1   8.33           0           1  four

Error message:

5:1 syntax error, expecting 'four'.
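
The counts in this table are consistent with an end-of-input check that applies only to the repetition: four is a plain sequence element here, so it is still attempted and fails. The following sketch reproduces the 12 / 7 / 5 totals under that assumption. The function name `profile_required_four` is hypothetical, the guard is an assumption about cpp-peglib and not confirmed in the thread, and the input is assumed to end with a single newline.

```python
from collections import Counter

def profile_required_four(text):
    calls, ok = Counter(), Counter()
    n = len(text)

    def call(name, fn, pos):
        # Count the invocation, and count it as a success if it matched.
        calls[name] += 1
        end = fn(pos)
        if end is not None:
            ok[name] += 1
        return end

    def ws(pos):  # _ <- [ \t\n\r]*
        while pos < n and text[pos] in " \t\n\r":
            pos += 1
        return pos

    def word(lit):  # e.g. four <- "four" _
        return lambda pos: call("_", ws, pos + len(lit)) if text.startswith(lit, pos) else None

    def start(pos):  # start <- _ (one / two / three)* four EOF
        pos = call("_", ws, pos)
        while pos < n:  # hypothetical guard: the repetition is skipped at end of input
            for name in ("one", "two", "three"):
                end = call(name, word(name), pos)
                if end is not None:
                    break
            else:
                break  # every alternative failed: the repetition ends
            pos = end
        pos = call("four", word("four"), pos)  # not a repetition: always attempted
        if pos is None:
            return None  # start fails before EOF is ever tried
        return call("EOF", lambda p: p if p == n else None, pos)

    call("start", start, 0)
    return calls, ok

calls, ok = profile_required_four("one\ntwo\nthree\n")
total, success = sum(calls.values()), sum(ok.values())
print(total, success, total - success)
# 12 7 5
```

EOF never appears in the counters because start fails as soon as four fails, which agrees with the table above listing only six rules.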

yhirose commented 2 years ago

@mingodad, thanks for the report!