yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

Changing root feature #304

Closed micrenda closed 3 months ago

micrenda commented 3 months ago

I would like to ask if it is possible to pass a specific target rule instead of using the main priority chain when parsing a string.

Let me clarify with an example:

Suppose I have the following rule set:

species    <- molecule ( '(' excitatopm ')' )?
molecule <- # Description of a molecule
excitation <- excitation_ele / excitation_vib / excitation_rot
excitation_ele <- # something
excitation_vib <- # something
excitation_rot <- # something

Usually, in my code, I would do something like this:

peg::parser pegParser;
pegParser.load_grammar(s);
std::any result;
pegParser.parse("H2O(2V1)", result);

This works fine. However, in my unit tests or in other parts of the code, I might want to parse according to a specific rule. In that case, I would like to do something like this:

peg::parser pegParser;
pegParser.load_grammar(s);
std::any result;
pegParser.parse("2V1", "excitation_vib", result);

This way, I would use excitation_vib as the root rule and expect an exception if excitation_vib does not fully consume the input.

Is this possible? With the current implementation, to achieve something like this, I would need to change the grammar by making the target rule the new root. However, I was wondering if there is a better way to do it.
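The workaround mentioned above (making the target rule the new root by editing the grammar) can be sketched as a small string helper. This is only an illustration: `with_start_rule` and the `TestRoot` rule name are hypothetical and not part of cpp-peglib, and the sketch assumes that the first rule in the grammar text is treated as the start rule.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of cpp-peglib): returns a copy of `grammar`
// with one extra rule placed first that simply forwards to `target`.
// Assumption: the first rule in the grammar text becomes the start rule.
inline std::string with_start_rule(const std::string& grammar,
                                   const std::string& target,
                                   const std::string& root_name = "TestRoot") {
  // Result looks like: "TestRoot <- excitation_vib\n<original grammar>"
  return root_name + " <- " + target + "\n" + grammar;
}
```

A unit test could then load the modified text, for example with pegParser.load_grammar(with_start_rule(s, "excitation_vib").c_str()), assuming s is a std::string. Rules that are no longer reachable from the new root may be reported as unreferenced, so whether the grammar still loads cleanly probably depends on the library version.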

micrenda commented 3 months ago

I added PR https://github.com/yhirose/cpp-peglib/pull/305, which implements this feature. It may need some rework.

yhirose commented 3 months ago

@micrenda thanks for the feedback, but I don't understand the example grammar... The grammar isn't valid ('excitatopm' is not defined, and 'excitation' is not referenced), so pegParser.load_grammar(s); fails. cpp-peglib doesn't accept such an incorrect grammar...

micrenda commented 3 months ago

Hello

In the example I wrote, I omitted the actual rule bodies because they were not important (and I also made a typo!). Let me give you a valid grammar:

species    <- molecule ( ' ' '(' excitation ')' )?
molecule <- ([A-Z] [a-z]? [0-9]?)+
excitation <- excitation_ele / excitation_vib / excitation_rot
excitation_ele <- 'A' / 'B' / 'C'
excitation_vib <- [0-9]* 'V' [0-9]+
excitation_rot <- 'J' [0-9]+

In my code, now I can do something like this:

peg::parser pegParser;
pegParser.load_grammar(s);
std::any result;
pegParser.parse("H2O (2V1)", result);

And it will work perfectly.

However, with PR https://github.com/yhirose/cpp-peglib/pull/305, it is now also possible to do this in unit tests or in other sections of the code:

peg::parser pegParser;
pegParser.load_grammar(s);
std::any result;
pegParser.parse("2V1",  result, nullptr, "excitation_vib");

For me this is a life saver :-)

yhirose commented 3 months ago

Thanks for the clear explanation. I now fully understand what you would like to do. (By the way, I put comments in your pull request to fix problems that I found, and the following sample uses the revised version.)

Unfortunately, there are some situations where the parser doesn't work properly with this approach. The %whitespace feature is one of them.

// sample.cc
#include <iostream>
#include <peglib.h>

using namespace peg;

int main(void) {
  parser parser(R"(
Start       <- A
A           <- B (',' B)*
B           <- '[one]' / '[two]'
%whitespace <- [ \t\n]*
  )");

  std::cout << std::boolalpha;

  std::cout << parser.parse("[one],[two]") << std::endl;
  std::cout << parser.parse(" [one] , [two] ") << std::endl;

  std::cout << parser.parse("[one],[two]", nullptr, "A") << std::endl;
  std::cout << parser.parse(" [one] , [two] ", nullptr, "A") << std::endl;
}
> ./sample
true
true
true
false

As you can see, %whitespace only works with Start. This is because cpp-peglib applies some special treatment only to the start rule. You can see what is added to the start rule in the perform_core function: https://github.com/yhirose/cpp-peglib/blob/5ef7180a12f305ac92fad73efb4d9a7b81e5b980/peglib.h#L3992
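One way to picture the output above is the toy model below. It is not cpp-peglib code and does not claim to mirror its internals. It assumes that each literal token skips the whitespace that follows it, and that only the start rule gets an extra skip of leading whitespace. All names (`ToyParser`, `toy_parse`, `is_start_rule`) are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model, NOT cpp-peglib code. Assumptions: every literal token skips the
// whitespace that follows it, and only the start rule skips the whitespace
// that precedes the first token.
struct ToyParser {
  const std::string& in;
  std::size_t pos;

  explicit ToyParser(const std::string& s) : in(s), pos(0) {}

  // %whitespace <- [ \t\n]*
  void skip_ws() {
    while (pos < in.size() &&
           (in[pos] == ' ' || in[pos] == '\t' || in[pos] == '\n')) {
      ++pos;
    }
  }

  // Match a literal at the current position, then skip trailing whitespace.
  // On failure the position is left unchanged.
  bool token(const std::string& lit) {
    if (in.compare(pos, lit.size(), lit) != 0) return false;
    pos += lit.size();
    skip_ws();
    return true;
  }

  // B <- '[one]' / '[two]'
  bool B() { return token("[one]") || token("[two]"); }

  // A <- B (',' B)*
  bool A() {
    if (!B()) return false;
    for (;;) {
      std::size_t save = pos;
      if (!(token(",") && B())) {
        pos = save;  // backtrack a partial (',' B) match
        break;
      }
    }
    return true;
  }
};

// Parse `text` with rule A and require that the whole input is consumed.
// `is_start_rule` models the leading-whitespace skip given to the start rule.
inline bool toy_parse(const std::string& text, bool is_start_rule) {
  ToyParser p(text);
  if (is_start_rule) p.skip_ws();
  return p.A() && p.pos == text.size();
}
```

Under these assumptions, toy_parse(" [one] , [two] ", false) fails on the very first character because nothing consumes the leading space, while the same input with is_start_rule set to true succeeds. That matches the true, true, true, false pattern shown in the sample output.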

yhirose commented 3 months ago

@micrenda I made a change in #306 that lets users specify the start rule name in the parser constructor and in the load_grammar method. (Unfortunately, we cannot do the same in the parse method, for the reason I explained in the comment above. But I hope this pull request satisfies your needs.)

auto grammar = R"(
  Start       <- A
  A           <- B (',' B)*
  B           <- '[one]' / '[two]'
  %whitespace <- [ \t\n]*
)";

peg::parser parser(grammar, "A"); // Start Rule is "A"

  or

peg::parser parser;
parser.load_grammar(grammar, "A"); // Start Rule is "A"

parser.parse(" [one] , [two] "); // OK

Could you take a look at it when you have time? Thanks!