taocpp / PEGTL

Parsing Expression Grammar Template Library
Boost Software License 1.0

Content of node not empty in spite of empty parent node #198

Closed: icze closed this issue 4 years ago

icze commented 4 years ago

Hello,

First of all, thanks for your fantastic library; the parse tree and ABNF parts are especially helpful. I'm new to PEGTL and am currently implementing a large grammar with PEGTL 2.8.1 and Visual Studio 16.4.6. While doing so I ran into some problems. Here is a reduced example:

#include <iostream>
#include <string>

#include <tao/pegtl.hpp>
#include <tao/pegtl/analyze.hpp>
#include <tao/pegtl/contrib/parse_tree.hpp>
#include <tao/pegtl/contrib/parse_tree_to_dot.hpp>

namespace peg = tao::pegtl;

namespace grammar {

  template< char... Cs > struct padded_keyword : peg::pad<peg::keyword<Cs...>, peg::space> {};
  template< char... Cs > struct padded_one : peg::pad<peg::one<Cs...>, peg::space> {};

  struct IDENT : peg::pad< peg::identifier, peg::space > {}; // allow whitespaces
  struct entry : peg::seq< IDENT, IDENT > {};
  struct entry_list : peg::star< entry, padded_one< ';' > > {};
  struct composition : peg::seq< entry_list, padded_keyword< 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n' > > {};
  struct function : peg::seq< composition, padded_keyword< 'e', 'n', 'd' > > {};

} // namespace grammar

int main(int argc, char** argv)
{
  if (peg::analyze< grammar::function >()) {
    std::cerr << "analyze() found errors in grammar." << std::endl;
    return 1;
  }

  peg::memory_input<> in(
    "function\n"
    "end\n"
    "\n",
    "source");

  try {
    const auto root = peg::parse_tree::parse< grammar::function >(in);
    if (root == nullptr) {
      std::cerr << "parse() failed: root is nullptr" << std::endl;
      return 1;
    }
    peg::parse_tree::print_dot(std::cout, *root);
  }
  catch (const peg::parse_error & e) {
    const auto p = e.positions.front();
    std::cerr << e.what() << std::endl
      << in.line_at(p) << std::endl
      << std::string(p.byte_in_line, ' ') << '^' << std::endl;
    return 1;
  }

  return 0;
}

Here is the output:

digraph parse_tree
{
  x000002B628295B30 [ label="ROOT" ]
  x000002B628295B30 -> { x000002B628295DE0 }
  x000002B628295DE0 [ label="struct grammar::function\nfunction\nend\n\n" ]
  x000002B628295DE0 -> { x000002B628295880, x000002B6282A9390 }
  x000002B628295880 [ label="struct grammar::composition\nfunction\n" ]
  x000002B628295880 -> { x000002B628293FA0, x000002B6282A3C50 }
  x000002B628293FA0 [ label="struct grammar::entry_list\n" ]
  x000002B628293FA0 -> { x000002B6282A91A0 }
  x000002B6282A91A0 [ label="struct grammar::entry\nfunction\nend\n\n" ]
  x000002B6282A91A0 -> { x000002B6282A9770, x000002B6282A52E0 }
  x000002B6282A9770 [ label="struct grammar::IDENT\nfunction\n" ]
  x000002B6282A9770 -> { x000002B6282A9840, x000002B6282A5210 }
  x000002B6282A9840 [ label="struct tao::pegtl::ascii::identifier\nfunction" ]
  x000002B6282A5210 [ label="struct tao::pegtl::ascii::space\n\n" ]
  x000002B6282A52E0 [ label="struct grammar::IDENT\nend\n\n" ]
  x000002B6282A52E0 -> { x000002B6282A3EB0, x000002B6282A3F80, x000002B6282A3B80 }
  x000002B6282A3EB0 [ label="struct tao::pegtl::ascii::identifier\nend" ]
  x000002B6282A3F80 [ label="struct tao::pegtl::ascii::space\n\n" ]
  x000002B6282A3B80 [ label="struct tao::pegtl::ascii::space\n\n" ]
  x000002B6282A3C50 [ label="struct grammar::padded_keyword<102,117,110,99,116,105,111,110>\nfunction\n" ]
  x000002B6282A3C50 -> { x000002B6282A54E0, x000002B6282A55B0 }
  x000002B6282A54E0 [ label="struct tao::pegtl::ascii::keyword<102,117,110,99,116,105,111,110>\nfunction" ]
  x000002B6282A55B0 [ label="struct tao::pegtl::ascii::space\n\n" ]
  x000002B6282A9390 [ label="struct grammar::padded_keyword<101,110,100>\nend\n\n" ]
  x000002B6282A9390 -> { x000002B6282A6660, x000002B6282A6B40, x000002B6282A6320 }
  x000002B6282A6660 [ label="struct tao::pegtl::ascii::keyword<101,110,100>\nend" ]
  x000002B6282A6B40 [ label="struct tao::pegtl::ascii::space\n\n" ]
  x000002B6282A6320 [ label="struct tao::pegtl::ascii::space\n\n" ]
}

Now I'm wondering why "entry_list" has empty content but "entry" contains

function
end

Can anybody give me a hint?

Regards, Ingo

d-frey commented 4 years ago

That is a bug. The problem is that the star-rule always succeeds. In this case, however, star<A,B> could leave remnants of A in the parse tree even though B didn't match. The problem goes away if you use star<seq<A,B>>, but of course that should not be necessary.
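The mechanism described above can be illustrated with a small standalone sketch. This is a hypothetical toy model, not PEGTL's actual implementation: `Children`, `match_word`, `match_semicolon`, `star_leaky` and `star_with_rollback` are names invented for this illustration. It only assumes that a star-like loop rewinds the input when an iteration fails but does not discard what the first sub-rule already recorded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not PEGTL code: a "tree" is just the list of
// child labels collected under a list node.
using Children = std::vector<std::string>;

// Rule A: match a run of letters and record it as a child on success.
bool match_word(const std::string& in, std::size_t& pos, Children& out) {
    std::size_t end = pos;
    while (end < in.size() && std::isalpha(static_cast<unsigned char>(in[end]))) {
        ++end;
    }
    if (end == pos) {
        return false;
    }
    out.push_back(in.substr(pos, end - pos));
    pos = end;
    return true;
}

// Rule B: match a single ';' (records nothing).
bool match_semicolon(const std::string& in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == ';') {
        ++pos;
        return true;
    }
    return false;
}

// star<A, B> that rewinds the input on a failed iteration but forgets
// to discard the child that A already recorded.
Children star_leaky(const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    Children out;
    for (;;) {
        const std::size_t saved_pos = pos;
        if (match_word(in, pos, out) && match_semicolon(in, pos)) {
            continue;
        }
        pos = saved_pos;  // the input is rewound ...
        return out;       // ... but a stale child may remain in `out`
    }
}

// Same loop, but a failed iteration also drops the children it added.
Children star_with_rollback(const std::string& in, std::size_t& pos) {
    assert(pos <= in.size());
    Children out;
    for (;;) {
        const std::size_t saved_pos = pos;
        const std::size_t saved_count = out.size();
        if (match_word(in, pos, out) && match_semicolon(in, pos)) {
            continue;
        }
        pos = saved_pos;
        out.resize(saved_count);
        return out;
    }
}
```

With the input "function" and a start position of 0, `star_leaky` returns one stale child while the position stays at 0, whereas `star_with_rollback` returns no children. That mirrors the symptom in the report: the list consumed nothing, yet an entry node hangs below it.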

I'll try to find a fix...

icze commented 4 years ago

Thanks for the fast answer. In the meantime I will integrate your workaround into abnf2pegtl and see whether I can get my large grammar running ...

ColinH commented 4 years ago

We have identified a couple of potential solutions; however, at the moment they all seem somewhat involved. We will take a bit of time to see how best to proceed. Hopefully the work-around with additional seq<> rules allows you to continue for now.
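Applied to the reduced example from the report, the work-around might look like the fragment below. It is a sketch against PEGTL 2.8.x that reuses the rule names from the original post; only `entry_list` changes, and the fragment has not been verified against the original poster's full grammar.

```cpp
#include <tao/pegtl.hpp>

namespace peg = tao::pegtl;

namespace grammar {

  template< char... Cs > struct padded_one : peg::pad< peg::one< Cs... >, peg::space > {};

  struct IDENT : peg::pad< peg::identifier, peg::space > {};
  struct entry : peg::seq< IDENT, IDENT > {};

  // Work-around: wrap the repeated rules in an explicit seq<> so that a
  // failed iteration is discarded as a whole, as suggested in this thread.
  struct entry_list : peg::star< peg::seq< entry, padded_one< ';' > > > {};

} // namespace grammar
```

Note that when every rule is stored in the parse tree, as in the reduced example, the extra seq<> will probably appear as a node of its own, so code that walks the tree may need to account for one more level.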

d-frey commented 4 years ago

I committed a fix in the master branch. Can you check whether it works for you? Do you need the fix to be backported to the 2.x branch?

icze commented 4 years ago

Thanks for the fix. I tested it with the example and it works as expected. While implementing my large grammar I found another issue, #199.

Regarding 2.x: It depends on when you are planning to release 3.0.0 and whether master is already reasonably stable. From my side, I'm integrating your library into our software for the first time and I'm able to wait 2-3 months for 3.0.0.

Thanks a lot for your great work.

Addendum: Having thought more about the merging, the workaround is sufficient for me on 2.x.

d-frey commented 4 years ago

Thanks for checking. For now, we will continue to improve the fix in the master branch, but ultimately I'll try to backport it, if possible, once it has stabilized, for those users still stuck on C++11.

About the release date of 3.0.0: Colin and I talked about it yesterday, and once this issue is fully resolved (there might be a few more corner cases that we need to discuss), we will release 3.0.0. It has already been stable for quite some time now, and any additional (API-changing) features will have to wait for a future 4.x.