taocpp / PEGTL

Parsing Expression Grammar Template Library
Boost Software License 1.0
1.94k stars 228 forks source link

parse_tree "transform" methods might need access to the input source #186

Closed skyrich62 closed 5 years ago

skyrich62 commented 5 years ago

The transform methods could be used for semantic analysis, and might therefore need access to the input source in order to generate an error message, (e.g. show location of the error, and perhaps the source line containing the error).

--- a/include/tao/pegtl/contrib/parse_tree.hpp
+++ b/include/tao/pegtl/contrib/parse_tree.hpp
@@ -186,11 +186,11 @@ namespace TAO_PEGTL_NAMESPACE::parse_tree
       }

       // this one, if applicable, is more specialized than the above
-      template< typename Selector, typename Node, typename... States >
-      auto transform( std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( Selector::transform( n, st... ) ) )
-         -> decltype( Selector::transform( n, st... ), void() )
+      template< typename Selector, typename Input, typename Node, typename... States >
+      auto transform( const Input &in, std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( Selector::transform( in, n, st... ) ) )
+         -> decltype( Selector::transform( in, n, st... ), void() )
       {
-         Selector::transform( n, st... );
+         Selector::transform( in, n, st... );
       }

       template< unsigned Level, typename Analyse, template< typename... > class Selector >
@@ -416,7 +416,7 @@ namespace TAO_PEGTL_NAMESPACE::parse_tree
             auto n = std::move( state.back() );
             state.pop_back();
             n->template success< Rule >( in, st... );
-            transform< Selector< Rule > >( n, st... );
+            transform< Selector< Rule > >( in, n, st... );
             if( n ) {
                state.back()->emplace_back( std::move( n ), st... );
             }
@@ -483,8 +483,8 @@ namespace TAO_PEGTL_NAMESPACE::parse_tree
    struct remove_content
       : apply< remove_content >
    {
-      template< typename Node, typename... States >
-      static void transform( std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->Node::remove_content( st... ) ) )
+      template< typename Input, typename Node, typename... States >
+      static void transform( const Input &in, std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->Node::remove_content( st... ) ) )
       {
          n->remove_content( st... );
       }
@@ -494,8 +494,8 @@ namespace TAO_PEGTL_NAMESPACE::parse_tree
    struct fold_one
       : apply< fold_one >
    {
-      template< typename Node, typename... States >
-      static void transform( std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->children.size(), n->Node::remove_content( st... ) ) )
+      template< typename Input, typename Node, typename... States >
+      static void transform( const Input &in, std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->children.size(), n->Node::remove_content( st... ) ) )
       {
          if( n->children.size() == 1 ) {
             n = std::move( n->children.front() );
@@ -510,8 +510,8 @@ namespace TAO_PEGTL_NAMESPACE::parse_tree
    struct discard_empty
       : apply< discard_empty >
    {
-      template< typename Node, typename... States >
-      static void transform( std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->children.empty(), n->Node::remove_content( st... ) ) )
+      template< typename Input, typename Node, typename... States >
+      static void transform( const Input &in, std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->children.empty(), n->Node::remove_content( st... ) ) )
       {
          if( n->children.empty() ) {
             n.reset();
d-frey commented 5 years ago

Each node already contains the begin() and end() positions, you can access the input it matched on as well as the source. Isn't this enough or am I missing some use-case here?

skyrich62 commented 5 years ago

It's a small corner case. I want to do something like this:

Error: foo.ada, line 5: Consecutive underlines not allowed in numeric literal
    x := 5__000;
          ^~
Error: foo.ada, line 6: Unexpected underline following '.'
    y := 3._1415;
          ^~

Ada allows underlines in numeric literals as long as there are no consecutive underlines and decimal points, and it neither begins nor ends with an underline.

In order to print the entire source line, I need access to the "Input" object to call in.line_at(pos) Other use-cases come to mind in regards to error recovery as well. In ada, an identifier cannot have two or more consecutive underlines, nor can it start nor end with an underline.

For the purpose of error recovery, I want to accept the illegal things, complain about them and keep parsing as if they weren't there.

I want to use the transform mechanic to make basic transformations on the tree, (fold_one, etc). But also to convert "numeric literal" into a binary number and put that in the node. (I'm using a custom node with a std::variant<> for different node "kinds").

Another thought: perhaps instead of the "source" in the node, a reference to the Input? Or a way to retrieve the input from the iterator object?

d-frey commented 5 years ago

Hm, thinking about it, I guess the solution is probably much simpler... you replaced the existing internal::transform overload, but what we should actually do is add another overload. That should allow the existing transform-methods in the selectors to work unmodified (and most people won't care about the input) but at the same time it will allow you to opt-in to receiving the input as the first parameter. Let me give that a try...

d-frey commented 5 years ago

The other idea of putting a reference to the input into the node won't really work, as it would require the node to depend on the input's type. I think that is much more invasive than the above option...

d-frey commented 5 years ago

It seems the overload works quite well, please check if it also solves your problem.

skyrich62 commented 5 years ago

Re: putting a reference, (or pointer) to the input into the node. Yeah, I thought about that. It would be possible if the input classes all had a common base pure virtual class, though. That would also eliminate the need to use templates on apply, start, failure, success, etc. But that's quite a big change.

d-frey commented 5 years ago

Yeah, there's actually a lot of things wrong with the input layer. I have some ideas how to refactor it, but not enough time. 😔 I guess it'll have to wait and I also should not postpone the upcoming 3.0 release anymore because of it, we can still do a 4.0 somewhere in the (far?) future.

skyrich62 commented 5 years ago

Overload works perfectly in my use-case. Thanks again. (Dunno why I didn't think of that... I plead lack of sleep.) :)