Closed petersteneteg closed 6 years ago
Can you give us the source to see exactly how you are trying to combine these two things?
Sure
#include <tao/pegtl.hpp>
#include <tao/pegtl/contrib/parse_tree.hpp>
#include <cassert>
namespace pegtl = tao::TAO_PEGTL_NAMESPACE;
namespace dynamic {
struct long_literal_id : pegtl::plus<pegtl::not_one<'['> > {};
struct long_literal_open : pegtl::seq<pegtl::one<'['>, long_literal_id, pegtl::one<'['> > {};
struct long_literal_mark {
using analyze_t = pegtl::analysis::generic<pegtl::analysis::rule_type::SEQ>;
template <pegtl::apply_mode, pegtl::rewind_mode, template <typename...> class Action,
template <typename...> class Control, typename Input>
static bool match(Input& in, const std::string& id, const std::string&) {
if (in.size(id.size()) >= id.size()) {
if (std::memcmp(in.current(), id.data(), id.size()) == 0) {
in.bump(id.size());
return true;
}
}
return false;
}
};
struct long_literal_close : pegtl::seq<pegtl::one<']'>, long_literal_mark, pegtl::one<']'> > {};
struct long_literal_body : pegtl::any {};
struct grammar : pegtl::if_must<long_literal_open,
pegtl::until<long_literal_close, long_literal_body>, pegtl::eof> {};
template <typename Rule>
struct action : pegtl::nothing<Rule> {};
template <>
struct action<long_literal_id> {
template <typename Input>
static void apply(const Input& in, std::string& id, const std::string&) {
id = in.string();
}
};
template <>
struct action<long_literal_body> {
template <typename Input>
static void apply(const Input& in, const std::string&, std::string& body) {
body += in.string();
}
};
template <typename Rule>
using selector =
pegtl::parse_tree::selector<Rule,
pegtl::parse_tree::apply_store_content::to<
long_literal_open,
long_literal_body,
long_literal_close
>
>;
} // namespace dynamic
int main( int argc, char** argv ) {
std::string id;
std::string body;
pegtl::string_input<> in("[foo[test]foo]", "");
// Fails to compile
auto root =
pegtl::parse_tree::parse<dynamic::grammar, dynamic::selector, dynamic::action>(in, id, body);
assert(root != nullptr)
assert(id == "foo");
assert(body == "test");
}
What I can see is a missing semicolon in the first of the three assert()
s at the end of main
, and that while within parse_tree::parse()
the Action's apply
methods are "shielded" from the additional state argument introduced by parse_tree::parse()
, the match
methods aren't, so long_literal_mark::match()
needs to accept and ignore an arbitrary additional first function argument. But that still doesn't make it compile, there's something else going on here...
That wasn't trivial to diagnose at all, but I found two problems in the parse tree support that I was able to fix.
One problem remains: The apply/apply0-methods will see the states you expect, the match-method will see an additional first state that the parse tree uses internally. I currently don't see any way to hide it. This means that after you update to the current HEAD and you added the missing semicolon after the assert()
as Colin mentioned, you still need to make one more change. You need to change the signature of the match method and add a dummy parameter:
template< pegtl::apply_mode,
pegtl::rewind_mode,
template <typename...> class Action,
template <typename...> class Control,
typename Dummy,
typename Input >
static bool match( Input& in, const Dummy&, const std::string& id, const std::string& ) {
// ...
}
Great! Thanks for the quite answer and fixes. This should solve my problems.
But I noticed that if I add the said extra dummy parameter above. Using the regular parse function
pegtl::parse<dynamic::grammar, dynamic::action>(in, id, body);
will no longer work.
Indeed, this is why I left the issue open and I still hope that I can find a solution for it. But this is not easy, if you have any ideas, please let me know. :)
For now I just added both overload of match, with and without the dummy, which seems to work fine. But a completely non-invasive solution would of course be better...
While playing with this I also tried using intrusive switching https://github.com/taocpp/PEGTL/blob/master/doc/Actions-and-States.md#intrusive-switching To be able to add the needed actions and states to the grammar to do the dynamic matching. Which would be nice since it would encapsulate all the grammar related things. But I am not able to reset the action/states to the previous ones for the "inside of the match" ie the long_literal_body here. Is there a way to do this?
I've committed a change that solves the first problem, the internal state is now hidden and won't be visible to you in the parameter list.
I'm not sure about the intrusive switching, it might now also work or not - please give it a try and if there is still some problem, a small example would be helpful.
I'm not sure moving the internal state to a raw, type-erased pointer inside of the Input
argument rather than as an extra States...
argument in order to try to hide it from the Control
and Rule
function signatures is a good solution. This solution has rather strange interpretations: what is this "internal" state and why is it inside of the Input
class? Furthermore, it has consequences outside of just the parse_tree
code since it will increase the size of these Input
structs by the size of a pointer, regardless of whether or not parse_tree
is used or any other notion of "internal" state exists.
Instead, I think it's appropriate for consistency between non-parse_tree
and parse_tree
code to continue to pass this extra "internal" state parameter in parse_tree
(like commit d02c14107992cdc0edb558acb4b609d408cf2e8a and before) and require that custom parse_tree
Control< Rule >::match
and Rule::match
functions be prepared to accept this extra parameter. I would argue that the match
functions are special in that internal::duseltronik::match
is really central to how all of PEGTL works. Furthermore, customizing the Control< Rule >::match
is considered advanced[1] and custom Rules
which accept States...
are considered complex[2]. Therefore, if a user chooses to customize Control< Rule >::match
or Rule::match
, such customization is relatively advanced usage of PEGTL and requires understanding by the user that he or she must accept whatever States...
are passed to the parse
function, including those from the internal implementation details of parse_tree
.
For now I just added both overload of match, with and without the dummy, which seems to work fine. But a completely non-invasive solution would of course be better...
Thus, this seems like the cleanest solution in my mind. Yes, it requires you to know that an internal state parameter must be accepted for parse_tree
, but this is not so different than how you would use custom Rule::match
with parse
function alone. It may not be obvious that you need this parameter as well at first with parse_tree
, but this might be better resolved with documentation than trying to work around it in the implementation.
Note that any custom Control< Rule >::(start|success|failure|raise)
functions need not accept this additional internal state parameter because the parse_tree::internal::make_control
methods which extend those of the user-provided Control
do not pass this internal parameter down to the user. Also, my PR #127 will ensure that neither external or internal (to the Node
) Action< Rule >::(apply|apply0)
functions will require this internal state parameter. On that note, I had rebased my work on top of 804d2cb65e1a2f6de47ea7dad915a9d511bfee94, but with the moving target of parse_tree
, I haven't had time to complete this.
[1] Advanced Control [2] Complex Rules
I'm not worried about an extra pointer in the input classes, they are not constructed/destructed during a parsing run (unlike a marker, etc.).
I also think that from a user's point of view building a parse tree should be as consistent and as noninvasive as possible, the parse tree state is an internal detail. Hiding it in the input class leaves a lot of general rules intact that are otherwise broken, which previously lead to all sort of hacks and work arounds.
If you look at the commit where I added the internal state, it also simplifies the control class of the parse tree. And, importantly, it allows a single grammar to be used for both basic parsing (possibly with other actions) without rules and actions knowing about the parse-tree-state, and for a parsing run that builds a parse tree with everything that is connected.
Concerning PR #127: I already suggested that you create a separate implementation so we can compare them properly and you are independent of the current "moving target". Maybe both ways have their pros and cons and could live side-by-side.
I'm not worried about an extra pointer in the input classes, they are not constructed/destructed during a parsing run (unlike a marker, etc.).
Fair enough, but the concept of the Input
classes containing an internal state variable does not make much sense since this internal state has nothing to do with the Input
class at all, and it (currently) only applies to parse_tree
code.
I also think that from a user's point of view building a parse tree should be as consistent and as noninvasive as possible
It is more consistent with previous custom match
implementations when one considers that the match
function must accept all States...
(at least for complex rules--simple rules need not worry about them at all). I'm not entirely sure what noninvasive means here because either way one must implement the custom match
function to perform advanced matching, and using a Rule
in a parse_tree::parse
context is not the same as a default PEGTL parse
because the expected output is different. Even so, the parse_tree
Rule
's match
function need only call the non-parse_tree
Rule
's match
function if they do not need access to the parse_tree
state, and there will be no duplication of code.
the parse tree state is an internal detail.
I'm not sure it's fair to consider the state an internal detail when implementing custom match
, which as always must accept all States...
. Sure the implementation of the state is an internal detail to the parse_tree
, but the fact that it is passed to match
is not so much an internal detail. This is apparent in the implementation of parse_tree::parse
and can be simply documented in the usage of parse_tree
since it's not expected to change how many internal state parameters are required.
Hiding it in the input class leaves a lot of general rules intact that are otherwise broken, which previously lead to all sort of hacks and work arounds.
I'd consider hiding it in the Input
class a hack of its own. What hacks were previously necessary with parse_tree
? What general rules are broken without hiding the parse_tree
state? I've used parse_tree
with many different general rules without issue, and on a quick perusal, I don't see any built-in rules which expect a specific number of States...
. Or do you mean custom, general rules external to PEGTL?
If you look at the commit where I added the internal state, it also simplifies the control class of the parse tree.
You mean by removing the implementation of start
, success
, failure
, etc. functions for the make_control
specialization when the Rule
Node
should not be stored and it is a leaf or parent of unselected leaves? This could already be done prior to 0346e101e0b5e101c8295bc6a17e922d3c4d1d2d.
Or do you mean simplified relative to 057db2cec87ceb47bd8fef36257a9e48fc43cd90..d02c14107992cdc0edb558acb4b609d408cf2e8a? I don't really understand why these commits were even necessary because as far as I can tell they only implement the match
function just like the normal
Control
while still passing the internal state argument. Wouldn't this already be achieved by using the normal
match
function, which is the default? Is not Control2
template template parameter the same as Control
? Doesn't this just come from parse
,
template< typename Rule,
template< typename... > class Action = nothing,
template< typename... > class Control = normal,
apply_mode A = apply_mode::ACTION,
rewind_mode M = rewind_mode::REQUIRED,
typename Input,
typename... States >
bool parse( Input&& in, States&&... st )
{
return Control< Rule >::template match< A, M, Action, Control >( in, st... );
}
and thus Control = Control2 = make_control
from parse_tree
?
And, importantly, it allows a single grammar to be used for both basic parsing (possibly with other actions) without rules and actions knowing about the parse-tree-state, and for a parsing run that builds a parse tree with everything that is connected.
Actions already do not need to know about the parse_tree
state, both prior to 057db2cec87ceb47bd8fef36257a9e48fc43cd90 and in my own implementation of actions. It is only the match
function which is special, and as I've argued, it's a rather advanced yet integral part of PEGTL which I think requires some awareness by the (advanced) user that tries to write custom Rule
or Control
match
. There is nothing preventing a single grammar from being used for both basic parsing and parsing with parse_tree
--all that is required is to overload the match
function inside of any custom Rule
s which will be used in both. For simple Rule
s, no overload is even necessary, and it will work with both.
Concerning PR #127...
Okay, will do.
One alternative solution which could make this explicit overload unnecessary is to instead move the parse_tree
internal state argument to the end, after the user-provided States...
, i.e.
template< typename Rule,
typename Node,
template< typename... > class Selector = internal::store_all,
template< typename... > class Action = nothing,
template< typename... > class Control = normal,
typename Input,
typename... States >
std::unique_ptr< Node > parse( Input&& in, States&&... st )
{
internal::state< Node > state;
if( !TAO_PEGTL_NAMESPACE::parse< Rule, Action, internal::make_control< Selector, Control >::template type >( in, st..., state ) ) {
return nullptr;
}
...
}
Obviously also the order would have to be updated in make_control
functions, but this is an internal detail which shouldn't be a problem.
Then the user-defined custom match
function which accepts an exact number of arguments need only provide a template parameter ExtraStates...
at the end of the match
function to take any additional arguments not needed by the user.
So instead of
struct complex_custom_rule
{
template< pegtl::apply_mode,
pegtl::rewind_mode,
template < typename... > class Action,
template < typename... > class Control,
typename InternalState,
typename Input >
static bool match( Input& in, const InternalState&, const std::string& id, const std::string& ) {...}
}
the user would only write the templated match
function
struct complex_custom_rule
{
template< pegtl::apply_mode,
pegtl::rewind_mode,
template < typename... > class Action,
template < typename... > class Control,
typename Input,
typename... ExtraStates >
static bool match( Input& in, const std::string& id, const std::string&, ExtraStates&&... ) {...}
}
which should work with both basic parse and parse_tree
parse. Simple rules again will still work with both.
Hi, interesting discussion. I would have to agree with @mkrupcale I think adding the state to the Input is a, albeit easy, not very nice solution. As a user I think it is reasonable to think that one should be a able to implement the same parse_tree functionality your self using the existing pegtl api. And in that case adding stuff into the input is not really an option. (guess one could do ones own input classes, but that is not very nice either) And even if I could reuse the same void pointer inside of the input, I would never look for a state pointer in the input classes...
On the other hand I think that the use case above with the dynamic matching. in this case for matching [foo[...]foo] is not really a very advanced concept and seems like a common thing to do. So preferable it should work easily out of the box. From this prespective I think the last solution by mkrupcale with the extra state in the end is quite nice and unintrusive.
I talked to @ColinH about it and we agree with both of you. I'll commit a new version shortly that will append the state at the end and will no longer try to hide it. As a consequence, however, all apply
/apply0
/match
methods (except the "simple" match
method) must now accept additional state parameters beyond what the user intends to provide to parse
. We think that this is an acceptable price and will allow the rest of the implementation to be straight-forward and consistent.
Hi
I was trying to use the parse_tree functionality together with the dynamic matching. I tried your dynamic match example with parse_tree::parse. but was not able to compile. Firstly the long_literal_mark struct needs a analyze_t type. then I get errors about long_literal_id not having a apply or apply0.
Any suggestions?