Closed engelant closed 5 years ago
It's an interesting idea. The conceptual problem is accessing a parent from within a child. I very deliberately did not allow this, as I think it might be too confusing and it is hard to make the generic part of the code properly handle all possible transformations from that.
I'd suggest another approach which might work in the existing system: Concentrate on each node and see what you can do locally with it. If Prefix
matches, it will be added as a child to opt< Prefix >
. Now the transform
-method of opt<>
could check if it has children and replace itself with the child (or a sequence of children). If it contains nothing, it could delete itself (by calling n.reset()
). That way you either end up with the child/sequence of children added to the parent or nothing being added to the parent. Would that work for you? (Note I haven't tried it out, I'm just speculating)
Also, I might be missing your point completely, I'm not exactly sure I understand what you are trying to do. Do you have a repository on GitHub I might look at to see the actual code?
So I just put it in a git repo https://github.com/engelant/PEG2PEGTL . As I described above, If I have the rule:
struct Element : seq< opt< Prefix >, Primary, opt< Suffix > > {
};
Since I intend the recursive to_String()
function to return just a string, that then is stuffed into it's parent's to_String(), I need to rearrange that Primary
is a child of Suffix
which again is a child of Prefix
(If they exist). To directly bind It to a template transform function, where I automaticly know the Type (Prefix, Suffix, Primary
) I would need to access the siblings (or it's parent's children).
On the other hand I could identify the children of Element
by id_
in runtime.
Identifing Children by their count wouldn't work, as I can't differenciate between
Prefix Primary / Primary Suffix
.
I looked at your code and I think that transforming the parse tree is not the right choice for you. What you'd like to do is generating the string depending on your own node type. Essentially you'd need something like
std::string to_string() const
{
switch( id_ ) {
case typeid( Element ):
return to_string_element();
//...
}
}
(the above is pseudo-code, you'd need something that actually works)
Each string generator would than check what children it has and how to produce a reasonable string representation for them, calling their to_string()
-methods as needed. You could even use the provided node
class from contrib, have no transform()
-methods and just one freestanding to_string()
(forwarding to the type-dependent to_string()
-variants.
In fact, I'll play with this idea to see if I can replace/improve our src/example/pegtl/abnf2pegtl.cpp
example.
I sort of guessed to do something like this, since there must be a certain concept of usage behind the given parse_tree concept. Just wanted to make sure I'm on the same page. Hopefully I can just add this to the contrib, when it's done.
I added a new example abnf2pegtl2
which uses the parse tree and simplifies the existing example abnf2pegtl
. It is not yet complete and I think it can be further refactored, but it is a good start already, I like it a lot more than the old example. Thanks for the idea! Let me know what you think of it (it is still a bit hackish, I'm aware of that :D).
So, I had a closer look at your new code and I generally like the idea. Maybe I'll try to explain my usage goal and throw in a few ideas, since at some (admittedly early) point my knowledge with template programming is exhaused.
I imagine using a PEG2PEGTL parser/transpiler:
peg2pegtl example.peg --namespace=example --out=example_pegtl_rules.h
which creates a file like this:
/** DO NOT EDIT THIS GENERATED FILE
*** EDIT SOURCE FILE example.peg instead
**/
#ifndef EXAMPLE_PEGTL_RULES_H
#define EXAMPLE_PEGTL_RULES_H
#include <tao/pegtl.hpp>
namespace PEG2PEGTL {
using namespace tao::TAOCPP_PEGTL_NAMESPACE;
struct LF;
struct CR;
struct LF : utf8::one<0x000A> {};
struct CR : utf8::one<0x000D> {};
}
#endif
This rule file would then only contain the rules and most importantly can be built automaticly by the Makefile/IDE before the actual build process, without compromising the logic of the parser. This is also a good point to introduce UTF Identifier support (e.g [:Lu:]), which depends on the listas availible from the unicode page, to transform these into PEGTL range<>
, but that's not for now, at least for me.
How to define unicode ranges by properties/identifiers in c++ for PEGTL
#include <tao/pegtl/contrib/parse_tree.hpp>
#include "example_pegtl_rules.h"
namespace PEG2PEGTL {
using namespace tao::TAOCPP_PEGTL_NAMESPACE;
template< typename > struct store : std::false_type {};
template<> struct store<LF> {};
}
int main() {
...
}
I suppose It also wouldn't be to bad of an idea to introduce an additional parse_tree_utils.h (and .cpp), which provides some default tree debug functions like print().
Speaking of helper functions, since you described a "look at the children" approach, I think it might make things easier to introduce some member functions like:
std::vector< std::unique_ptr< node > >::iterator find_child(const std::type_info *id_, int skip) {
this_is_a_unique_name = 100;
std::vector< std::unique_ptr< node > >::iterator it;
for (it = children.begin(); it != children.end(); ++it) {
if (it->get()->id_ == id_) {
if (skip == 0) {
break;
} else {
--skip;
}
}
}
return it;
}
std::vector< std::unique_ptr< node > >::iterator find_child(const std::type_info *id_) {
return find_child(id_, 0);
}
bool child_has_type(const std::type_info *id_, std::vector< std::unique_ptr< node > >::size_type n) {
return children.at(n).get()->id_ == id_;
}
What gives te possibility to call
child_has_type( &typeid( LF ), 1 );
Or maybe even reduce that with some #define lambda magic to
child_has_type( LF, 1 );
Last but not least I thought of anonymous struct inheritance, but this definitly is way over my head, maybe it's not possible, maybe you can writ this down from the top of your head. Idea is to add a virtual method to the node class, e.g.
template<typename T>
void generate(T *t) {
for (auto child in children) { //Forgive me
child.generate(t);
}
}
Then, and this I don't know exactly how or if to do, inside the specialisation of the store struct<LF>
define something, that if it's set insted of creating the node from PEGTL::node it anonymously inherits from PEGTL::node, overwriting the virtual method and casting the pointer type to the inherited, anonymous node to the PEGTL::node class.
I'm looking very much forward to your oppinion on my suggestions.
I'll leave aside the SO question about Unicode categories for now, I hope I'll find an answer for that in the next days and I'll answer it on SO.
For the rest, let's go through it point by point:
I imagined the use of abnf2pegtl
in such a way that you'd have an input, say foo.abnf
. Your build system would contain a rule to convert it from grammar.abnf
to grammar.hpp
. You'd include it from another header foo.hpp
which looks like this:
#ifndef EXAMPLE_PEGTL_RULES_H
#define EXAMPLE_PEGTL_RULES_H
#include <tao/pegtl.hpp>
namespace PEG2PEGTL {
using namespace tao::TAOCPP_PEGTL_NAMESPACE;
#include "grammar.hpp"
}
#endif
That way your header foo.hpp
could contain anything, including (nested) namespaces, using directives, etc.
This is the reason why abnf2pegtl
only emits the rules and the forward declarations, everything else can (and IMHO should) be kept separated.
Having some parse tree utility functions available for simple inclusion/usage sounds like a good idea, I'll think about it.
To improve support for accessing the children more directly, I was thinking about adding an index operator to the node. With this, you could use (*n)[1]->is< SomeType >()
, or, as we always have nodes wrapped in std::unique_ptr
, the easier to use at()
, something like n->at(1)->is< SomeType >()
. In fact that seems good enough to add it now, I'll commit it in a minute.
Finally, having some kind of callback which is defined per type would be nice, but I'm currently not sure how to do this. Maybe I'll have an idea later, we'll see.
I wasn't able to find a unicode library in c++, where I could query an Identefier, get back a list of its characters. On the other side, I never was a big fan of unicode variable/object/identifier names, so I dropped interest. For strings PEGTL has everything required, and identefiers better be ascii anyway. Just was curious in terms of completeness (e.g. ECMA16 specifies Identifier <- [:ID_Start:][:ID_Continue:]*
).
Your way of only including the generated rules inside the right namespace seems more flexible, I'll go with that.
Looking forward to try out that new bad boy of operator and maybe get a feeling of what else might come in frequently/handy.
By defined per type you mean per Rule, right?
perhaps:
std::map<std::type_info, std::function<void(node*)>>
Then populate the map with your callbacks, which could be lambdas or whatever. The map could even be populated automatically.
What I would look for is a library where I can check if a Unicode code point is within a category. That would make the rest almost trivial.
And yes, by type I mean a Rule.
Signing off for today, I really need to get some sleep :)
FYI: I answered your StackOverflow question.
Also, I refactored the abnf2pegtl2 example a bit to support incremental alternations (=/
). This lead to an interesting way to define your own node class with an overridden emplace_back
-method. The latter has access to both the current node that will be inserted as well as access to the parent node, hence allowing you to modify the parent node (and it's content).
No per-Rule separation yet, though :)
Some months have passed and I'm wondering if we can close this issue or if there are still some open points/questions. Please let me know, thanks!
Thanks for your effort, at the currrent time beeing I had to abandon my plans, so yeah, closed.
I want to have a parser that translates the Syntax introduced in Bryan Ford's paper to translate to PEGTL, as I find this Syntax very readable. To do so I try to utilize the parse_tree.h provided with this library. My approch was to use a cutom node struct, that implements a to_String method like this:
To make the actual magic happen I thought to use the transform function like so:
And it works for me, until it gets a little tricky, since I would need to nest sibling inside each other, if I have a rule like this:
If only I could access Element's children from Prefix and/or Suffix, and move them inside itself. I suppose I could do this by using the node's
id_
and move the logic inside the Element's transform function. Is that the intended usage? How far off the intended usage am I with this approach?