pest-parser / ast

Apache License 2.0

Generating the ast #8

Open CAD97 opened 6 years ago

CAD97 commented 6 years ago

This is a fun one.

Even with #[derive(FromPest)], the initial creation of the ast structures is fairly rote.

It would be really cool if we could take a .pest file and create a working (though not necessarily ideal) module of ast structures.

Basic shapes

- ```peg
  rule = { a ~ b ~ c }
  ```
  becomes
  ```rust
  #[derive(Debug, FromPest)]
  #[pest_ast(rule(Rule::rule))]
  pub struct rule {
      pub a: a,
      pub b: b,
      pub c: c,
  }
  ```
- ```peg
  rule = { a | b | c }
  ```
  becomes
  ```rust
  #[derive(Debug, FromPest)]
  #[pest_ast(rule(Rule::rule))]
  pub enum rule {
      a(a),
      b(b),
      c(c),
  }
  ```
- ```peg
  a*
  ```
  becomes
  ```rust
  Vec<a>
  ```
- ```peg
  a+
  ```
  becomes
  ```rust
  Vec<a>
  ```
- ```peg
  a?
  ```
  becomes
  ```rust
  Option<a>
  ```
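To make the mapping above concrete, here is a rough sketch of how a generator could emit these shapes. The `Expr` enum and the `field_type`, `field_name`, and `generate` functions are hypothetical names made up for illustration; they are not part of pest or pest-ast, and nested sequences or choices are deliberately left unhandled.

```rust
/// Hypothetical, simplified model of a pest expression (not pest's own AST).
#[derive(Debug, Clone)]
enum Expr {
    Ident(String),
    Seq(Vec<Expr>),
    Choice(Vec<Expr>),
    Opt(Box<Expr>),     // a?
    Rep(Box<Expr>),     // a*
    RepOnce(Box<Expr>), // a+
}

/// Rust type for an expression that appears as a field or variant payload.
fn field_type(expr: &Expr) -> String {
    match expr {
        Expr::Ident(name) => name.clone(),
        Expr::Opt(inner) => format!("Option<{}>", field_type(inner)),
        Expr::Rep(inner) | Expr::RepOnce(inner) => format!("Vec<{}>", field_type(inner)),
        // Nested groups would need a generated helper type; not handled in this sketch.
        Expr::Seq(_) | Expr::Choice(_) => "/* nested type */".to_string(),
    }
}

/// Field or variant name: reuse the rule name when there is one, else `_N`.
fn field_name(expr: &Expr, index: usize) -> String {
    match expr {
        Expr::Ident(name) => name.clone(),
        Expr::Opt(inner) | Expr::Rep(inner) | Expr::RepOnce(inner) => field_name(inner, index),
        Expr::Seq(_) | Expr::Choice(_) => format!("_{}", index + 1),
    }
}

/// Emit the "obvious" item for one top-level rule as Rust source text.
fn generate(rule: &str, body: &Expr) -> String {
    let mut out = format!("#[derive(Debug, FromPest)]\n#[pest_ast(rule(Rule::{rule}))]\n");
    match body {
        // A choice becomes an enum with one variant per alternative.
        Expr::Choice(alts) => {
            out.push_str(&format!("pub enum {rule} {{\n"));
            for (i, alt) in alts.iter().enumerate() {
                out.push_str(&format!("    {}({}),\n", field_name(alt, i), field_type(alt)));
            }
        }
        // A sequence becomes a struct with one field per element.
        Expr::Seq(items) => {
            out.push_str(&format!("pub struct {rule} {{\n"));
            for (i, item) in items.iter().enumerate() {
                out.push_str(&format!("    pub {}: {},\n", field_name(item, i), field_type(item)));
            }
        }
        // Anything else becomes a single-field struct.
        other => {
            out.push_str(&format!("pub struct {rule} {{\n"));
            out.push_str(&format!("    pub {}: {},\n", field_name(other, 0), field_type(other)));
        }
    }
    out.push_str("}\n");
    out
}
```

Calling `generate("rule", &Expr::Seq(...))` with the identifiers `a`, `b`, `c` would produce text along the lines of the first struct above; the output is meant to be pasted into a project and edited by hand.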

Please ping me [on Gitter](https://gitter.im/pest-parser/pest) or [on Discord](https://discord.gg/FuPE9JE) if you're interested in attacking this. It'll be fun, but somewhat involved.

CAD97 commented 6 years ago

I think it's best to structure this in a way that it's expected that the user take the output and commit it to their repository to be maintained by hand in the future. I don't think the goal should ever be to make a good, semantic AST, but rather to make the "obvious" translation and provide a quick starting point for projects that have designed their grammar already.

loewenheim commented 6 years ago

What are your thoughts on translating rule? to an Option and rule* to a Vec?

CAD97 commented 6 years ago

This should already work manually, and is definitely what I'd consider the "canonical" translation that such a tool should generate. They're powered solely by provided implementations in from-pest:
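A minimal sketch of the general idea, using a hypothetical `FromTokens` trait over a slice of string tokens. This is a simplified stand-in to show how blanket implementations for `Option<T>` and `Vec<T>` can provide `rule?` and `rule*` for free; it is not the from-pest API, and all names here are assumptions.

```rust
/// Hypothetical stand-in for a conversion trait; NOT the from-pest API.
trait FromTokens: Sized {
    /// Try to build `Self` from the front of `tokens`, consuming what was used.
    fn from_tokens(tokens: &mut &[&str]) -> Option<Self>;
}

/// `rule?`: zero or one occurrence, so the conversion itself never fails.
impl<T: FromTokens> FromTokens for Option<T> {
    fn from_tokens(tokens: &mut &[&str]) -> Option<Self> {
        Some(T::from_tokens(tokens))
    }
}

/// `rule*`: collect occurrences until the inner conversion stops matching.
impl<T: FromTokens> FromTokens for Vec<T> {
    fn from_tokens(tokens: &mut &[&str]) -> Option<Self> {
        let mut items = Vec::new();
        while let Some(item) = T::from_tokens(tokens) {
            items.push(item);
        }
        Some(items)
    }
}

/// Example leaf node that matches the single token "a".
#[derive(Debug, PartialEq)]
struct A;

impl FromTokens for A {
    fn from_tokens(tokens: &mut &[&str]) -> Option<Self> {
        match tokens.split_first() {
            Some((first, rest)) if *first == "a" => {
                // Consume the matched token.
                *tokens = rest;
                Some(A)
            }
            _ => None,
        }
    }
}
```

With this in place, `Vec::<A>::from_tokens` and `Option::<A>::from_tokens` work without any code written specifically for `A`, which is the property a generator can rely on.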

loewenheim commented 6 years ago

I have more questions :)

CAD97 commented 6 years ago

Generally, I think that a sensible default for atoms would be to store a pest::Span, or potentially a custom From<pest::Span> (which could handle being an owning version instead of borrowing). This is one of the questions I'm not sure how to answer currently.

The problem with

```rust
// number = @{ ASCII_DIGIT* }
struct number(u16);
```

is that the rule matches numbers far too large for a u16, so converting it means either panicking or manually plumbing a FatalError through. Alternatively, you could use BigInteger.
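To illustrate the overflow problem with plain standard-library code: the sketch below converts the matched text into a `u16` and returns the error rather than panicking. The `Number` type and `parse_number` function are hypothetical names for this example. Note that `ASCII_DIGIT*` also matches the empty string, which fails to parse as well.

```rust
use std::num::ParseIntError;

/// Hypothetical AST node for `number = @{ ASCII_DIGIT* }`.
#[derive(Debug, PartialEq)]
struct Number(u16);

/// Convert the text matched by the rule, surfacing overflow (or an empty
/// match) as an error that the caller has to plumb through.
fn parse_number(matched: &str) -> Result<Number, ParseIntError> {
    matched.parse::<u16>().map(Number)
}
```

`parse_number("65535")` succeeds, while `parse_number("65536")` and `parse_number("")` both return an `Err`, even though the grammar accepts all three inputs.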

So, this is a lot less "this is the obvious way to handle it" than the other proposals for the generated AST, but a potential sketch:


Here's a potential by-hand translation of a small number of rules:

Grammar:

```pest
a = { "a" }
b = { "b" }
c = { "c" }
number = @{ ASCII_DIGIT* }
any = { ANY }
seq = { a ~ b ~ c }
choice = { a | b | c }
compound_seq = { a ~ (b | c) }
compound_choice = { (a ~ b) | (b ~ c) }
assign = { (a|b|c) ~ "=" ~ number }
assigns = { (assign ~ ",")* ~ assign ~ ","? }
```

AST: (tuple structs entirely to sidestep the issue of generating member names) (and abusing fake nested struct syntax)

```rust
struct a<'pest>(
    #[pest_ast(outer)] Span<'pest>,
);
struct b<'pest>(
    #[pest_ast(outer)] Span<'pest>,
);
struct c<'pest>(
    #[pest_ast(outer)] Span<'pest>,
);
struct number<'pest>(
    #[pest_ast(outer)] Span<'pest>,
);
struct any<'pest>(
    #[pest_ast(outer)] Span<'pest>,
);
struct seq<'pest>(
    #[pest_ast(outer)] Span<'pest>,
    a<'pest>,
    b<'pest>,
    c<'pest>,
);
enum choice<'pest>{
    struct _1(a<'pest>),
    struct _2(b<'pest>),
    struct _3(c<'pest>),
}
struct compound_seq<'pest>(
    #[pest_ast(outer)] Span<'pest>,
    a<'pest>,
    enum _2 {
        struct _1(b<'pest>),
        struct _2(c<'pest>),
    },
);
enum compound_choice<'pest>{
    struct _1(
        #[pest_ast(outer)] Span<'pest>,
        a<'pest>,
        b<'pest>,
    ),
    struct _2(
        #[pest_ast(outer)] Span<'pest>,
        b<'pest>,
        c<'pest>,
    ),
}
struct assign<'pest>(
    #[pest_ast(outer)] Span<'pest>,
    enum _1 {
        struct _1(a<'pest>),
        struct _2(b<'pest>),
        struct _3(c<'pest>),
    },
    number<'pest>,
);
struct assigns<'pest>(
    #[pest_ast(outer)] Span<'pest>,
    Vec<struct _1(assign<'pest>)>,
    assign<'pest>,
);
```

There may be some transcription errors. (Entirely done without putting anything in a compiler.) (I'm also not entirely sure I got right which ones should contain a Span; basically, any one that corresponds to an actual rule.) Feel free to ask for translations of further examples if desired. The key is that this isn't supposed to be the only, or even the ideal, transformation. It's only meant to be the "obviously correct" mechanical translation that users can then iterate further on top of.
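Since the nested struct syntax is not valid Rust, a real generator would have to hoist each anonymous group into its own named item. Here is one way `compound_seq = { a ~ (b | c) }` might look once hoisted. The helper name `compound_seq_2` and the local `Span` type are assumptions for this sketch; `Span` stands in for `pest::Span` so the example compiles without pest, and the `#[pest_ast(...)]` attributes are omitted for the same reason.

```rust
#![allow(non_camel_case_types, dead_code)]

/// Stand-in for `pest::Span<'pest>` so this sketch compiles without pest.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span<'pest> {
    input: &'pest str,
    start: usize,
    end: usize,
}

impl<'pest> Span<'pest> {
    /// Text covered by the span.
    fn as_str(&self) -> &'pest str {
        &self.input[self.start..self.end]
    }
}

// Leaf rules keep only their outer span.
#[derive(Debug)]
struct a<'pest>(Span<'pest>);
#[derive(Debug)]
struct b<'pest>(Span<'pest>);
#[derive(Debug)]
struct c<'pest>(Span<'pest>);

// The anonymous `(b | c)` group, hoisted into a named enum.
#[derive(Debug)]
enum compound_seq_2<'pest> {
    _1(b<'pest>),
    _2(c<'pest>),
}

// compound_seq = { a ~ (b | c) }
#[derive(Debug)]
struct compound_seq<'pest>(Span<'pest>, a<'pest>, compound_seq_2<'pest>);
```

Naming hoisted types by rule name plus position keeps the translation mechanical, at the cost of names the user will probably want to change afterwards.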

dragostis commented 6 years ago

I think having the AST take any lifetime is a flawed approach, since it adds a lot of complication in handling it. The first version of pest was designed with an owned Rc<*some input*> shared between Spans and Pairs and was easier to work with. I also don't really have a good alternative. One could have two separate APIs for owned and not owned inputs, but this would put a pretty big burden on the maintainers of the project.
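For comparison, a rough sketch of what an owned span could look like when the input is shared through an `Rc`. The `OwnedSpan` and `OwnedNumber` types and the `make_number` function are hypothetical and only illustrate the trade-off; this is not how any released version of pest is actually implemented.

```rust
use std::rc::Rc;

/// Hypothetical owned span: shares the input via `Rc` instead of borrowing it.
#[derive(Debug, Clone)]
struct OwnedSpan {
    input: Rc<str>,
    start: usize,
    end: usize,
}

impl OwnedSpan {
    /// Text covered by the span.
    fn as_str(&self) -> &str {
        &self.input[self.start..self.end]
    }
}

/// AST node with no lifetime parameter.
#[derive(Debug, Clone)]
struct OwnedNumber(OwnedSpan);

/// The node can be returned freely because it keeps the input alive itself.
fn make_number() -> OwnedNumber {
    let input: Rc<str> = Rc::from("x = 42");
    OwnedNumber(OwnedSpan { input, start: 4, end: 6 })
}
```

The node type needs no `'pest` parameter, but every span now carries a reference-counted pointer, and callers have to hand over ownership of the input up front.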

killercup commented 5 years ago

Hey folks, I started https://github.com/killercup/pest-ast-generator for fun a few days ago and just now saw this issue. The approach I took is very simple -- the goal was to get rid of a bunch of structs I needed to write, not to support every edge case.