Open CAD97 opened 6 years ago
I think it's best to structure this in a way that it's expected that the user take the output and commit it to their repository to be maintained by hand in the future. I don't think the goal should ever be to make a good, semantic AST, but rather to make the "obvious" translation and provide a quick starting point for projects that have designed their grammar already.
What are your thoughts on translating `rule?` to an `Option` and `rule*` to a `Vec`?
This should already work manually, and is definitely what I'd consider the "canonical" translation that such a tool should generate. They're powered solely by provided implementations in from-pest:
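As a rough, self-contained illustration of how blanket implementations can carry `rule?` and `rule*`, here is a minimal sketch. The `FromInput` trait and the `Digit` type are hypothetical stand-ins invented for this example; they are not from-pest's actual API, which works on pest pairs rather than characters.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for a conversion trait (NOT from-pest's real trait).
/// An implementation returns `None`, without consuming input, when it does not match.
trait FromInput: Sized {
    fn from_input(input: &mut Peekable<Chars<'_>>) -> Option<Self>;
}

/// Example atom: a single ASCII digit.
#[derive(Debug, PartialEq)]
struct Digit(char);

impl FromInput for Digit {
    fn from_input(input: &mut Peekable<Chars<'_>>) -> Option<Self> {
        // `next_if` only advances when the predicate holds.
        input.next_if(|c| c.is_ascii_digit()).map(Digit)
    }
}

/// `rule?` maps to `Option<T>`: zero or one occurrence, and it always succeeds.
impl<T: FromInput> FromInput for Option<T> {
    fn from_input(input: &mut Peekable<Chars<'_>>) -> Option<Self> {
        Some(T::from_input(input))
    }
}

/// `rule*` maps to `Vec<T>`: keep converting until the inner rule stops matching.
impl<T: FromInput> FromInput for Vec<T> {
    fn from_input(input: &mut Peekable<Chars<'_>>) -> Option<Self> {
        let mut items = Vec::new();
        while let Some(item) = T::from_input(input) {
            items.push(item);
        }
        Some(items)
    }
}
```

With these two impls in place, a field of type `Vec<Digit>` or `Option<Digit>` needs no hand-written conversion code. The sketch does not guard against nesting such as `Vec<Option<T>>`, which would loop forever here because the inner conversion never fails.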
I have more questions :)
What is the most basic structure that a rule could desugar to? My assumption is a wrapper around a single char, like this:

```peg
one_char = { ANY }
```

becomes

```rust
struct one_char {
    content: char,
}
```

Am I right about this?
Based on the previous point, a rule

```peg
number = { ASCII_DIGIT* }
```

would turn into a struct

```rust
struct number {
    digit: Vec<char>,
}
```

I probably intended `number` to be parsed as an actual number, say, a `u16`. Would I first parse it to the ast and then convert it to the actual type I want later on?
Generally, I think that a sensible default for atoms would be to store a `pest::Span`, or potentially a custom `From<pest::Span>` (which could handle being an owning version instead of borrowing). This is one of the questions I'm not sure how to answer currently.
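To make the borrowing versus owning trade-off concrete, here is a minimal sketch. The `Span` type below is a hypothetical stand-in that only imitates the shape of `pest::Span` so the example compiles on its own; `Ident` and `OwnedIdent` are likewise made-up names.

```rust
/// Hypothetical stand-in for `pest::Span`: the input plus byte offsets.
/// The real type lives in the pest crate and has a richer API.
#[derive(Debug, Clone, Copy)]
struct Span<'i> {
    input: &'i str,
    start: usize,
    end: usize,
}

impl<'i> Span<'i> {
    /// The text covered by this span.
    fn as_str(&self) -> &'i str {
        &self.input[self.start..self.end]
    }
}

/// Borrowing atom: keeps the span, so the AST is tied to the input's lifetime.
#[derive(Debug)]
struct Ident<'i> {
    span: Span<'i>,
}

impl<'i> From<Span<'i>> for Ident<'i> {
    fn from(span: Span<'i>) -> Self {
        Ident { span }
    }
}

/// Owning atom: copies the matched text out, so no lifetime parameter is needed.
#[derive(Debug, PartialEq)]
struct OwnedIdent {
    text: String,
}

impl<'i> From<Span<'i>> for OwnedIdent {
    fn from(span: Span<'i>) -> Self {
        OwnedIdent { text: span.as_str().to_owned() }
    }
}
```

The borrowing version keeps position information for free but spreads `<'i>` through every containing type; the owning version allocates per atom and loses the offsets unless they are stored separately.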
The problem with

```rust
// number = @{ ASCII_DIGIT* }
struct number(u16);
```

is that the rule matches numbers far too large for a `u16`, so converting it either means panicking or manually plumbing a `FatalError` through. Alternatively, you could use `BigInteger`.
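A minimal sketch of the non-panicking route, using only the standard library. The `Number` type and `number_from_text` function are hypothetical names for illustration; how the resulting error would be threaded through a `FatalError` is left out.

```rust
use std::num::ParseIntError;

/// Hand-written counterpart to `number = @{ ASCII_DIGIT* }`.
#[derive(Debug, PartialEq)]
struct Number(u16);

/// Converts the matched text into a `Number`, returning an error instead of
/// panicking when the digits do not fit into a `u16`.
fn number_from_text(text: &str) -> Result<Number, ParseIntError> {
    text.parse::<u16>().map(Number)
}
```

Note that `ASCII_DIGIT*` also matches the empty string, which `str::parse` rejects as well, so the conversion can fail on overflow and on empty input.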
So, this is a lot less "this is the obvious way to handle it" than the other proposals for the generated AST, but a potential sketch:
[...] `<'pest>` [...] `#[pest_ast(outer)] span: pest::Span<'pest>` [...]
There may be some transcription errors. (Entirely done without putting anything in a compiler.) (I'm also not entirely sure I got right which structs should contain a `Span`; basically any one that corresponds to an actual rule.) Feel free to ask for translating further examples if desired. The key is that this isn't supposed to be the only, or even the ideal, transformation. It's only meant to be the "obviously correct" mechanical translation that users can then iterate further on top of.
I think having the AST take any lifetime is a flawed approach, since it adds a lot of complication in handling it. The first version of pest was designed with an owned `Rc<*some input*>` shared between `Span`s and `Pair`s and was easier to work with. I also don't really have a good alternative. One could have two separate APIs for owned and not owned inputs, but this would put a pretty big burden on the maintainers of the project.
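For illustration, an `Rc`-shared input could look roughly like the sketch below. `OwnedSpan`, `Number`, and `number_at` are hypothetical names; this is not a reconstruction of the early pest API, only one way the idea could be expressed.

```rust
use std::rc::Rc;

/// Hypothetical owned span: shares the whole input via `Rc` and stores byte offsets.
#[derive(Debug, Clone)]
struct OwnedSpan {
    input: Rc<str>,
    start: usize,
    end: usize,
}

impl OwnedSpan {
    /// The text covered by this span.
    fn as_str(&self) -> &str {
        &self.input[self.start..self.end]
    }
}

/// An AST node built on `OwnedSpan` needs no lifetime parameter.
#[derive(Debug, Clone)]
struct Number {
    span: OwnedSpan,
}

/// Builds a node that keeps the input alive through a reference count.
fn number_at(input: &Rc<str>, start: usize, end: usize) -> Number {
    Number {
        span: OwnedSpan { input: Rc::clone(input), start, end },
    }
}
```

Cloning the `Rc` is cheap and the node stays valid after the caller drops its own handle, at the cost of reference counting and of the AST not being `Send`.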
Hey folks, I started https://github.com/killercup/pest-ast-generator for fun a few days ago and just now saw this issue. The approach I took is very simple -- the goal was to get rid of a bunch of structs I needed to write, not to support every edge case.
This is a fun one.
Even with `#[derive(FromPest)]`, the initial creation of the ast structures is fairly rote. It would be really cool if we could take a `.pest` file and create a working (though not necessarily ideal) module of ast structures.

Basic shapes
- ```peg
  rule = { a ~ b ~ c }
  ```
  becomes
  ```rust
  #[derive(Debug, FromPest)]
  #[pest_ast(rule(Rule::rule))]
  pub struct rule {
      pub a: a,
      pub b: b,
      pub c: c,
  }
  ```
- ```peg
  rule = { a | b | c }
  ```
  becomes
  ```rust
  #[derive(Debug, FromPest)]
  #[pest_ast(rule(Rule::rule))]
  pub enum rule {
      a(a),
      b(b),
      c(c),
  }
  ```
- ```peg
  a*
  ```
  becomes
  ```rust
  Vec<a>
  ```
- ```peg
  a+
  ```
  becomes
  ```rust
  Vec<a>
  ```
- ```peg
  a?
  ```
  becomes
  ```rust
  Option<a>
  ```

Please, ping me [on Gitter](https://gitter.im/pest-parser/pest) or [on Discord](https://discord.gg/FuPE9JE) if you're interested in attacking this. It'll be fun, but somewhat involved.
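The basic shapes above are mechanical enough that the core of a generator can be sketched in a few lines. The functions below are hypothetical and work on pre-split term strings rather than a parsed `.pest` grammar; a real tool would walk the grammar's own syntax tree and also emit the derive attributes.

```rust
/// Maps one grammar term with an optional postfix operator to the Rust type
/// a generator might emit for it.
fn rust_type_for(term: &str) -> String {
    if let Some(inner) = term.strip_suffix('*').or_else(|| term.strip_suffix('+')) {
        // Both `a*` and `a+` become `Vec<a>`.
        format!("Vec<{}>", inner)
    } else if let Some(inner) = term.strip_suffix('?') {
        format!("Option<{}>", inner)
    } else {
        term.to_string()
    }
}

/// Emits a struct for a sequence rule such as `rule = { a ~ b* ~ c? }`,
/// given its terms already split on `~`.
fn struct_for_sequence(name: &str, terms: &[&str]) -> String {
    let mut out = format!("pub struct {} {{\n", name);
    for term in terms {
        // The field name is the term without its postfix operator.
        let field = term.trim_end_matches(|c: char| matches!(c, '*' | '+' | '?'));
        out.push_str(&format!("    pub {}: {},\n", field, rust_type_for(term)));
    }
    out.push_str("}\n");
    out
}
```

Since `a+` and `a*` both collapse to `Vec<a>`, the "at least one" guarantee is lost in the type; that fits the goal of an obvious starting point that users refine by hand.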