Open TheVeryDarkness opened 1 year ago
Would that be possible to add to pest-ast instead for the moment?
Yes, but it may take some extra work, and I will not be able to reuse some code in pest, as it is currently private. What's more, some of the code belongs in pest itself. For example, I added an implementation of Display for OptimizedExpr so that I can show the structure of the grammar in an easy-to-read format. And thanks for your reply :)
That's fine (if you don't mind the extra work 🫣) -- one thing is that for the current pest 2.X, the changes should be semver-compatible (I haven't looked into that fork to see if that's the case), while for pest-ast, it's pre-1.0 and breaking changes are expected (as long as they are documented).
We can later look into incorporating pest-ast parts into pest, but it's better to start with pest-ast first and refine it there.
As I didn't change any public interface that already exists in previous versions (generate in pest-generator is wrapped and private), I think it won't be a breaking change that demands a major version bump. So I think we needn't move the code into pest-ast, but I may do that after all bugs are fixed if you think it's needed. What's more, I do add some public interfaces, so I think reviews are required after the fork is finished. Thanks for your reply :)
semver-breaking changes can be sneaky in Rust, e.g. if some implicit autoderives disappear, but we can see how it goes
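One way this can happen, sketched with hypothetical types (`ConfigV1` and `ConfigV2` are not from pest): adding a private field can change which auto traits (`Send`, `Sync`) a public type implements, even though no public signature was touched. This is one reading of the "implicit" breakage mentioned above, not a claim about the fork itself.

```rust
use std::rc::Rc;

// Hypothetical "before" version of a public type: it only owns a String,
// so the compiler automatically implements Send and Sync for it.
#[allow(dead_code)]
pub struct ConfigV1 {
    name: String,
}

// Hypothetical "after" version: a private Rc field was added.
// Rc is neither Send nor Sync, so ConfigV2 silently loses both auto traits.
#[allow(dead_code)]
pub struct ConfigV2 {
    name: String,
    cache: Rc<String>,
}

// Compile-time check: a call only type-checks if T is Send.
pub fn assert_send<T: Send>() {}

pub fn check() {
    assert_send::<ConfigV1>(); // compiles
    // assert_send::<ConfigV2>(); // would fail to compile: Rc<String> is not Send
}
```

Downstream code that moved a `ConfigV1` into another thread would stop compiling after such a change, which is why a check like `assert_send` in the test suite can be a cheap safeguard.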
Well, I see. Just give me time. Though I think I'm doing that carefully, we can never be too careful.
Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.
@0nyr you can check out pest-ast in the meantime https://github.com/pest-parser/ast/blob/master/examples/csv.rs
@0nyr I'm glad that you might like it, but I'm not sure how to name the fields in those generated structs and enums. Do you have any ideas?
@tomtau Can I add an implementation of Display for OptimizedExpr? Will it be a breaking change?
I created a pull request, #889, for that.
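For readers following along, here is a minimal sketch of what such a Display implementation could look like. The `Expr` enum is a simplified, hypothetical stand-in and not pest's actual `OptimizedExpr`; the output syntax is also an assumption. Under Rust's usual semver conventions, implementing an existing trait for an existing type is generally treated as a minor change.

```rust
use std::fmt;

// Simplified, hypothetical stand-in for a grammar expression tree.
pub enum Expr {
    Str(String),
    Ident(String),
    Seq(Box<Expr>, Box<Expr>),
    Choice(Box<Expr>, Box<Expr>),
    Opt(Box<Expr>),
    Rep(Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // String literals are printed quoted and escaped.
            Expr::Str(s) => write!(f, "{:?}", s),
            Expr::Ident(name) => write!(f, "{}", name),
            // Binary nodes are parenthesized so nesting stays unambiguous.
            Expr::Seq(lhs, rhs) => write!(f, "({} ~ {})", lhs, rhs),
            Expr::Choice(lhs, rhs) => write!(f, "({} | {})", lhs, rhs),
            Expr::Opt(inner) => write!(f, "{}?", inner),
            Expr::Rep(inner) => write!(f, "{}*", inner),
        }
    }
}
```

With this sketch, a sequence of the identifier `key` followed by a repeated choice between `"a"` and `b` prints as `(key ~ ("a" | b)*)`.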
Hi @TheVeryDarkness. I don't know, may I have a look at your code?
In my personal project, I have made a kind of tree of struct Node, like so:
use pest::Span; // assumption: Span here is pest's span type

#[derive(Debug, PartialEq)]
pub struct Node<'a, T> {
    pub sp: Span<'a>, // the node's position in the source, used to map it back to the matched string
    pub data: T,      // the node's data, wrapped in an inner type
}
I have defined a type AST for the root Node:
pub type AST<'a> = Node<'a, TranslationUnit<'a>>;
and my inner node types are like so:
// AST nodes
#[derive(Debug, PartialEq)]
pub struct TranslationUnit<'a> {
    pub functions: Option<Vec<Node<'a, Function<'a>>>>,
    pub main_function: Node<'a, Function<'a>>,
}

#[derive(Debug, PartialEq)]
pub struct Function<'a> {
    pub name: Node<'a, Identifier>,
    pub return_type: TypeSpecifier,
    pub params: Option<Vec<Node<'a, Declaration<'a>>>>,
    pub body: Node<'a, Block<'a>>,
}
Depending on what you want to do, maybe you could reuse the Node idea, by having several kinds of nodes for a generic AST, or even using macros to build nodes with names extracted from the PEG files... You can then name those inner fields like so:
#[derive(Debug, PartialEq)]
pub struct SomeNode<'a> {
    pub optional_function_nodes: Option<Vec<Node<'a, Function<'a>>>>,
    pub function_node: Node<'a, Function<'a>>,
}
These are just ideas, but it depends on how you have planned to tackle the generation of AST structs and enums.
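To make the macro idea a bit more concrete, here is a rough sketch using `macro_rules!`. Everything in it is hypothetical (`ast_node!`, the simplified `Span`, the `Assignment` node, and the grammar rule in the comment); in practice the names would more likely be extracted from the grammar by a proc macro than written by hand.

```rust
// Simplified stand-in for a span type: byte offsets into the input.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

// Generic wrapper following the Node idea: a position plus a typed payload.
#[derive(Debug, PartialEq)]
pub struct Node<T> {
    pub sp: Span,
    pub data: T,
}

// Hypothetical macro: declares an AST struct whose fields are each wrapped in Node.
macro_rules! ast_node {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        pub struct $name {
            $(pub $field: Node<$ty>,)*
        }
    };
}

// Field names mirror the rule names of an assumed grammar rule such as
// `assignment = { identifier ~ "=" ~ number }`.
ast_node!(Assignment { identifier: String, number: i64 });
```

The point of the sketch is only that field names can be derived mechanically from rule names, so the generated struct reads as `assignment.identifier` and `assignment.number` at the use site.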
Thanks a lot.
I've used generics for almost all cases, but proc macros are also required to make them work.
By the way, we may need hooks on rules to convert CST to AST.
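One possible shape for such hooks, sketched with a plain trait. `CstNode`, `FromCst`, `Number`, and `Sum` are all hypothetical names for illustration and do not reflect pest-typed's actual interface.

```rust
// Hypothetical untyped CST node: a rule name, the matched text, and children.
pub struct CstNode {
    pub rule: &'static str,
    pub text: String,
    pub children: Vec<CstNode>,
}

// The "hook": each AST type describes how to build itself from a CST node.
pub trait FromCst: Sized {
    fn from_cst(node: &CstNode) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct Number(pub i64);

impl FromCst for Number {
    fn from_cst(node: &CstNode) -> Result<Self, String> {
        if node.rule != "number" {
            return Err(format!("expected rule `number`, found `{}`", node.rule));
        }
        // The hook is where raw matched text becomes a typed value.
        node.text
            .parse::<i64>()
            .map(Number)
            .map_err(|e| format!("invalid number {:?}: {}", node.text, e))
    }
}

#[derive(Debug, PartialEq)]
pub struct Sum(pub Vec<Number>);

impl FromCst for Sum {
    fn from_cst(node: &CstNode) -> Result<Self, String> {
        // Convert every child with its own hook; stop at the first error.
        node.children
            .iter()
            .map(Number::from_cst)
            .collect::<Result<Vec<_>, _>>()
            .map(Sum)
    }
}
```

A generator could call `from_cst` for each rule that has a hook registered and fall back to the plain typed node otherwise; how hooks would actually be attached to rules is left open here.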
Hello! I've fixed all known bugs in my fork, and then separated my code into another repository. Should I commit my code to pest-ast now or after I finish most of my work? I hope my code can become a part of pest in the future. I'm writing a document for the interfaces that the code provides, and we may discuss them then.
@TheVeryDarkness you can open a PR on pest-ast if you'd like some preliminary feedback
@0nyr https://docs.rs/pest_typed_derive/latest/pest_typed_derive/ FYI -- @TheVeryDarkness made great progress here
I'm working in my fork to add code to pest that supports generating statically typed syntax trees, and I'm going to make a pull request after fixing all bugs and maybe adding some documentation and tests.
As we can see, there are several crates doing similar things, such as pest-ast and pest-consume. So I'm afraid that you may not accept this contribution if my implementation is bad or you don't agree with my design.
I'll talk about what I did and what I'm going to do then. If you have any suggestions, please tell me.
My principles
[...] TypedParser will create structs for sequences, enums for choices, and a lot of generic structs for strings, peek, etc.
[...] generate in pest-generator to avoid repetition.
Something to be discussed
[...] var_i, where i is the index of the variant in those choices. And for clarity, I didn't use tuple structs for sequences, and name struct fields as field_i, where i is the index. This may not be the best choice, and please tell me if there is a better design :) Maybe using node tags?
TODO list
[...] pest-typed.
[...] pest3.
(Some of these may be done in pest3 instead of pest-typed, as pest-typed still needs compatibility with pest2.)
[...] TypedParser.
[...] Box when requested. See pest-generator/graph.rs. Done in https://github.com/TheVeryDarkness/pest-typed/commit/e9b24d6c84dd3ebc99f212e1891c65bbd598717c.
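To make the naming question above concrete, here is a hand-written sketch of the kind of types such a generator might emit for a small grammar. The grammar in the comment, the type names, and the exact layout are assumptions for illustration, not pest-typed's real output. It follows the field_i / var_i scheme and uses Box to break the recursion in `expr`.

```rust
// Assumed grammar (pest syntax):
//   number = { ASCII_DIGIT+ }
//   paren  = { "(" ~ expr ~ ")" }
//   expr   = { number | paren }

// Matched text for a terminal-like rule.
#[derive(Debug, PartialEq)]
pub struct Number(pub String);

// A sequence becomes a struct; fields are named by position.
#[derive(Debug, PartialEq)]
pub struct Paren {
    pub field_0: String,    // "("
    pub field_1: Box<Expr>, // expr; boxed because expr -> paren -> expr is recursive
    pub field_2: String,    // ")"
}

// A choice becomes an enum; variants are named by position.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum Expr {
    var_0(Number),
    var_1(Paren),
}

// Small helper showing how positional names read at the use site:
// counts how many parentheses wrap the innermost number.
pub fn depth(e: &Expr) -> usize {
    match e {
        Expr::var_0(_) => 0,
        Expr::var_1(p) => 1 + depth(&p.field_1),
    }
}
```

The `depth` helper shows the cost of positional names: `p.field_1` says nothing about what it holds. If node tags were used for naming, that field could presumably carry a descriptive name taken from the grammar instead.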