pest-parser / pest

The Elegant Parser
https://pest.rs
Apache License 2.0

Generating statically typed syntax tree. #882

Open TheVeryDarkness opened 1 year ago

TheVeryDarkness commented 1 year ago

I'm working in my fork on adding code to pest that generates a statically typed syntax tree. I plan to open a pull request after fixing all bugs and possibly adding some documentation and tests.

As we can see, there are several crates doing similar things, such as pest-ast and pest-consume. So I'm afraid that you may not accept this contribution if my implementation is bad or you don't agree with my design.

Below I describe what I have done and what I plan to do next. If you have any suggestions, please tell me.

My principles

Something to be discussed

TODO list

pest-typed.

pest3.

(Some of these may be done in pest3 instead of pest-typed, as pest-typed still needs to stay compatible with pest2.)

tomtau commented 1 year ago

Would that be possible to add to pest-ast instead for the moment?

TheVeryDarkness commented 1 year ago

Yes, but it may take some extra work, and I would not be able to reuse some code in pest because it is currently private. What's more, some of the code belongs in pest itself. For example, I added an implementation of Display for OptimizedExpr so that I can show the structure of the grammar in an easy-to-read format. Thanks for your reply :)
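
To illustrate the kind of Display implementation meant here, a minimal sketch follows. It uses a hypothetical `Expr` enum as a stand-in; it is not pest's actual `OptimizedExpr`, and the variant names, the `sample` helper, and the output format are assumptions made for the example.

```rust
use std::fmt;

// Hypothetical stand-in for a grammar expression. This is NOT pest's
// actual `OptimizedExpr`; it only mirrors the general shape of one.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Str(String),
    Ident(String),
    Seq(Box<Expr>, Box<Expr>),
    Choice(Box<Expr>, Box<Expr>),
    Opt(Box<Expr>),
    Rep(Box<Expr>),
}

impl fmt::Display for Expr {
    // Prints the expression in a grammar-like notation, recursing into
    // sub-expressions through their own `Display` implementations.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Str(s) => write!(f, "{:?}", s),
            Expr::Ident(name) => write!(f, "{}", name),
            Expr::Seq(lhs, rhs) => write!(f, "({} ~ {})", lhs, rhs),
            Expr::Choice(lhs, rhs) => write!(f, "({} | {})", lhs, rhs),
            Expr::Opt(inner) => write!(f, "{}?", inner),
            Expr::Rep(inner) => write!(f, "{}*", inner),
        }
    }
}

// Builds the expression `digit ~ ("," | digit)*` for demonstration.
pub fn sample() -> Expr {
    Expr::Seq(
        Box::new(Expr::Ident("digit".into())),
        Box::new(Expr::Rep(Box::new(Expr::Choice(
            Box::new(Expr::Str(",".into())),
            Box::new(Expr::Ident("digit".into())),
        )))),
    )
}
```

Calling `sample().to_string()` on this stand-in yields `(digit ~ ("," | digit)*)`. Adding a trait implementation like this to an existing public type does not change any existing signature, which is why it is a candidate for a non-breaking addition.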

tomtau commented 1 year ago

That's fine (if you don't mind the extra work 🫣) -- one thing is that for the current pest 2.X, the changes should be semver-compatible (I haven't looked into that fork to see if that's the case), while for pest-ast, it's pre-1.0 and breaking changes are expected (as long as they are documented).

We can later look into incorporating pest-ast parts into pest, but it's better to start with pest-ast first and refine it there.

TheVeryDarkness commented 1 year ago

As I didn't change any public interface that already exists in previous versions (generate in pest-generator is wrapped and private), I don't think it is a breaking change that requires a major version bump. So I don't think we need to move the code into pest-ast, but I can do that once all bugs are fixed if you think it's necessary. I did add some public interfaces, though, so a review will be needed once the fork is finished. Thanks for your reply :)

tomtau commented 1 year ago

semver-breaking changes can be sneaky in Rust, e.g. if some implicit autoderives disappear, but we can see how it goes
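
One possible reading of "implicit autoderives" is Rust's auto traits such as `Send` and `Sync`. The sketch below assumes that reading and uses hypothetical types (`TokenV1`, `TokenV2`) to show how adding a private field can silently drop an auto trait without touching any public signature.

```rust
use std::rc::Rc;

// Version 1 of a hypothetical public type: every field is `Send`,
// so the compiler implements `Send` for it automatically.
pub struct TokenV1 {
    pub text: String,
}

// Version 2 adds a private field. No public signature changed, but `Rc`
// is not `Send`, so `TokenV2` is no longer `Send` either.
pub struct TokenV2 {
    pub text: String,
    cache: Option<Rc<String>>,
}

// Compiles only when `T: Send`.
pub fn require_send<T: Send>(_value: &T) {}

pub fn check_v1() -> bool {
    let token = TokenV1 { text: "a".to_string() };
    require_send(&token);
    // The equivalent call with `TokenV2` is rejected at compile time:
    // require_send(&TokenV2 { text: "a".to_string(), cache: None });
    true
}
```

Downstream code that moved a `TokenV1` across threads would stop compiling against the second version, even though the change looks purely internal.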

TheVeryDarkness commented 1 year ago

Well, I see. Just give me time. Though I think I'm doing that carefully, we can never be too careful.

0nyr commented 1 year ago

Having the option to build an AST automatically from pest out-of-the-box would be a really great and time-saving enhancement. Currently, working with Pest means having to write both the grammar and then again the AST parsing, which is a lot of work.

tomtau commented 1 year ago

@0nyr you can check out pest-ast in the meantime https://github.com/pest-parser/ast/blob/master/examples/csv.rs

TheVeryDarkness commented 1 year ago

@0nyr I'm glad that you might like it, but I'm not sure how to name the fields in those generated structs and enums. Do you have any ideas?

TheVeryDarkness commented 1 year ago

@tomtau Can I add an implementation of Display for OptimizedExpr? Will it be a breaking change?

TheVeryDarkness commented 1 year ago

I created a pull request, #889, for that.

0nyr commented 1 year ago

Hi @TheVeryDarkness. I don't know; may I have a look at your code?

In my personal project, I have built a tree out of a generic Node struct, like so:

use pest::Span;

#[derive(Debug, PartialEq)]
pub struct Node<'a, T> {
    pub sp: Span<'a>, // the node's span, used to map it back to the source text
    pub data: T,      // the payload, wrapped in an inner node type
}

I have defined a type AST for the root Node:

pub type AST<'a> = Node<'a, TranslationUnit<'a>>;

and my inner node types are like so:

// AST nodes
#[derive(Debug, PartialEq)]
pub struct TranslationUnit<'a> {
    pub functions: Option<Vec<Node<'a, Function<'a>>>>,
    pub main_function: Node<'a, Function<'a>>,
}

#[derive(Debug, PartialEq)]
pub struct Function<'a> {
    pub name: Node<'a, Identifier>,
    pub return_type: TypeSpecifier,
    pub params: Option<Vec<Node<'a, Declaration<'a>>>>,
    pub body: Node<'a, Block<'a>>,
}

Depending on what you want to do, maybe you could reuse the Node idea by having several kinds of nodes for a generic AST, or even by using macros to build nodes with names extracted from the PEG files. You could then name the inner fields like so:

#[derive(Debug, PartialEq)]
pub struct SomeNode<'a> {
    pub optional_function_nodes: Option<Vec<Node<'a, Function<'a>>>>,
    pub function_node: Node<'a, Function<'a>>,
}

These are just ideas; it depends on how you plan to tackle the generation of AST structs and enums.
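
As a rough illustration of how such a Node wraps its payload, here is a dependency-free sketch. The `Span` below is a hypothetical stand-in for `pest::Span` (with public fields instead of accessor methods), and `Identifier` and `identifier_node` are names invented for the example.

```rust
// Hypothetical stand-in for `pest::Span`, kept minimal so the sketch
// has no external dependencies.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span<'a> {
    pub input: &'a str,
    pub start: usize,
    pub end: usize,
}

impl<'a> Span<'a> {
    // Returns the slice of the input covered by this span.
    pub fn as_str(&self) -> &'a str {
        &self.input[self.start..self.end]
    }
}

// Same shape as the Node above: a span plus a typed payload.
#[derive(Debug, PartialEq)]
pub struct Node<'a, T> {
    pub sp: Span<'a>,
    pub data: T,
}

#[derive(Debug, PartialEq)]
pub struct Identifier(pub String);

// Builds an identifier node from a byte range of the input.
pub fn identifier_node(input: &str, start: usize, end: usize) -> Node<'_, Identifier> {
    let sp = Span { input, start, end };
    Node { sp, data: Identifier(sp.as_str().to_string()) }
}
```

For example, `identifier_node("int main()", 4, 8)` produces a node whose `data` is `Identifier("main")` and whose span still points back at the original input.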

TheVeryDarkness commented 1 year ago

Thanks a lot.

I've used generics for almost all cases, but proc macros are also required to make them work.

By the way, we may need hooks on rules to convert CST to AST.
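
One way such a hook could look is a conversion trait implemented per rule. The sketch below is only an assumption about the idea: `ToAst`, `CstNumber`, and their associated types are hypothetical names, not part of pest, pest-ast, or pest-typed.

```rust
use std::num::ParseIntError;

// Hypothetical conversion hook: each CST node type decides how it
// turns into its AST counterpart, and may fail while doing so.
pub trait ToAst {
    type Ast;
    type Error;
    fn to_ast(&self) -> Result<Self::Ast, Self::Error>;
}

// Hypothetical CST node for a `number` rule; it only holds matched text.
pub struct CstNumber<'a> {
    pub text: &'a str,
}

impl ToAst for CstNumber<'_> {
    type Ast = i64;
    type Error = ParseIntError;

    // The hook for this rule parses the matched text into an integer.
    fn to_ast(&self) -> Result<i64, ParseIntError> {
        self.text.parse()
    }
}
```

With this shape, `CstNumber { text: "42" }.to_ast()` returns `Ok(42)`, while text that is not a valid integer returns an error instead of panicking.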

TheVeryDarkness commented 1 year ago

Hello! I've fixed all known bugs in my fork and then moved my code into a separate repository. Should I submit my code to pest-ast now, or after I finish most of my work? I hope my code can become part of pest in the future. I'm writing documentation for the interfaces this code provides, and we can discuss them then.

tomtau commented 1 year ago

@TheVeryDarkness you can open a PR on pest-ast if you'd like some preliminary feedback

tomtau commented 10 months ago

@0nyr https://docs.rs/pest_typed_derive/latest/pest_typed_derive/ FYI -- @TheVeryDarkness made great progress here