hydro-project / rust-sitter

Use Tree Sitter to parse your own languages in Rust
MIT License

I can not find a way of branching between expressions #25

Open groteck opened 1 year ago

groteck commented 1 year ago

Hi, thanks for the work. I really like the library and enjoy how fast I made progress despite knowing almost nothing about language parsers.

Context: I'm trying to write a parser for the Conventional Commits spec, which looks something like this:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Example:

feat(homepage): Add a new header to my home page

We need to add a new link to the homepage so customers can do some action that was not there before

closes: #445
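To make the header line of the format above concrete, here is a minimal hand-written sketch in plain Rust. It does not use rust_sitter; the names `Header` and `parse_header` are hypothetical, and it assumes the type and the description are separated by exactly `": "`.

```rust
/// The first line of a commit: `<type>[optional scope]: <description>`.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub type_: String,
    pub scope: Option<String>,
    pub description: String,
}

/// Hypothetical helper: returns `None` when the line does not fit the shape.
pub fn parse_header(line: &str) -> Option<Header> {
    // Everything before the first ": " is the type plus optional scope.
    let (prefix, description) = line.split_once(": ")?;

    // A scope, if present, is wrapped in parentheses right after the type.
    let (type_, scope) = match prefix.split_once('(') {
        Some((t, rest)) => (t, Some(rest.strip_suffix(')')?.to_string())),
        None => (prefix, None),
    };

    if type_.is_empty() || description.is_empty() {
        return None;
    }

    Some(Header {
        type_: type_.to_string(),
        scope,
        description: description.to_string(),
    })
}
```

For the example message, `parse_header("feat(homepage): Add a new header to my home page")` yields the type `feat`, the scope `homepage`, and the rest of the line as the description.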

In my parser, the problem is that the body and the footer of a conventional commit can look almost the same; the main difference is that a footer has a tag. With the available macros plus regex, I can't find a way to branch and say: if you find something like .+: then it is a footer, so process the tags.

Please understand that my knowledge of language parsing is really limited and that this library is one of my first attempts. If anything here is nonsense, let me know and point me in the right direction if possible.

shadaj commented 1 year ago

This requires a bit of creativity in how the grammar is structured to make sure tokenization takes place correctly. What I'd recommend is having two enum variants that both start by looking for a sequence of characters without a :, and then only one variant goes on to parse a : token followed by more text.

pub enum BodyOrFooter {
    Footer(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
        #[rust_sitter::leaf(text = ":")]
        (),
        #[rust_sitter::leaf(pattern = r"[\w\s]+", transform = |v| v.to_string())]
        String,
    ),
    Body(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
    ),
}

This works because Tree Sitter always looks for the token matching the [^:]+ regex, and can then branch based on whether the following token is a : or not.
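The same decision can be written out by hand to see where the branch happens. This is only a sketch in plain Rust, not what rust_sitter or Tree Sitter generates; `classify` is a hypothetical helper, and it is looser than the grammar above (it accepts an empty tag and any text after the :).

```rust
/// Mirrors the two variants of the grammar enum, without the annotations.
#[derive(Debug, PartialEq)]
pub enum BodyOrFooter {
    Footer(String, String),
    Body(String),
}

/// Hypothetical helper: classify one chunk of text by hand.
pub fn classify(input: &str) -> BodyOrFooter {
    // Shared prefix: everything up to the first ':' (the `[^:]+` token).
    match input.split_once(':') {
        // A ':' follows the prefix, so take the Footer branch
        // and keep the remaining text as the value.
        Some((tag, value)) => BodyOrFooter::Footer(tag.to_string(), value.to_string()),
        // No ':' follows, so the whole input is Body text.
        None => BodyOrFooter::Body(input.to_string()),
    }
}
```

Both branches consume the same leading text, and the choice is made only by what comes next, which is the idea behind giving both variants the same first token.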

shadaj commented 1 year ago

Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the Body and Footer. So then you parse a Vec<Body> followed by a Vec<Footer>.
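As a rough picture of the resulting shape (bodies first, then footers), here is a hand-written sketch in plain Rust. It is not a rust_sitter grammar: `split_body_and_footers` is a hypothetical helper, it works line by line instead of by paragraph, and it assumes a footer tag is a single word followed by `": "`.

```rust
/// One line of free-form body text.
#[derive(Debug, PartialEq)]
pub struct Body {
    pub text: String,
}

/// One `tag: value` footer line.
#[derive(Debug, PartialEq)]
pub struct Footer {
    pub tag: String,
    pub value: String,
}

/// Hypothetical helper: `rest` is the commit message with the header
/// line already removed.
pub fn split_body_and_footers(rest: &str) -> (Vec<Body>, Vec<Footer>) {
    let mut bodies: Vec<Body> = Vec::new();
    let mut footers: Vec<Footer> = Vec::new();

    for line in rest.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match line.split_once(": ") {
            // Assumption: a footer tag is a single word with no spaces.
            Some((tag, value)) if !tag.is_empty() && !tag.contains(' ') => {
                footers.push(Footer {
                    tag: tag.to_string(),
                    value: value.to_string(),
                });
            }
            // No footer seen yet: this line is body text.
            _ if footers.is_empty() => bodies.push(Body {
                text: line.to_string(),
            }),
            // Untagged line after the first footer: continue that footer's value.
            _ => {
                if let Some(last) = footers.last_mut() {
                    last.value.push('\n');
                    last.value.push_str(line);
                }
            }
        }
    }

    (bodies, footers)
}
```

A body line that happens to start with a single word and `": "` would be misread as a footer here, which is the same ambiguity the grammar has to resolve.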

groteck commented 1 year ago

Hi @shadaj, using your answer I was able to parse the footer and the body. But regarding the second part:

Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the Body and Footer. So then you parse a Vec<Body> followed by a Vec<Footer>.

This part is a bit more complex. The only way I found to parse into a Vec is something like this:


    pub struct Language {
        pub type_: Type,
        #[rust_sitter::leaf(pattern = r"\s")]
        _whitespace: (),
        #[rust_sitter::leaf(pattern = r".+", transform = |v| v.to_string())]
        pub description: String,
        #[rust_sitter::delimited(
            #[rust_sitter::leaf(text = "/n")]
            ()
        )]
        pub footer: Option<FooterLine>,
        #[rust_sitter::delimited(
            #[rust_sitter::leaf(text = "/n/n")]
            ()
        )]
        pub body: Option<BodyParagraph>,
    }

    #[derive(Debug, PartialEq)]
    pub struct FooterLine {
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub tag: String,
        #[rust_sitter::leaf(text = ": ")]
        _separator: (),
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub value: String,
    }

    #[derive(Debug, PartialEq)]
    pub struct BodyParagraph {
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub value: String,
    }

But I'm not really sure what is wrong there 😄

groteck commented 1 year ago

But since this was about branching, I think we can close the issue as solved and maybe open a separate one for the question about vectors?