groteck opened this issue 1 year ago
This requires a bit of creativity in how the grammar is structured so that tokenization happens correctly, but what I'd recommend is having two enum variants that both start by looking for a sequence of tokens without a `:`. Only one of the variants then parses a `:` token after it, followed by more text.
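```rust
// Both variants start with the same `[^:]+` token, so Tree Sitter can
// decide between them by looking at the token that follows it.
pub enum BodyOrFooter {
    // `text: value`: the prefix, a literal `:`, then the value.
    Footer(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
        #[rust_sitter::leaf(text = ":")]
        (),
        #[rust_sitter::leaf(pattern = r"[\w\s]+", transform = |v| v.to_string())]
        String,
    ),
    // Just the prefix, with no `:` after it.
    Body(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
    ),
}
```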
This works because Tree Sitter always looks for the token matching the `[^:]+` regex, and can then branch based on whether the following token is a `:` or not.
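The same decision can be shown outside Tree Sitter with a minimal plain-Rust sketch. This is standard-library code, not rust-sitter. It takes the longest run of non-`:` characters, then picks `Footer` if a `:` follows and `Body` otherwise. The `classify` function is hypothetical, and the `[\w\s]+` value check is only approximated with `char::is_alphanumeric`, `_`, and `char::is_whitespace`.

```rust
/// Plain-Rust mirror of the `BodyOrFooter` enum above (illustration only).
#[derive(Debug, PartialEq)]
pub enum BodyOrFooter {
    /// Text before the colon, and the (trimmed) text after it.
    Footer(String, String),
    /// Text with no colon after it.
    Body(String),
}

/// Hypothetical helper: match one `[^:]+` run, then branch on the next character.
pub fn classify(line: &str) -> Option<BodyOrFooter> {
    // End of the `[^:]+` run: the first ':' or the end of the line.
    let end = line.find(':').unwrap_or(line.len());
    let prefix = &line[..end];
    if prefix.is_empty() {
        return None; // `[^:]+` must match at least one character
    }
    match line[end..].strip_prefix(':') {
        // A ':' follows the shared prefix: take the Footer branch.
        Some(value) => {
            // Approximates `[\w\s]+`: non-empty, word characters or whitespace only.
            let valid = !value.is_empty()
                && value
                    .chars()
                    .all(|c| c.is_alphanumeric() || c == '_' || c.is_whitespace());
            if valid {
                Some(BodyOrFooter::Footer(prefix.to_string(), value.trim().to_string()))
            } else {
                None
            }
        }
        // Nothing follows the prefix: take the Body branch.
        None => Some(BodyOrFooter::Body(prefix.to_string())),
    }
}
```

For example, `classify("Reviewed-by: Z")` yields a `Footer` with tag `Reviewed-by` and value `Z`. A line with no colon comes back as `Body`.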
Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the `Body` and `Footer`. So then you parse a `Vec<Body>` followed by a `Vec<Footer>`.
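The alternative layout can also be sketched in plain Rust. This is standard-library code, not rust-sitter, and the `Body`, `Footer`, `Commit`, `parse_footer`, and `parse_commit` names are hypothetical. The sketch makes three assumptions:

- Paragraphs are separated by a blank line.
- The first paragraph is the header.
- The last paragraph goes into the `Vec<Footer>` only when every one of its lines looks like `token: value`.

```rust
/// Hypothetical plain-Rust types mirroring the suggested split.
#[derive(Debug, PartialEq)]
pub struct Body {
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub struct Footer {
    pub token: String,
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub struct Commit {
    pub header: String,
    pub body: Vec<Body>,
    pub footer: Vec<Footer>,
}

/// Simplified footer check: `token: value`, where the token has no whitespace.
fn parse_footer(line: &str) -> Option<Footer> {
    let (token, value) = line.split_once(": ")?;
    if token.is_empty() || token.contains(char::is_whitespace) {
        return None;
    }
    Some(Footer { token: token.to_string(), value: value.to_string() })
}

/// Split on blank lines: header first, then a `Vec<Body>`, then a `Vec<Footer>`.
pub fn parse_commit(msg: &str) -> Option<Commit> {
    let mut paragraphs: Vec<&str> = msg
        .trim_end()
        .split("\n\n")
        .filter(|p| !p.trim().is_empty())
        .collect();
    if paragraphs.is_empty() {
        return None; // nothing to parse
    }
    let header = paragraphs.remove(0).to_string();

    // The last paragraph is the footer block only if every line is a footer.
    let mut footer = Vec::new();
    if let Some(&last) = paragraphs.last() {
        let parsed: Option<Vec<Footer>> = last.lines().map(parse_footer).collect();
        if let Some(f) = parsed {
            footer = f;
            paragraphs.pop();
        }
    }

    // Everything left between the header and the footers is body text.
    let body = paragraphs
        .into_iter()
        .map(|p| Body { text: p.to_string() })
        .collect();
    Some(Commit { header, body, footer })
}
```

The footer check only runs on the last paragraph, so a colon inside an earlier body paragraph doesn't turn it into a footer. However, a lone final paragraph such as `Note: see below` would still be read as a footer. The spec's token rules are what settle that case.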
Hi @shadaj, using your answer I was able to parse the footer and the body, but the second part:

> Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the Body and Footer. So then you parse a `Vec<Body>` followed by a `Vec<Footer>`.

is a bit more complex. The only way I found to parse into a `Vec` is this:
```rust
pub struct Language {
    pub type_: Type,
    #[rust_sitter::leaf(pattern = r"\s")]
    _whitespace: (),
    #[rust_sitter::leaf(pattern = r".+", transform = |v| v.to_string())]
    pub description: String,
    #[rust_sitter::delimited(
        #[rust_sitter::leaf(text = "/n")]
        ()
    )]
    pub footer: Option<FooterLine>,
    #[rust_sitter::delimited(
        #[rust_sitter::leaf(text = "/n/n")]
        ()
    )]
    pub body: Option<BodyParagraph>,
}

#[derive(Debug, PartialEq)]
pub struct FooterLine {
    #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
    pub tag: String,
    #[rust_sitter::leaf(text = ": ")]
    _separator: (),
    #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub struct BodyParagraph {
    #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
    pub value: String,
}
```
But I'm not really sure what is wrong there 😄

Since this issue was about branching, I think we can close it as solved, and maybe I can open another one about vectors?
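Two things in the `Language` struct look suspicious:

- `"/n"` and `"/n/n"` use a forward slash, so they match the literal characters `/` and `n` rather than a newline, which is written `"\n"`.
- The rust-sitter README shows `#[rust_sitter::delimited(...)]` on `Vec<T>` fields, so applying it to `Option<FooterLine>` and `Option<BodyParagraph>` is likely another problem. This is worth confirming against the rust-sitter docs.

The standard-library sketch below only illustrates the first point. The hypothetical `split_on` helper splits text on a delimiter, roughly as a delimited repetition would collect items, and shows that `"/n"` never matches a real line break.

```rust
/// Hypothetical helper: split `text` on `delimiter` and drop empty pieces,
/// roughly what a repetition delimited by that token would collect.
/// Note that "\n" is one newline character, while "/n" is a slash and an `n`.
pub fn split_on<'a>(text: &'a str, delimiter: &str) -> Vec<&'a str> {
    text.split(delimiter)
        .filter(|piece| !piece.is_empty())
        .collect()
}
```

With `"\n"`, `split_on("Reviewed-by: Z\nRefs: 133", "\n")` returns the two footer lines. With `"/n"`, it returns the whole block as a single piece.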
Hi, thanks for the work! I really like the library and enjoy how fast I've made progress with almost no prior knowledge of language parsers.
Context: I'm trying to write a parser for the Conventional Commits spec, which looks something like this:
Example:
In my parser, the issue is that the `Body` and the `Footer` of a conventional commit can be nearly identical; the main difference is the tag on the footer. But with the available macros and regexes, I can't find a way to branch and say "if you find something like `.+:`, it is a footer, so process the tags."

Please understand that my knowledge of language parsing is really limited and I'm using this library as one of my first approaches. If any of this is nonsense, let me know and point me in the right direction if possible.
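The Conventional Commits 1.0.0 spec gives a tighter rule than `.+:` for telling a footer from body text. A footer is a word token followed by either `: ` or ` #` and then a value. The token uses `-` in place of whitespace (for example `Acked-by`), and `BREAKING CHANGE` is the one allowed exception. That rule is what separates a footer line from a body line that merely contains a colon.

Below is a minimal standard-library sketch of the check, not rust-sitter code. The name `footer_parts` is hypothetical, and treating a "word token" as letters, digits, and `-` is an assumption.

```rust
/// Hypothetical helper: split a line into (token, value) if it looks like a
/// Conventional Commits footer; returns None for ordinary body text.
pub fn footer_parts(line: &str) -> Option<(&str, &str)> {
    // The spec allows `BREAKING CHANGE` as a token even though it has a space.
    if let Some(value) = line.strip_prefix("BREAKING CHANGE: ") {
        return Some(("BREAKING CHANGE", value));
    }
    // Earliest of the two separators the spec allows: ": " or " #".
    let idx = match (line.find(": "), line.find(" #")) {
        (Some(a), Some(b)) => a.min(b),
        (Some(a), None) | (None, Some(a)) => a,
        (None, None) => return None,
    };
    // Both separators are two ASCII bytes, so `idx + 2` is a valid boundary.
    let (token, value) = (&line[..idx], &line[idx + 2..]);
    // Ordinary tokens use '-' instead of spaces, so a token containing spaces
    // (or other punctuation) means this line is body text, not a footer.
    let is_word = !token.is_empty()
        && token.chars().all(|c| c.is_alphanumeric() || c == '-');
    if is_word {
        Some((token, value))
    } else {
        None
    }
}
```

For example, `Refs #133` gives `("Refs", "133")`. `This body line mentions a ratio: 3 to 1` returns `None` because its token contains spaces.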