pointlander / peg

Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator.
BSD 3-Clause "New" or "Revised" License
1.01k stars 120 forks source link

Feasible to port grammar file from peg.js? Any caveats? #40

Open victorhooi opened 9 years ago

victorhooi commented 9 years ago

I'm wondering how feasible it would be to port a peg.js grammar file to this (peg)? Example:

https://raw.githubusercontent.com/mongodb-js/log/88daf5f4259652f95f2d4a7a01bd3f14751e4812/mongodb-log.pegjs

I'm new to PEGs, but just skimming through the docs, it seems like it should be easy just to translate each line 1:1 - is there anything we need to be wary of?

Curious if it would ever be possible to have a feature to automatically import a grammar file from peg.js? =) (i.e. if this is something that can easily be automated).

pointlander commented 9 years ago

The only problem that I can see is my peg doesn't support named:$(captures). My peg only supports .

The easiest way to implement import would be to add an export feature to peg.js.

On Mon, Jul 20, 2015 at 4:27 PM, Victor Hooi notifications@github.com wrote:

I'm wondering how feasible it would be to port a peg.js grammar file to this (peg)?

https://raw.githubusercontent.com/mongodb-js/log/88daf5f4259652f95f2d4a7a01bd3f14751e4812/mongodb-log.pegjs

I'm new to PEGs, but just skimming through the docs, it seems like it should be easy just to translate each line 1:1 - is there anything we need to be wary of?

Curious if it would ever be possible to have a feature to automatically import a grammar file from peg.js? =) (i.e. if this is something that can easily be automated).

— Reply to this email directly or view it on GitHub https://github.com/pointlander/peg/issues/40.

victorhooi commented 9 years ago

Aha, so for the capture groups, I can just use positional capture groups, rather than named capture groups, right?

The README doesn't say too much on how to extract the values - but I gather it's shoved in a variable called "buffer", which I can then read?

Do you think named capture groups be addd to peg at some point?

applee commented 7 years ago

For lack of named capture groups, it's default to process nested grammar structure, e.g. protobuf or thrift

matthewmueller commented 4 years ago

Do you mind providing an example of how to emulate named capture groups? Coming from peg.js, this has been a sticking point for me. I really want to use this library.

Here's an example:

Assignment = key:Identifier _ "=" _ value:Value {
  return {
    type: "assignment",
    key: key,
    value: value,
  }
}

How would I parse key and value without capture groups?

pointlander commented 4 years ago

You can use AST in which case everything is captured. Example code: https://github.com/pointlander/peg/tree/master/grammars/calculator_ast

matthewmueller commented 4 years ago

Thanks for your response! I did look at the calculator example and tried to figure out how to apply it. The part that I can't wrap my head around is how to handle capture groups that are not leaf nodes.

In the example above, how to pull out Identifier and Value which are themselves a subtree with child nodes.

theclapp commented 4 years ago

Split it up into multiple rules?

(untested)

type Example Peg {
 id, value string
}

assignment = identifier "=" value { p.AddAssignment(p.id, p.value) }
identifier = < .+ > { p.id = text }
value = < .+ > { p.value = text }
matthewmueller commented 4 years ago

Ahh that's interesting, thanks for sharing. I think the missing piece for me is an understanding of how to build a tree using a stack, specifically that p.AddAssignment and how to build down (or up?) from there. Any pointers or resources on how to learn more would be greatly appreciated.

Looking at https://github.com/pointlander/peg/blob/master/peg.peg it appears that nodes are getting pushed on, but it's not clear to me how to make sense of this.

theclapp commented 4 years ago

peg.peg looks like it makes heavy use of <...> and text to build the Peg tree.

In my example above, identifier is parsed ("matched"?) first, and thus p.id is assigned, and then value is parsed, and thus p.value is assigned, and then the assignment rule is matched, and so p.AddAssignment(...) runs.

(Also, since AddAssignment's two parameters are also fields of p, you could just drop them and have AddAssignment reference p.id and p.value directly.)

Does that make sense? Printf is your friend, and the generated Go state machine is also useful in figuring out what gets called where and how. For example this line in peg.peg matches up directly to this line in peg.peg.go.

HTH.

theclapp commented 4 years ago

So ... I just realized that the file I linked you to is the file you were already looking at. Sorry about that. But maybe with the rest of my comment, you can make sense of it.

matthewmueller commented 4 years ago

The suggestion to use fmt.Printf to see the execution order is incredibly helpful!

Also I realized after staring at the syntax more that you can add multiple captures within a single line. For example:

https://github.com/pointlander/peg/blob/master/peg.peg#L22-L26

I thought it was only for OR expressions (e.g. Text / Attribute). This appears to cover the need for named capture groups that peg.js supports since you can always break an expression up.

Thanks for helping me understand this better!

matthewmueller commented 4 years ago

Quick followup: This project finally clicked for me 🙂

I'm working on a route parser similar to similar to https://github.com/pillarjs/path-to-regexp in Express.js. The parser's state looks like this:

type parser Peg {
  route *Route
  segments []Segment
  key Key
  rkey *RegexpKey
  bkey *BasicKey
  optional *OptionalKey
  slash *Slash
  regexp *Regexp
  identifier *Identifier
  text *Text
}

With the help of go-naturaldate, I realized that you need to treat those fields (segments, keys, etc.) as temporary scratch state.

Then you rely on the execution order of the algorithm to fill them in before they're needed. The fmt.Printf advice helped me figure that out.

For anyone else that struggled to make the leap from peg.js like me, here's a full example of the route parser. This example includes a peg.js grammar that I used to first prototype the parser.

Thanks for your patience and assistance. This is a very cool project!