yhirose / go-peg

Yet another PEG (Parsing Expression Grammars) parser generator for Go
MIT License

Binary Operator precedence Escaping # Character #11

Open williamsharkey opened 3 years ago

williamsharkey commented 3 years ago

I need to parse "a#OR#b". In the grammar, I can match #OR# just fine, by using the literal '#OR#'.

Because #OR# is a binary operator amongst other binary operators, I need to specify precedence.

In the options section, I can't figure out how to escape the hash. I'm assuming escape sequences such as \# (and perhaps tilde, ~#?) don't work there.

I would try to add an escape method to the options parser myself and send a pull request, but I don't understand these definitions in parser.go well enough. I think it may involve modifying these lines:

https://github.com/yhirose/go-peg/blob/a1af152bac31a6323c2d4b8870ceff8520c93fb4/parser.go#L122-L125

(Side note: do people actually write grammars in this functional style, or is this generated by the tool? It seems difficult for humans to read, at least for me.)

Because this repo is deprecated in favor of the C++ one, I don't expect assistance. Instead, I am going to fork this repo and add a function for changing the comment character ('#') to a user-defined one, i.e., a function that redefines rComment.Ope.

https://github.com/yhirose/go-peg/blob/a1af152bac31a6323c2d4b8870ceff8520c93fb4/parser.go#L102

I know that redefining the comment character isn't the best way to fix this, but I'm pretty confident it will work for what I need. If anyone more experienced can show me how to escape characters in the precedence option section, that would be great too.


Thanks for making such a super library -- overall it has saved me so much heartache to have such a lean, simple, and useful library in pure Go!
williamsharkey commented 3 years ago

Follow-up for anyone who has this issue in the future: it may not be the prettiest solution, but redefining the comment literal *worked for me*.

My fork just adds one small function:

https://github.com/williamsharkey/go-peg/blob/98595930efb0b5c778a2d4d7261308b54e217726/parser.go#L13-L16

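```go
// CommentCharacterSet sets a custom comment prefix. Default is "#".
// It must be called before the grammar is parsed.
func CommentCharacterSet(s string) {
    // Comment <- s (!EndOfLine .)* EndOfLine
    rComment.Ope = Seq(Lit(s), Zom(Seq(Npd(&rEndOfLine), Dot())), &rEndOfLine)
}
```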

Before parsing, call peg.CommentCharacterSet(":note:") if you want your comments to start with :note:, for example.

Doing so allowed me to define precedence for the #OR# operator just fine.
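To see what the precedence declaration buys you, here is a standalone precedence-climbing sketch in Go. It is not go-peg's implementation: the second operator #AND#, its binding power, and every function name are hypothetical, added only so that #OR# has something to compete with. A higher number binds tighter, and passing prec+1 for the right operand makes both operators left-associative.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical binding powers: a higher number binds tighter.
// "#AND#" is an assumed second operator, included only for illustration.
var precedence = map[string]int{
	"#OR#":  1,
	"#AND#": 2,
}

// parser walks a slice of tokens produced by tokenize.
type parser struct {
	toks []string
	pos  int
}

// peek returns the next token, or "" at the end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parseExpr is precedence climbing: read one operand, then keep folding in
// operators whose precedence is at least minPrec.
func (p *parser) parseExpr(minPrec int) string {
	lhs := p.toks[p.pos] // operands are plain identifiers in this sketch
	p.pos++
	for {
		op := p.peek()
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			return lhs
		}
		p.pos++
		rhs := p.parseExpr(prec + 1) // prec+1 => left-associative
		lhs = "(" + lhs + " " + op + " " + rhs + ")"
	}
}

// tokenize surrounds each known operator with spaces, then splits on
// whitespace, so "a#OR#b" becomes ["a", "#OR#", "b"].
func tokenize(s string) []string {
	for op := range precedence {
		s = strings.ReplaceAll(s, op, " "+op+" ")
	}
	return strings.Fields(s)
}

// parenthesize returns the fully parenthesized form of an operator chain.
// It assumes well-formed input (an operand, then operator/operand pairs).
func parenthesize(s string) string {
	p := &parser{toks: tokenize(s)}
	return p.parseExpr(1)
}

func main() {
	fmt.Println(parenthesize("a#OR#b"))       // (a #OR# b)
	fmt.Println(parenthesize("a#OR#b#AND#c")) // (a #OR# (b #AND# c))
	fmt.Println(parenthesize("a#AND#b#OR#c")) // ((a #AND# b) #OR# c)
}
```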

yhirose commented 3 years ago

@williamsharkey, thanks for the feedback. The '#' comment format actually comes from Bryan Ford's original paper. Here is an excerpt of the PEG grammar from the 2nd page of the paper:

```
Spacing <- (Space / Comment)*
Comment <- '#' (!EndOfLine .)* EndOfLine
Space <- ' ' / '\t' / EndOfLine
EndOfLine <- '\r\n' / '\n' / '\r'
EndOfFile <- !.
```
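Ford's rules translate almost line for line into a hand-written matcher. The Go sketch below is an illustration, not go-peg's code: the function names are hypothetical, and the comment prefix is a parameter in the spirit of CommentCharacterSet. Assuming the precedence options are scanned with this Spacing rule (which the CommentCharacterSet fix suggests), it shows why an unquoted #OR# vanishes: with the prefix "#", everything up to the end of the line is consumed as a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// endOfLine mirrors EndOfLine <- '\r\n' / '\n' / '\r'.
// It returns how many bytes match at s[pos:], or 0 if none.
func endOfLine(s string, pos int) int {
	rest := s[pos:]
	switch {
	case strings.HasPrefix(rest, "\r\n"):
		return 2
	case strings.HasPrefix(rest, "\n"), strings.HasPrefix(rest, "\r"):
		return 1
	}
	return 0
}

// comment mirrors Comment <- prefix (!EndOfLine .)* EndOfLine.
// Ford's grammar fixes prefix to "#". It returns the bytes consumed and
// whether the rule matched.
func comment(s string, pos int, prefix string) (int, bool) {
	if !strings.HasPrefix(s[pos:], prefix) {
		return 0, false
	}
	i := pos + len(prefix)
	for i < len(s) && endOfLine(s, i) == 0 { // (!EndOfLine .)*
		i++
	}
	n := endOfLine(s, i)
	if n == 0 { // no EndOfLine before end of input, so the rule fails
		return 0, false
	}
	return i + n - pos, true
}

// spacing mirrors Spacing <- (Space / Comment)* with
// Space <- ' ' / '\t' / EndOfLine. It returns the position of the first
// byte that is neither space nor part of a comment.
func spacing(s string, pos int, prefix string) int {
	for pos < len(s) {
		if s[pos] == ' ' || s[pos] == '\t' {
			pos++
		} else if n := endOfLine(s, pos); n > 0 {
			pos += n
		} else if c, ok := comment(s, pos, prefix); ok {
			pos += c
		} else {
			break
		}
	}
	return pos
}

func main() {
	line := " #OR#\nnext"
	fmt.Printf("%q\n", line[spacing(line, 0, "#"):])      // "next"
	fmt.Printf("%q\n", line[spacing(line, 0, ":note:"):]) // "#OR#\nnext"
}
```

With the default prefix the operator is skipped as if it were a comment; with a different prefix it is left in place for the grammar to match.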

Of course it's OK to modify the PEG grammar in your own projects, but I wonder whether it's the right way to solve this issue, because it breaks the original PEG grammar...

Instead of touching the comment format, I tried allowing literal operators in the infix expression parsing feature of cpp-peglib (not go-peg, sorry...), and here is the commit. Users can now specify operators like #plus# and #multiply#.

Here is the actual cpp-peglib example:

```
START            <-  _ EXPRESSION
EXPRESSION       <-  ATOM (OPERATOR ATOM)* {
                       precedence
                         L '#plus#' -     # weaker
                         L '#multiply#' / # stronger
                     }
ATOM             <-  NUMBER / T('(') EXPRESSION T(')')
OPERATOR         <-  T('#plus#' / '#multiply#' / [-/])
NUMBER           <-  T('-'? [0-9]+)
~_               <-  [ \t]*
T(S)             <-  < S > _
```

As you can see, we can still use both '#' comments and operators containing '#'. I don't have enough time to implement the same in go-peg, though. I hope this helps.
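One way both can coexist is to scan a quoted literal as a single unit before checking for a comment. The Go sketch below tokenizes one precedence line under that rule. The scanning rules, the type and function names, and the restriction of the associativity marker to L or R are assumptions for illustration, not cpp-peglib's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// precedenceLine is one hypothetical "L op op ..." entry of a precedence block.
type precedenceLine struct {
	Assoc string   // "L" or "R" (assumed markers)
	Ops   []string // operator literals, quotes removed
}

// parsePrecedenceLine splits one line into tokens. Assumed rules:
//   - tokens are separated by spaces or tabs;
//   - a token starting with ' or " runs to the matching quote,
//     so a '#' inside it is part of the operator;
//   - an unquoted token starting with '#' begins a comment to end of line;
//   - any other token runs to the next space or tab.
func parsePrecedenceLine(line string) (precedenceLine, error) {
	var toks []string
	i := 0
	for i < len(line) {
		c := line[i]
		switch {
		case c == ' ' || c == '\t':
			i++
		case c == '#':
			i = len(line) // comment: ignore the rest of the line
		case c == '\'' || c == '"':
			end := strings.IndexByte(line[i+1:], c)
			if end < 0 {
				return precedenceLine{}, errors.New("unterminated quoted operator")
			}
			toks = append(toks, line[i+1:i+1+end])
			i += end + 2 // skip opening quote, body, and closing quote
		default:
			j := i
			for j < len(line) && line[j] != ' ' && line[j] != '\t' {
				j++
			}
			toks = append(toks, line[i:j])
			i = j
		}
	}
	if len(toks) < 2 || (toks[0] != "L" && toks[0] != "R") {
		return precedenceLine{}, errors.New("expected L or R followed by operators")
	}
	return precedenceLine{Assoc: toks[0], Ops: toks[1:]}, nil
}

func main() {
	for _, l := range []string{
		"L '#plus#' -     # weaker",
		"L '#multiply#' / # stronger",
	} {
		p, err := parsePrecedenceLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p.Assoc, p.Ops)
	}
}
```

Running it prints `L [#plus# -]` and `L [#multiply# /]`. Under these rules an unquoted #OR# on such a line would still be read as the start of a comment, which is why the quotes matter.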