ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License
5.01k stars, 217 forks

Lookahead/lookbehind not working properly in grammar expression #384

Closed: thiago-silva-tech closed this issue 2 years ago

thiago-silva-tech commented 2 years ago

I'm using Ohm.js to build a parser, and I'm trying to transform this regex:

/\^CF[A-Z0-9]?\,?((?<=([A-Z0-9]?\,))\d{1,5})?\,?((?<=([A-Z0-9]?\,\d{0,5}\,))\d{1,5})?/g

into the following rule:

rule = "^CF" hexDigit? ","? (&(hexDigit? ",") digit+)? ","? (&(hexDigit? "," digit* ",") digit+)?

I'm trying to build a rule that respects the order of the comma-separated parameters, and for that I'm using a lookbehind assertion. The regex works properly, but the rule in the grammar does not.

I get the following error: [image]

Apparently the generated parser expects the lookbehind target (the comma) to be at the current position in the input string.

I'm not sure my explanation was clear. If you put the rule in the online editor, it's easy to test the scenario.
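
A possible explanation, offered with some caution: in Ohm, the `&` operator is a positive lookahead. It checks the input that follows the current position without consuming it, so `&(hexDigit? ",")` asks for a comma at the current position rather than before it. The difference can be sketched with plain JavaScript regular expressions; the patterns and the sample string below are illustrative assumptions, not taken from the issue.

```javascript
// Lookbehind: match digits only if a comma comes immediately BEFORE them.
const lookbehind = /(?<=,)\d+/;

// Lookahead: require a comma AT the current position, then match digits
// starting at that same position. One character cannot be both "," and
// a digit, so this pattern can never match.
const lookahead = /(?=,)\d+/;

const sample = "^CFA,123"; // illustrative input
const behindMatch = sample.match(lookbehind);
const aheadMatch = sample.match(lookahead);

console.log(behindMatch && behindMatch[0]); // "123"
console.log(aheadMatch);                    // null
```

If that reading is right, the grammar rule fails for the same reason the second pattern does: the lookahead and the expression after it compete for the same input position.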

MuhammedZakir commented 2 years ago

Can you post some example strings for testing? Preferably, strings that should and shouldn't match.

thiago-silva-tech commented 2 years ago

Sure, thanks for your support!

I actually found an alternative solution, but the initial idea was to create a rule that starts with a literal and can then have three optional parameters. The parameters can have different types and must be in the correct position in the parameter sequence. So let's imagine the following rule:

  1. Starts with the literal "^CF"
  2. Then can have a first parameter "A"
  3. Then can have a second parameter "B"
  4. Then can have a third parameter "C"

Valid inputs: ^CFA ^CFA,B ^CFA,,C ^CFA,B,C ^CF,,C ^CF,B

Invalid inputs: ^CFB ^CFC ^CF,C ^CF,,A ^CFB,A

What I was trying to do is build a rule that only allows a parameter if it is in the correct position, so "B" is only allowed if "A" plus ",", or just ",", comes before it.
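
The positional constraint can be sketched outside Ohm as a check on comma-separated slots. This is only an illustration of the requirement, not the grammar itself: `matchesPositional` is a hypothetical helper, and treating a bare `^CF` or a trailing comma as acceptable is an assumption the lists above do not settle.

```javascript
// Hypothetical helper: each comma-separated slot after the prefix must be
// either empty or hold exactly the parameter expected at that position.
function matchesPositional(input, prefix = "^CF", expected = ["A", "B", "C"]) {
  if (!input.startsWith(prefix)) return false;
  const slots = input.slice(prefix.length).split(",");
  if (slots.length > expected.length) return false; // too many parameters
  return slots.every((slot, i) => slot === "" || slot === expected[i]);
}

// Example strings from the lists above.
const validInputs = ["^CFA", "^CFA,B", "^CFA,,C", "^CFA,B,C", "^CF,,C", "^CF,B"];
const invalidInputs = ["^CFB", "^CFC", "^CF,C", "^CF,,A", "^CFB,A"];

console.log(validInputs.every((s) => matchesPositional(s)));  // true
console.log(invalidInputs.some((s) => matchesPositional(s))); // false
```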

I found a solution by defining a "function rule" (I don't think that's the right name, haha):

ThreeParameters<A,B,C> =
A ("," B | "," "")? ("," C | "," "")? | A? "," B ("," C | "," "")? | (A "," | "," "") (B "," | "," "") C | "" ","? "" ","? ""

...and when I want a rule that accepts 3 parameters, I do:

rule = "^CF" ThreeParameters<"A","B","C">

thiago-silva-tech commented 2 years ago

I'll close the issue since I found a solution, but if anyone finds a solution that uses lookahead/lookbehind, please let me know.

I created this gist with another of those "function rules".

MuhammedZakir commented 2 years ago

The expression below should work. I have tested it with the example strings you gave above.

"^CF" "A"? ("," ("B"? ("," "C")?)?)?