kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Matching chars except if they are escaped #579

Closed: FranciscoG closed this issue 3 years ago

FranciscoG commented 3 years ago

Given a string like this:

// String.raw keeps each backslash; in a normal string literal "\~" is just "~"
const input = String.raw`test#\~string~example2#123~%10%moretext\~#feedback`;

// broken apart
"test#\~string" "~" "example2#123" "~"  "%10%moretext\~#feedback"

I would like the parser to match the groups of text that are separated by a ~, while ignoring escaped tildes (\~).

Here's what I want my result to look like:

// assume text was joined
parser.results = [
  [ 
    "test#\~string",
    "example2#123",
    "%10%moretext\~#feedback",
  ]
]

I've tried doing this with a regex using a negative lookbehind, but that is not fully supported in Safari yet, and unfortunately I need to support Safari. So I'm hoping to find a way to do this with Nearley's grammar.

FranciscoG commented 3 years ago

I think I figured it out.

Grammar

Main -> answers {% id %}

answers -> answerText ("~" answerText {% d => d[1] %}):* {% flatten %}

answerText 
  -> answerChar {% id %}
  | answerChar answerText {% d => d[0] + d[1] %}

answerChar 
  -> [^\~\\] {% id %}
  | "\\" escape {% d => d[0] + d[1] %}

escape 
  -> "~" {% id %}
  | "\\" {% () => "\\" %}
  | "#" {% () => "#" %}

@{%

  const flatten = d => {
    return d.reduce(
      (a, b) => {
        return a.concat(b);
      },
      []
    );
  };

%}
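
To make the role of flatten clearer, here is a small standalone sketch of what it does to the data the answers rule receives. The sample array d is hypothetical; it only imitates the shape nearley would pass in (the first answerText, followed by the array collected by the ( ... ):* group).

```javascript
// Same helper as in the grammar's @{% ... %} block, written compactly.
const flatten = d => d.reduce((a, b) => a.concat(b), []);

// Hypothetical data shaped like the `answers` rule's input:
// d[0] is the first answerText, d[1] holds the texts that followed each "~".
const d = ["test\\~next", ["blablabla", "last\\~test"]];

// concat appends a string as one element and spreads an array one level,
// so the nested array is merged into a single flat list of three strings.
const result = flatten(d);
console.log(result);
```

If the input contains no "~" at all, d[1] is an empty array and flatten returns a one-element list.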

Input:

test\~next~blablabla~last\~test

Output:

Testing it out in this playground, I am seeing this:

["test\~next", "blablabla", "last\~test"]

This video really helped me out: https://www.youtube.com/watch?v=a2mZTBI1ZxU&t=620s

I need to run it through a few more test cases, but I'm pretty sure this is it.
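
For running those extra test cases, one option is to compare the parser output against a small hand-written splitter. The sketch below is plain JavaScript, not nearley, and splitAnswers is a hypothetical helper name. It is meant to mirror the grammar above under these assumptions: only \~, \\ and \# are valid escapes, the backslash is kept in the output, and empty answers are rejected. It uses no lookbehind, so it should not run into the Safari limitation.

```javascript
// Hypothetical reference helper (not part of nearley): splits on unescaped "~".
function splitAnswers(input) {
  const answers = [];
  let current = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\") {
      // A backslash must be followed by one of the characters in `escape`.
      const next = input[i + 1];
      if (next !== "~" && next !== "\\" && next !== "#") {
        throw new Error(`Invalid escape at index ${i}`);
      }
      current += ch + next; // keep the backslash, as the grammar does
      i++;                  // skip the escaped character
    } else if (ch === "~") {
      // An unescaped tilde ends the current answer.
      if (current === "") throw new Error(`Empty answer before index ${i}`);
      answers.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current === "") throw new Error("Empty answer at end of input");
  answers.push(current);
  return answers;
}

// String.raw keeps the backslashes literal, so "\~" reaches the function intact.
const sample = String.raw`test\~next~blablabla~last\~test`;
console.log(splitAnswers(sample));
```

Feeding the same inputs to both the nearley parser and this helper, then comparing the resulting arrays, is a quick way to spot cases where the grammar behaves differently than intended.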