lys-lang / node-ebnf

Create AST PEG Parsers from formal grammars in JavaScript
https://menduz.com/ebnf-highlighter/
MIT License

Parser re-escapes `\` in text for raw strings. #41

Open dmfxyz opened 2 years ago

dmfxyz commented 2 years ago

Take this simple grammar:

const grammar = `
str          ::= '"' (unsafe | SAFE)* '"'
SAFE         ::= #x21 | [#x24-#x5A] | [#x5E-#x7A] | #x7C | #x7E
unsafe       ::= ESCAPE #x22
ESCAPE       ::= #x5C
`
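To make the hex codes in the grammar easier to read, here is a minimal sketch that decodes them and mirrors the SAFE character ranges. The names `isSafe`, `ESCAPE_CHAR`, and `QUOTE_CHAR` are hypothetical helpers for illustration and are not part of node-ebnf.

```javascript
// #x5C is the backslash used by ESCAPE; #x22 is the double quote that follows it in `unsafe`.
const ESCAPE_CHAR = String.fromCharCode(0x5c)
const QUOTE_CHAR = String.fromCharCode(0x22)

// Hand-written mirror of the SAFE rule:
// #x21 | [#x24-#x5A] | [#x5E-#x7A] | #x7C | #x7E
const isSafe = (ch) => {
  const c = ch.charCodeAt(0)
  return (
    c === 0x21 ||
    (c >= 0x24 && c <= 0x5a) ||
    (c >= 0x5e && c <= 0x7a) ||
    c === 0x7c ||
    c === 0x7e
  )
}

console.log(ESCAPE_CHAR, QUOTE_CHAR)
console.log(isSafe('s'))         // letters fall inside [#x5E-#x7A]
console.log(isSafe(QUOTE_CHAR))  // the quote is excluded from SAFE
console.log(isSafe(ESCAPE_CHAR)) // the backslash is excluded from SAFE
```

Since neither the backslash nor the quote is SAFE, the only way a quote can appear inside `str` is through the two-character `unsafe` sequence.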

If we define a raw string as follows:

const str = String.raw`"stringwith\"escapes"`
console.log(str)

We get the representation:

"stringwith\"escapes"
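For comparison, a small sketch of what `String.raw` changes here: the raw template keeps the backslash as a real character, while an ordinary template literal consumes it as an escape. The variable names `raw` and `cooked` are only illustrative.

```javascript
// Raw template: the backslash before the inner quote is kept as a character.
const raw = String.raw`"stringwith\"escapes"`

// Ordinary template literal: \" is an escape sequence that yields just a quote.
const cooked = `"stringwith\"escapes"`

console.log(raw.length)            // 21
console.log(cooked.length)         // 20
console.log(raw.includes('\\'))    // true
console.log(cooked.includes('\\')) // false
```

The raw string's length of 21 matches the `end: 21` reported for the `str` node in the AST output.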

Now if we define rules and a parser for this grammar and run it on that raw string:

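const ebnf = require('ebnf') // assumes the `ebnf` npm package is installed

// Build the rule set from the W3C-style grammar, then parse the raw string.
const rules = ebnf.Grammars.W3C.getRules(grammar)
const parser = new ebnf.Parser(rules)
const ast = parser.getAST(str)
console.log(ast)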

We see the AST:

<ref *1> {
  type: 'str',
  text: '"stringwith\\"escapes"',
  children: [
    {
      type: 'unsafe',
      text: '\\"',
      children: [],
      end: 13,
      errors: [],
      fullText: '',
      parent: [Circular *1],
      start: 11,
      rest: ''
    }
  ],
  end: 21,
  errors: [],
  fullText: '',
  parent: null,
  start: 0,
  rest: ''
}

The text properties of both the parent str node and the child unsafe node show the \ re-escaped. I don't think this re-escaping should happen for raw strings.
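One way to check whether the extra backslash is in the stored value or only in how `console.log` renders strings nested inside an object is to count backslashes in both. This is a sketch that does not call node-ebnf: `node` is a hypothetical stand-in for an AST node, and `countBackslashes` is an illustrative helper. It relies on the fact that Node's `console.log` formats objects with `util.inspect`, which prints string fields as quoted, escaped literals.

```javascript
const util = require('util')

// Same raw string as in the report: 21 characters, one backslash.
const str = String.raw`"stringwith\"escapes"`

// Hypothetical stand-in for an AST node; only the `text` field matters here.
const node = { type: 'str', text: str }

// The object as console.log would render it.
const rendered = util.inspect(node)

// Illustrative helper: number of backslash characters in a string.
const countBackslashes = (s) => s.split('\\').length - 1

console.log(rendered)                        // shows text with \\ inside quotes
console.log(node.text)                       // prints the value itself
console.log(countBackslashes(node.text))     // 1
console.log(countBackslashes(rendered))      // 2
```

Running the same comparison against `ast.text` and `ast.text.length` from the real parser would show whether the value itself contains a doubled backslash.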

You can see a basic example here: https://github.com/dmfxyz/node-ebnf-issue-example
The pull request in which we originally discovered this behavior is here: https://github.com/nmushegian/jams/pull/23