peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License

Error parsing strings with curly braces in JavaScript blocks #238

Open jcubic opened 2 years ago

jcubic commented 2 years ago

This is an example that breaks the parser:

str.replace(/\$\{/g, '\\${');

It throws:

Expected code block but "\n" found.

I hit this in my language playground.

As a workaround, I've used an escaped value:

str.replace(/\$\x7b/g, '\\$\x7b');

Curly braces in strings and regexes confuse the parser. I've tried to use this on a website:

{
  function foo(x) {
     return x.replace(/\}/g, '}');
  }
}

and it throws random errors.

Mingun commented 2 years ago

Yes, this is a known, documented limitation. It will be eliminated in the future, once an API for replacing the part of the grammar that parses action code has been developed (the goal of some plugins for pegjs/peggy is to replace JavaScript with another language).

jcubic commented 2 years ago

That's interesting. Do you plan to support one particular language or a bunch of them?

Peggy has a nice syntax; it would be nice to be able to use a similar syntax for other languages. Do you have anything ready?

BTW: That's funny:

{
  function foo(x) {
     // { {
     return x.replace(/\}/g, '}');
  }
}
start = "x"

this is valid!
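
A minimal sketch of why the comment makes the grammar parse, assuming the code block is delimited by counting every `{` and `}` without regard to strings, regexes, or comments. `naiveBlockEnd` is a hypothetical helper written for illustration, not part of Peggy's API:

```javascript
// Hypothetical helper: returns the index just past the brace that closes the
// block opened at `start`, counting every "{" and "}" naively.
// Returns -1 if the block is never closed.
function naiveBlockEnd(src, start) {
  let depth = 0;
  for (let i = start; i < src.length; i++) {
    if (src[i] === "{") {
      depth++;
    } else if (src[i] === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// The failing block: the "}" in the regex and the "}" in the string are
// counted as closers, so the block appears to end in the middle of the string.
const broken = [
  "{",
  "  function foo(x) {",
  "     return x.replace(/\\}/g, '}');",
  "  }",
  "}",
].join("\n");

// The same block with "// { {" added: the two extra openers in the comment
// cancel out the two stray closers, so the count balances on the last brace.
const accidentallyValid = [
  "{",
  "  function foo(x) {",
  "     // { {",
  "     return x.replace(/\\}/g, '}');",
  "  }",
  "}",
].join("\n");

console.log(naiveBlockEnd(broken, 0) === broken.length);
console.log(naiveBlockEnd(accidentallyValid, 0) === accidentallyValid.length);
```

Under this assumption the first block ends early, right after the `}` inside the string literal, while the second ends on its final brace.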

Mingun commented 2 years ago

I plan to make an API that will allow plugins to replace the part of the Peggy grammar that parses source code, so plugins for other languages can implement a minimal subset that correctly parses braces for their languages. Peggy itself will contain only a parser for a subset of JS.

When this is implemented, I'll open issues/PRs against the known plugins.

hildjj commented 2 years ago

Since this is documented, can we close this?

Mingun commented 2 years ago

I think we can leave this open until the proposed solution (pluggable CodeBlock parsers) is implemented. This, however, requires some work to design a way of composing grammars, which is also required for the import feature. Of course, we could implement a special mechanism just to support this case, but I think it would be better to use a generic solution.

gamesaucer commented 1 year ago

To summarise what I said in the other issue: Accounting for mismatched braces in strings (including template literals) and comments doesn't require the parser to know JavaScript, or even for the JavaScript to be valid. Recognising those inside code snippets is pretty simple and can be used to effectively escape braces.
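
A rough sketch of that idea, not Peggy's implementation: a scanner that balances braces but treats comments, quoted strings, and template literals as opaque. The names `findBlockEnd` and `skipQuoted` are hypothetical. As a simplification, a template literal is skipped up to the next unescaped backtick, so backticks nested inside `${...}` are not handled, and regex literals are not recognised at all.

```javascript
// Hypothetical helper: skips a quoted run starting at src[i], where src[i] is
// ', " or `. Returns the index just past the closing quote, or -1 if the
// quote is never closed. Backslash escapes are honoured.
function skipQuoted(src, i) {
  const quote = src[i];
  for (let j = i + 1; j < src.length; j++) {
    if (src[j] === "\\") {
      j++; // skip the escaped character
    } else if (src[j] === quote) {
      return j + 1;
    }
  }
  return -1;
}

// Hypothetical helper: returns the index just past the brace that closes the
// block opened at `start`, ignoring braces inside comments and strings.
// Returns -1 if the block, a string, or a block comment is never closed.
function findBlockEnd(src, start) {
  let depth = 0;
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line.
      const eol = src.indexOf("\n", i);
      i = eol === -1 ? src.length : eol;
      continue;
    }
    if (ch === "/" && next === "*") {
      // Block comment: skip past the closing "*/".
      const close = src.indexOf("*/", i + 2);
      if (close === -1) return -1;
      i = close + 2;
      continue;
    }
    if (ch === '"' || ch === "'" || ch === "`") {
      i = skipQuoted(src, i);
      if (i === -1) return -1;
      continue;
    }
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
    i++;
  }
  return -1;
}

const block = "{ const a = '}'; /* } */ const b = `}`; // }\n}";
console.log(findBlockEnd(block, 0) === block.length);
```

Nothing here checks that the code is valid JavaScript; the scanner only needs to know where strings and comments start and end.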

The one thing that poses a problem is regex literals since / is also the division operator, and the rules for when something is division and when it's a regular expression can be complex. The most annoying edge case is determining whether a preceding pair of braces { ... } is an object or a code block, but if we say that dividing anything other than a number, an identifier or ) is user error, it becomes a lot simpler.
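
The simplified rule could look roughly like the sketch below. `isRegexStart` and `skipRegex` are hypothetical names, and the heuristic is exactly the one stated above: a `/` that follows a number, an identifier, or `)` is division, and anything else starts a regex. One consequence is that a keyword such as `return` directly before a regex is misread as an identifier, so the literal is taken for division.

```javascript
// Hypothetical helper: decides whether the "/" at slashIndex starts a regex
// literal by looking at the previous non-whitespace character.
function isRegexStart(src, slashIndex) {
  let j = slashIndex - 1;
  while (j >= 0 && /\s/.test(src[j])) j--;
  if (j < 0) return true; // nothing before it, so it cannot be division
  // Word characters, "$" and ")" end a number, an identifier, or a
  // parenthesised expression, so the slash is treated as division.
  return !/[\w$)]/.test(src[j]);
}

// Hypothetical helper: returns the index just past the closing "/" of the
// regex literal that starts at slashIndex (flags are not consumed), or -1 if
// the literal is not closed on the same line. A "/" inside a character class
// such as [/] does not close the literal.
function skipRegex(src, slashIndex) {
  let inClass = false;
  for (let j = slashIndex + 1; j < src.length; j++) {
    const ch = src[j];
    if (ch === "\\") {
      j++; // skip the escaped character
    } else if (ch === "\n") {
      return -1;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "]") {
      inClass = false;
    } else if (ch === "/" && !inClass) {
      return j + 1;
    }
  }
  return -1;
}

const call = "str.replace(/\\$\\{/g, x)";
const slash = call.indexOf("/");
console.log(isRegexStart(call, slash));                 // true: preceded by "("
console.log(call.slice(slash, skipRegex(call, slash))); // the regex body, without flags
console.log(isRegexStart("(x + 1) / 2", 8));            // false: preceded by ")"
```

A brace scanner would call `isRegexStart` whenever it meets a `/` that does not open a comment and, if it returns true, jump to the index returned by `skipRegex`, so braces inside the regex are never counted.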

I haven't checked the performance of such a solution, but I expect the impact to be negligible compared to implementing a full ES parser.