peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License

WASM? #222

Open ghost opened 2 years ago

ghost commented 2 years ago

Parsers and WebAssembly go hand in hand. Kinda surprised this hasn't been suggested already!

Mingun commented 2 years ago

If you want to create a WASM module from your parser definitions, it is probably better to write the parser in Rust using rust-peg. That project was obviously inspired by pegjs, as you can see from its internals, and its grammar language is still very close to the one used by pegjs/peggy. Rust can be compiled to a WASM module directly.

hildjj commented 2 years ago

Mingun's answer is good for today, but there's no reason why we can't generate WASM directly from peggy one day. It could be done as a plugin to keep the runtime dependencies low for the core project (like https://github.com/metadevpro/ts-pegjs).

ghost commented 2 years ago

Like, are you suggesting making a plugin that generates a Rust parser then compiles to WASM?

ghost commented 2 years ago

Also what if we figured out a way to use the ANSI C plugin? Would there be a performance benefit?

hildjj commented 2 years ago

I was thinking the plugin could generate WASM directly, then either call back into JS for the actions, or use something like AssemblyScript to compile the actions themselves.
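To make the "call back into JS for the actions" idea concrete, here is a minimal sketch. It is not Peggy output: the module is hand-assembled, and the names `env.action` and `parse` are made up for illustration. It only shows that a WASM function can hand control to a JavaScript function supplied at instantiation time, which is the mechanism such a plugin would rely on for grammar actions.

```javascript
// Hand-assembled WebAssembly module (illustrative only, not generated by Peggy).
// It imports env.action (i32 -> i32) and exports parse (i32 -> i32);
// parse(x) just returns env.action(x). Both names are hypothetical.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,             // type 0: (i32) -> i32
  0x02, 0x0e, 0x01, 0x03, 0x65, 0x6e, 0x76,                   // import from "env"
  0x06, 0x61, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x00, 0x00,       // "action", func, type 0
  0x03, 0x02, 0x01, 0x00,                                     // one local function, type 0
  0x07, 0x09, 0x01, 0x05, 0x70, 0x61, 0x72, 0x73, 0x65,       // export "parse"
  0x00, 0x01,                                                 // ... as function index 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b, // local.get 0; call 0; end
]);

// The "action" lives in JavaScript, the way a grammar action block would.
const actionLog = [];
const imports = {
  env: {
    action: (value) => {
      actionLog.push(value);
      return value * 2;
    },
  },
};

// Synchronous compilation is fine for a module this small.
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), imports);
const result = instance.exports.parse(21);
console.log(result, actionLog);
```

Each call across the WASM/JS boundary has a cost, so a grammar with many small actions could lose part of the speedup this way. That trade-off is presumably why compiling the actions themselves is listed as the alternative.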

hildjj commented 2 years ago

For generation, something like binaryen.js probably gets us there with only a lot of work involved.

ghost commented 2 years ago

I'm trying to run the TextMate scope selector parser in https://github.com/microsoft/vscode-textmate/issues/52#issuecomment-985356819 but the performance is very bad on lower-end CPUs because of the parser's recursion. 😔 Looks like I will have to start cramming Rust and researching rust-peg over the weekend.

hildjj commented 2 years ago

Can you check to see if this makes things any better?

segment
    = _ segment:([a-zA-Z0-9+_][a-zA-Z0-9-+_]*) _ {
        return new matchers.SegmentMatcher(segment);
    }

(This just removes the + quantifier after the first [] grouping, which shouldn't change your semantics.)

ghost commented 2 years ago

Wait, trying it again now..

ghost commented 2 years ago

Broke the parser when I tried to feed it a wildcard

ghost commented 2 years ago

I cached the generated matcher functions AND their output and the performance improved :)
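For anyone landing here with the same problem, the caching described above could look roughly like the sketch below. This is a guess at the shape, not the actual code: `parseSelector` is a hypothetical stand-in for the Peggy-generated parser, and its prefix-matching logic is simplified just so the example runs on its own.

```javascript
// Hypothetical stand-in for a Peggy-generated parser: turns a selector
// string into a matcher function. A real setup would call parse() here.
let parseCalls = 0;
function parseSelector(selector) {
  parseCalls += 1;
  const segments = selector.split(".");
  // Simplified matcher: every selector segment must equal the scope segment
  // at the same position, with "*" matching any segment.
  return (scope) => {
    const parts = scope.split(".");
    return segments.every((seg, i) => seg === "*" || seg === parts[i]);
  };
}

const matcherCache = new Map(); // selector -> matcher function
const resultCache = new Map();  // selector + scope -> boolean

// Parse each distinct selector only once.
function getMatcher(selector) {
  let matcher = matcherCache.get(selector);
  if (matcher === undefined) {
    matcher = parseSelector(selector);
    matcherCache.set(selector, matcher);
  }
  return matcher;
}

// Cache the matcher's output as well, keyed by both inputs.
function matches(selector, scope) {
  const key = selector + "\u0000" + scope; // NUL separator keeps keys unambiguous
  let result = resultCache.get(key);
  if (result === undefined) {
    result = getMatcher(selector)(scope);
    resultCache.set(key, result);
  }
  return result;
}

console.log(matches("source.js", "source.js.embedded"));
console.log(matches("source.js", "source.js.embedded"));
console.log(matches("source.*", "source.ts"));
console.log(parseCalls);
```

Both caches grow without bound here. In an editor the set of selectors and scopes is normally small, but a size cap would be a sensible addition if that does not hold.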

hildjj commented 2 years ago

Leaving this open as the feature request for WASM support.