peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License
914 stars 65 forks source link

Bytecode optimizer #429

Closed markw65 closed 7 months ago

markw65 commented 1 year ago

This stack implements a framework for performing optimizations to the bytecode (file interp-state.js), and then implements a number of fairly simple transformations via a new pass, optimize-bytecode.js.

After the initial commits of interp-state.js and its tests, I've committed each optimization separately, with an "artifacts" diff to show how it affects lib/parser.js, to make it easy to review/discuss. I plan to remove all the artifacts diffs, and add a single "Build Artifacts" commit once the pull request settles down.

Most of what I've implemented so far is pretty local, and depends on recognizing simple patterns (like PUSH_NULL, POP). If this framework is accepted I plan to add some more global analysis, so that eg an unused PUSH_CURR_POS can be removed, even when the POP is hidden behind a lot of more complex code; or immediately dropping unused elements from a plucked sequence, rather than generating them all, then ignoring the ones that aren't plucked (obviously their side effects still have to be executed); or not bothering to build an array for a sequence wrapped in a text node.

Mingun commented 1 year ago

I only have one concern: the bytecode was introduced by David (creator of pegjs) to be an abstraction over piece of JS code, that can be tested easily and have simple semantic. This change turns the bytecode into a more complicated thing, which probably could be simplified by using a more low-level bytecode.

So probably we should create that low-level bytecode and generate it directly? And then use it to generate JS. Roughly speaking, that means we should just renumber our bytecode instructions. It seems to me that it will be more in the spirit of the library, didn't it?

hildjj commented 1 year ago

I definitely want to rework code generation at some point, because the current mechanism of generating source maps isn't really sustainable, particularly as we want to add other output formats.

markw65 commented 1 year ago

So definitely having a lower level bytecode would make it possible to optimize more. Some of the bytecodes just do too much currently. But changing the bytecode presumably breaks all existing plugins - so I was trying to avoid doing that.

If you don't think this is something you want to take, I can redo it as a plugin, and maintain it myself until the big bytecode rewrite, when it will hopefully become redundant...

markw65 commented 1 year ago

So probably we should create that low-level bytecode and generate it directly

I think you'd still want to keep code generation simple, and then optimize the bytecode. I mean, without changing the bytecode at all, the code generator could emit optimized bytecode similar to what I'm producing here. But I think it would be harder to do...

markw65 commented 1 year ago

So I've taken this, and some more work I did on top, and converted it to a plugin.

Since this depends on some non-exported functionality, and since I also need a few updates to generate-js.js to get the full benefit of the bytecode optimizations, I ended up just using rollup to create the plugin from my fork of peggy. Its published as @markw65/peggy-optimizer, and includes #425 and #427.

I still think this (plus the additional work Ive done) is worth taking; the generated code with the plugin is much cleaner - and much closer to what a person might write (although loops are still awkward).

hildjj commented 7 months ago

I'm going to close this in favor of the plugin for now. Might revisit after we look at code generation.