Avoid double extraction of substrings in various MATCH_ bytecodes

peggyjs / peggy

Peggy: Parser generator for JavaScript

https://peggyjs.org/

MIT License

914 stars 65 forks source link

Avoid double extraction of substrings in various MATCH_ bytecodes #427

Closed markw65 closed 1 year ago

markw65 commented 1 year ago

MATCH_* opcodes typically do something like

if (<test>(input.substr(peg$currPos, length))) {
  sN = input.substr(peg$currPos, length);
  ...
} else {
  sN = peg$FAILED;
  ...
}

where we extract the same substring twice.

compileInputChunkCondition will convert that to

sN = input.substr(peg$currPos, length);
if (<test>(sN)) {
  ...
} else {
  sN = peg$FAILED;
  ...
}

avoiding the overhead

markw65 commented 1 year ago

When stacked on #425 this makes the parser generated from examples/xml.peggy an additional 5% faster.

hildjj commented 1 year ago

Did you check coverage over the changed bits of code with just the one test added?

I need to add codecov.io to the CI action.

hildjj commented 1 year ago

Looks like line 584 might need a test.

markw65 commented 1 year ago

Looks like line 584 might need a test

I think thats probably true - the way bytecode is currently generated, I don't see a way to get an ACCEPT_N with a length greater than 1 which doesn't also hit the skipAccept condition.

The only way I can think of to hit that line would be to add an option to disable the transformation, and run a test with it disabled...

But how are you seeing that? npm test coverage doesn't mention that line for me...

hildjj commented 1 year ago

I downloaded the PR with gh pr checkout 427, ran npm build, and opened coverage/lcov-report/index.html in a browser.

markw65 commented 1 year ago

Ok, so yes, that line seems to be uncovered; and I've confirmed that the current byte code generator never generates a sequence that can hit that line.

In fact, the only thing that can hit the single character branch on the next line is MATCH_ANY.

So I've added a test to plugin-api.spec.js which inserts a munge-the-bytecode pass between generateBytecode and generateJS. It changes the bytecode so that compileInputChunkCondition doesn't see its expected pattern, and as a result we get full coverage (the test also verifies that the compileInputChunkCondition didn't fire by examining the generated parser).

markw65 commented 1 year ago

I think this is ready to go.

markw65 commented 1 year ago

There's a new coverage issue, sorry. I'll fix and update.

markw65 commented 1 year ago

Anything else to do here?

markw65 commented 1 year ago

I've rebased this over #425 (thanks!) resolved the CHANGELOG.md "conflict", and rebuilt the build artifacts.

hildjj commented 1 year ago

Great, thanks. @Mingun final checks, please?