Add a pass to simplify single-character choices

markw65 commented 1 year ago

I have a modified version of examples/xml.peggy in my project, and was trying to optimize it for speed. I found a few places where it could be rewritten to avoid backtracking, getting about a 20% win, but with some readability cost.

Then I noticed that there are a number of long lists of alternate character classes, and they end up being implemented very inefficiently; effectively it tries one regex after another in sequence.

I manually combined all the character classes for BaseChar into a single character class, and got a nearly 2x speed up.

So I've added a pass that does that automatically. The result is a more than 3x speedup parsing the xml files that I work with (which seem fairly typical).

The pass takes a rule like chars: "a" / "b" / [d-r] and turns it into chars: [abd-r]

markw65 commented 1 year ago

I've put it under an option, mergeCharacterClasses because enabling it by default breaks a number of tests, which have specific expectations for error messages:

    Expected value to strictly be equal to:
      "Expected \"a\", \"b\", or \"c\" but \"d\" found."
    Received:
      "Expected [abc] but \"d\" found."

Since that test is specifically designed to ensure that an appropriate error message is output for choice nodes I didn't want to just change the expected output. But I'd be happy to change the grammar for such tests so that the choice nodes remain as choice nodes, and enable the optimization by default if that would be preferable.

markw65 commented 1 year ago

I just realized I need to update lib/peggy.js and docs/js/examples.js now that this is done unconditionally. Should I do that now, or wait until you're happy with the state of the commit?

hildjj commented 1 year ago

This looks good to me, I think. @Mingun any other comments?

hildjj commented 1 year ago

Except one needed addition: a CHANGELOG entry, please.

markw65 commented 1 year ago

Except one needed addition: a CHANGELOG entry, please

I put it in "minor" updates - let me know if if should be somewhere else...

hildjj commented 1 year ago

Sorry for moving the goalposts, but once I thought of coverage on #427, there are a few lines that need to be hit on this one in lib/compiler/passes/merge-character-classes.js, unless they are particularly hard to reach.

markw65 commented 1 year ago

I'm struggling with coverage. I've made a couple of changes to my new tests to hit the uncovered lines, and when I run as

npx jest -t mergeCharacterClasses

It shows that everything (in my file) is covered.

But when I run npm test or npm run coverage and run all the tests, it shows two lines as not covered... how does that happen?

[edit] Also, when I set breakpoints on the two supposedly uncovered lines, they both hit...

hildjj commented 1 year ago

I'm going off what I see in the coverage/lcov-report directory after having run npm run build. That dir can sometimes be a little confusing because it doesn't get cleaned up every time, and the URLs change if you rerun with a subset of the tests. Try deleting the directory, then navigating to the correct file from index.html when you run a different set of tests.

Here are the three lines that I currently see uncovered in the full test suite: 22, 67, 73.

I don't see breakpoints hit on any of those, but checked to make sure that I have everything set up right by having breakpoints fire on other lines in your file.

hildjj commented 1 year ago

Also, I'm on the Discord channel if you want to try and figure this out in realtime.

markw65 commented 1 year ago

I rewrote things a little:

some things that asClass was checking for were unnecessary because merge would have already handled them
preserve locations when merging nodes
only call cleanParts after merging all the alternates for efficiency

And then I updated the tests so that everything is covered.

markw65 commented 1 year ago

I think I addressed everything in the last update. Let me know if there's anything else to do here.

hildjj commented 1 year ago

@Mingun, one last thumbs-up, please?

hildjj commented 1 year ago

@Mingun this one needs a final thumbs-up as well, please.

markw65 commented 1 year ago

Anything else to do here?

markw65 commented 1 year ago

@hildjj @Mingun ping again?

peggyjs / peggy

Add a pass to simplify single-character choices #425