no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Benchmarks: faster hand-built lexer comparison #42

Closed bd82 closed 7 years ago

bd82 commented 7 years ago

I believe it may be possible to get better performance from the hand-built lexer by:

- Avoiding string concatenation one character at a time, and using source.substring(startIndex, index) instead.

I will hopefully have time to look at this later this week and report my findings.

tjvr commented 7 years ago

Probably!

> Avoid string concatenation one character at a time and use source.substring(startIndex, index) instead.

Right; if only because the current code is decoding the backslash/unicode escapes, rather than just counting the length of the string (as it should be doing). :-)

bd82 commented 7 years ago

Finally got around to testing this.

I did a mostly automatic transformation of the hand-built lexer to use charCodeAt: https://github.com/bd82/moo/commit/a1841136878c13af1d5d7601be89f1ef40af5724

It turned out to be slightly slower (using Node.js v6). Maybe I'm doing something incorrectly and a manual refactoring is needed, or maybe V8 already optimizes this automatically, so it's redundant.

tjvr commented 7 years ago

In tests I've found indexing strings to be about as fast as charCodeAt, so that makes sense. Thanks for checking :-)