no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

BSD 3-Clause "New" or "Revised" License

817 stars, 65 forks

Precompilation? #130

Open nebrelbug opened 4 years ago

nebrelbug commented 4 years ago

I'd love to use Moo for a library I'm building, but it needs to be extremely lightweight (even lighter than 4KB minzipped).

Would it be possible to add a precompile option: a build step that runs moo.compile on a grammar and then outputs a generated, more lightweight JS file that could be tweaked and customized?

tjvr commented 4 years ago

It's not something we've thought about.

I haven't checked in detail, but I would guess about half of Moo's source code deals with compilation. The other half is used at runtime. Would 1–2K be small enough? (Of course, this is in addition to the tokenizer RegExp itself.)

nebrelbug commented 4 years ago

Yes, I think it would. I was also thinking that once I generated a new JS file, I could tweak some things by hand, like removing features I don't need.

bd82 commented 4 years ago

Would Tree-Shaking assist here?

https://github.com/rollup/rollup#tree-shaking

nebrelbug commented 4 years ago

@bd82 I don't think so, because I want to find a way to have some things (like building the RegExp) happen before runtime.

It would be great if there were some way to just save the lexer in a separate JS file after generation.

tjvr commented 4 years ago

Just for fun, here's a Gist which provides a silly (albeit working) approach to compiling a Moo lexer.

It's silly because it extracts the Lexer and LexerIterator class definitions from inside moo.js in a rather gross way. I doubt we'd ever consider merging this code 🙃 If we wanted to support this properly, we'd probably want to split up moo.js into two or three parts: in particular, you'd want the runtime structures (i.e. the Lexer class) to be separate from everything that builds the tokenizer, so that you can import just the runtime in your code.

Some stats:

moo.js is 17682 bytes, 4949 gzipped.
A tiny example tokenizer is 5981 bytes, 1817 gzipped.
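
For readers who want a feel for the build-time/runtime split described above, here is a minimal sketch. It is not Moo's code and not the Gist: `precompile` and `createLexer` are hypothetical names, and the sketch assumes every rule is a plain RegExp with no flags and no capture groups of its own. The build half combines the rules into one sticky (`/y`) RegExp and emits JS source text; only the small runtime half would need to ship.

```javascript
// Runtime half: the only code that would need to ship with the application.
function createLexer(types, re) {
  return function* tokenize(input) {
    let index = 0;
    while (index < input.length) {
      re.lastIndex = index;
      const match = re.exec(input);
      // Reject both "no rule matched" and empty matches (which would loop forever).
      if (match === null || match[0].length === 0) {
        throw new Error("invalid syntax at offset " + index);
      }
      // Capture group k + 1 belongs to types[k]; the first defined group wins.
      const i = types.findIndex((_, k) => match[k + 1] !== undefined);
      yield { type: types[i], value: match[0], offset: index };
      index = re.lastIndex;
    }
  };
}

// Build half: runs ahead of time and returns the source of a generated script.
function precompile(rules) {
  const types = Object.keys(rules);
  const source = types.map((type) => "(" + rules[type].source + ")").join("|");
  return (
    "var types = " + JSON.stringify(types) + ";\n" +
    "var re = new RegExp(" + JSON.stringify(source) + ", \"y\");\n" +
    "var tokenize = (" + createLexer.toString() + ")(types, re);\n"
  );
}

const generated = precompile({ ws: /[ \t]+/, number: /[0-9]+/, word: /[a-z]+/ });

// A real build would write `generated` to a file; here it is evaluated directly.
const tokenize = new Function(generated + "return tokenize;")();
const tokens = Array.from(tokenize("abc 42"));
console.log(tokens.map((t) => t.type + ":" + JSON.stringify(t.value)).join(" "));
```

The generated string contains only the type list, the combined pattern, and the runtime function, so none of the rule-combining logic has to be loaded at runtime. Features Moo supports, such as states, keywords, and line tracking, are deliberately left out of this sketch.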

nebrelbug commented 4 years ago

Thanks @tjvr, that's pretty nifty! I agree the code would probably not be clean enough to merge, but I really like the idea of separate runtime structures.

nebrelbug commented 4 years ago

Hi @tjvr! I'm considering using Moo to build a template engine, and wondered whether you'd considered moving the runtime structures outside of the main moo.js?

nebrelbug commented 4 years ago

Also, I've recently been digging into the source code. Could you explain what fast is?

nathan commented 4 years ago

> Could you explain what fast is?

It comes from https://github.com/no-context/moo/pull/40 / https://github.com/no-context/moo/pull/103. It makes single-character tokens significantly faster.
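
The general idea behind a single-character fast path can be sketched roughly as follows. This is an illustration of the technique rather than Moo's actual implementation: `buildFastTable` and `nextToken` are hypothetical names, and the sketch assumes the fast path is only used for characters that no other rule can match.

```javascript
// Map character codes to token types for rules that match exactly one character.
function buildFastTable(singleCharTypes) {
  const table = Object.create(null);
  for (const [ch, type] of Object.entries(singleCharTypes)) {
    table[ch.charCodeAt(0)] = type;
  }
  return table;
}

// Try the table lookup first; fall back to the sticky RegExp otherwise.
function nextToken(table, re, fallbackType, input, index) {
  const type = table[input.charCodeAt(index)];
  if (type !== undefined) {
    // Fast path: no RegExp execution needed for this token.
    return { type, value: input.charAt(index), offset: index };
  }
  re.lastIndex = index;
  const match = re.exec(input);
  if (match === null) return null;
  return { type: fallbackType, value: match[0], offset: index };
}

const parenTable = buildFastTable({ "(": "lparen", ")": "rparen" });
const wordRule = /[a-z]+/y;
const first = nextToken(parenTable, wordRule, "word", "(abc)", 0);
const second = nextToken(parenTable, wordRule, "word", "(abc)", 1);
console.log(first.type, second.type);
```

A property lookup keyed by character code avoids running the RegExp at all for tokens such as brackets and commas, which is where the speedup for single-character tokens would come from.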

nebrelbug commented 4 years ago

@nathan thanks!

I seem to remember once running a benchmark that showed that str[0] actually ended up being faster than str.charCodeAt(0). Is there a reason why charCodeAt was chosen?

nebrelbug commented 4 years ago

@chocolateboy did your thumbs down mean you didn't approve of str.charCodeAt, or you didn't like my comment?

tjvr commented 4 years ago

> Is there a reason why charCodeAt was chosen?

Benchmarks at the time showed that it was slightly faster. It's certainly possible that's changed.
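
Anyone who wants to re-check this on a current engine could start from a rough micro-benchmark like the one below. It is only a sketch: `time` is a hypothetical helper, `Date.now()` has coarse resolution, and results from a loop this simple can vary considerably between engines, versions, and runs.

```javascript
// Call `fn` `iterations` times; return elapsed milliseconds and a checksum.
// The checksum keeps the engine from discarding the work as dead code.
function time(fn, iterations) {
  const start = Date.now();
  let sink = 0;
  for (let i = 0; i < iterations; i++) sink += fn(i);
  return { ms: Date.now() - start, sink };
}

const text = "(abc)".repeat(1000);
const OPEN = "(".charCodeAt(0);
const N = 1e6;

// Compare a one-character string against a string literal.
const byIndex = time((i) => (text[i % text.length] === "(" ? 1 : 0), N);
// Compare a character code against a precomputed number.
const byCode = time((i) => (text.charCodeAt(i % text.length) === OPEN ? 1 : 0), N);

console.log("str[i]:       ", byIndex.ms, "ms");
console.log("charCodeAt(i):", byCode.ms, "ms");
```

Both variants count the same characters, so their checksums should agree; only the timings are of interest, and they are best compared over several runs.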