Open · richarddd opened 2 weeks ago

bnoordhuis replied:

I've been waiting for someone to open this issue :-)
So, I've been thinking about this a lot, obviously, and I have several ideas for how to tackle it. Let me start with the observation that template JITs eliminate interpreter dispatch overhead but not much else.
quickjs has "fat" opcodes - meaning most opcodes do a lot of work - and that helps keep dispatch overhead down; it's usually in the 5-25% range. That's not nothing, but it means a dumb JIT isn't going to move the needle much.
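To make the "fat opcode" point concrete, here is a toy dispatch loop - illustrative only, not QuickJS code, with made-up opcode names. The switch cost is paid once per opcode, so an opcode that does a lot of work amortizes it, while a thin opcode is mostly dispatch:

```c
#include <stdint.h>

enum { OP_ADD, OP_CONCAT_STRINGS, OP_HALT };

int run(const uint8_t *pc, int *stack, int sp)
{
    for (;;) {
        switch (*pc++) {            /* dispatch cost paid per opcode */
        case OP_ADD:                /* thin op: dispatch dominates */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_CONCAT_STRINGS:     /* "fat" op: allocation, copying,
                                       etc. dwarf the dispatch cost */
            /* ...lots of real work here... */
            break;
        case OP_HALT:
            return stack[sp - 1];
        }
    }
}
```

A template JIT only removes the switch, which is why it helps thin opcodes far more than fat ones.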
My quickjit experiment is basically a template JIT because tcc is one [1]; it's somewhere between a little slower and maybe 50% faster than the bytecode interpreter [2]. I consider it a dead end.
There are three prongs of attack that I'm hopeful will give a significant boost (illustrative sketches for each follow the list):
1. Lean into inline caches and type feedback much more than we do now. Something like r * Math.sin(d) should ideally get lowered to a single type-guarded opcode (first sketch below).
2. Eliminate VM stack shuffling as much as possible, maybe by switching to a register VM. A decent JIT needs to do register allocation anyway, so we might as well do the work upfront in the interpreter (second sketch below).
3. Be smarter about managing memory and reference counts. In some benchmarks quickjs spends an extreme amount of time adjusting object refcounts up and down, often only to end up with the exact same reference count it started with. Smarter analysis (like deferring refcount updates to the end of a basic block, or better yet, until the count is observable) should help a great deal (third sketch below).
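First sketch - a hypothetical fused, type-guarded opcode for r * Math.sin(d). Everything here (JSValueHypo, is_float64, the opcode itself) is a made-up stand-in, not QuickJS API; the point is the shape: one guard, one arithmetic op, and a bailout to the generic path on mismatch.

```c
#include <math.h>
#include <stdbool.h>

typedef struct { int tag; double f; } JSValueHypo;  /* stand-in JS value */
enum { TAG_FLOAT64 = 1 };

static bool is_float64(JSValueHypo v) { return v.tag == TAG_FLOAT64; }

/* Returns true on success; false means "deoptimize": rewrite the site
 * back to the generic opcodes and retry. sin_unpatched guards against
 * user code having replaced Math.sin. */
static bool op_mul_sin_float64(JSValueHypo *sp, bool sin_unpatched)
{
    JSValueHypo r = sp[-2], d = sp[-1];
    if (!(is_float64(r) && is_float64(d) && sin_unpatched))
        return false;                      /* type guard failed: bail */
    sp[-2] = (JSValueHypo){ TAG_FLOAT64, r.f * sin(d.f) };
    /* caller pops one slot; r * Math.sin(d) ran as a single opcode */
    return true;
}
```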
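Second sketch - the stack-shuffling point. The same statement compiles to several push/pop opcodes on a stack VM but a single three-address opcode on a register VM. get_loc/put_loc/add are real QuickJS mnemonics; the register form is hypothetical:

```c
/* x = a + b
 *
 * stack VM (QuickJS today):        register VM (hypothetical):
 *   get_loc a    ; push a            add x, a, b  ; one opcode,
 *   get_loc b    ; push b                         ; no stack traffic
 *   add          ; pop 2, push sum
 *   put_loc x    ; pop into x
 */
```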
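Third sketch - deferring refcount updates to the end of a basic block. Instead of touching the count on every push and pop, record deltas and apply only the nonzero net change once. All names here are hypothetical, not quickjs-ng internals:

```c
#include <stddef.h>

typedef struct { int ref_count; } Obj;              /* stand-in object */
typedef struct { Obj *obj; int delta; } PendingRC;

static PendingRC pending[64];
static size_t npending;

static void rc_defer(Obj *obj, int delta)
{
    for (size_t i = 0; i < npending; i++) {
        if (pending[i].obj == obj) {      /* coalesce +1/-1 pairs */
            pending[i].delta += delta;
            return;
        }
    }
    pending[npending++] = (PendingRC){ obj, delta };
}

static void rc_flush_block(void)
{
    /* the common case from the list item above: delta is 0, so the
     * object's refcount is never touched at all */
    for (size_t i = 0; i < npending; i++)
        if (pending[i].delta != 0)
            pending[i].obj->ref_count += pending[i].delta;
    npending = 0;
}
```

A real version would also free objects whose count reaches zero at flush time and handle overflow of the pending table; this only shows the coalescing idea.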
Once all that is in place, I'm confident a more-than-decent method JIT or tracing JIT falls out almost naturally.
Of course, all that takes a lot of time to implement, and we're working on this in our spare time, so no ETA.
[1] tcc is like the MVP of compilers. Fancy register allocation, instruction selection, code motion, constant propagation, loop unrolling, &c.? tcc doesn't do any of it; it just translates C input to assembly output in the most straightforward way possible. The quality of its generated code would get you a D- in Compilers 301 ;-)
[2] I wrote another proof of concept (not open source) where quickjit shells out to clang, then dlopens the result. It's around 2-4x faster thanks to clang's massively better optimizer, but it has several CPU/memory drawbacks (clang is resource-hungry), and it's still not remotely in the same ballpark as the big JS engines.
richarddd (from the original issue):

Hello team,
This is a substantial proposal, and I recognize that @bnoordhuis is already exploring similar optimizations for QuickJS. Still, I believe it's worth considering JIT-compiled fast paths for "simple" operations such as array length checks, equality comparisons, and other common cases.
By using a template JIT that translates bytecode directly to machine code, we could avoid introducing additional dependencies. Initially, we could limit the implementation to x86-64 and ARM64.
While Ben's approach of converting bytecode to C and compiling it with a full compiler (tcc) does improve performance, it introduces compilation overhead and additional indirection. In contrast, a template JIT might offer a leaner path to optimized execution for frequently executed operations; a minimal sketch of the idea follows.
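For flavor, here is the smallest possible version of that idea on x86-64/POSIX - purely illustrative, not proposed code: copy a pre-baked machine-code template into executable memory and call it. A real template JIT would emit one such template per bytecode opcode, patch in operands, and handle W^X restrictions (e.g. MAP_JIT on macOS):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* template for `int add(int a, int b)` (System V ABI):
       lea eax, [rdi + rsi]; ret */
    static const unsigned char tmpl[] = { 0x8d, 0x04, 0x37, 0xc3 };

    unsigned char *buf = mmap(NULL, 4096,
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, tmpl, sizeof(tmpl));          /* "emit" the template */

    int (*add)(int, int) = (int (*)(int, int))buf;
    printf("%d\n", add(2, 3));                /* prints 5 */
    return 0;
}
```

The appeal is that the whole backend is memcpy plus operand patching, so there is no external compiler in the loop.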
For inspiration, Andreas Kling's recent implementation of a JIT compiler for LibJS in SerenityOS is a great example; he documents the process in a YouTube playlist.
Looking forward to your thoughts!