quickjs-ng / quickjs

QuickJS, the Next Generation: a mighty JavaScript engine
MIT License
930 stars 77 forks source link

Replace malloc with mimalloc #142

Open ammarahm-ed opened 9 months ago

ammarahm-ed commented 9 months ago

Using mimalloc instead of malloc improves performance significantly.

Pros: Improved performance with a small change

Cons: It will add an external dependency and slightly increase size of QuickJS binaries.

It has been previously added here: https://github.com/openwebf/quickjs/pull/30

Here's the benchmark I ran after replacing:

Benchmark Master Mimalloc Percent Improvement
DeltaBlue 605 702 15.99%
Crypto 863 906 4.98%
RayTrace 357 963 169.33%
EarleyBoyer 682 1579 131.70%
RegExp 149 220 47.65%
Splay 736 1804 145.76%
SplayLatency 3088 4543 47.16%
NavierStokes 1511 1590 5.24%
PdfJS 2016 2949 46.21%
Mandreel 604 656 8.61%
MandreelLatency 4803 5097 6.13%
Gameboy 5191 5644 8.72%
CodeLoad 8015 13343 66.20%
Box2D 1905 2632 38.03%
Typescript 7088 9413 32.75%
Score (version 9) 1430 2060 43.98%

It could also be opt-in by the way.

saghul commented 9 months ago

We should do new benchmarks when considering something like this, FWIW.

bnoordhuis commented 9 months ago

Using a custom allocator is problematic for libquickjs.a due to:

Embedders can replace the default allocator. That's something we could do in the qjs executable, although I'm in no rush to do that right now.

Tangentially, I've spotted several places where quickjs mallocs a struct that mallocs a struct that mallocs a struct. There's likely gains to be had by flattening such allocation trees.

ammarahm-ed commented 9 months ago

Using a custom allocator is problematic for libquickjs.a due to:

* risk of symbol conflicts in the embedding program unless the allocator is completely namespaced off

* allocator mixup bugs (allocating with allocator A, releasing with B); maybe not a problem for quickjs but js_malloc and friends _are_ exposed in quickjs.h

Embedders can replace the default allocator. That's something we could do in the qjs executable, although I'm in no rush to do that right now.

Tangentially, I've spotted several places where quickjs mallocs a struct that mallocs a struct that mallocs a struct. There's likely gains to be had by flattening such allocation trees.

  1. That is understandable I think, but also solvable if there are some benefits of using a different allocator. I think mimalloc is already namespaced as mi_malloc, mi_free etc.
  2. If someone uses js_malloc even now, I expect them to free with js_free and internally these apis will switch to mimalloc so if someone does end up allocating memory with malloc and then trying to free with js_free, it's on them I would say haha.

There are indeed some optimisations we could do with the default allocators too but i think the difference is just huge when using mimalloc.

ammarahm-ed commented 9 months ago

@saghul So I did some benchmarks with current quickjs-ng

Benchmark Master Mimalloc Percent Improvement
DeltaBlue 605 702 15.99%
Crypto 863 906 4.98%
RayTrace 357 963 169.33%
EarleyBoyer 682 1579 131.70%
RegExp 149 220 47.65%
Splay 736 1804 145.76%
SplayLatency 3088 4543 47.16%
NavierStokes 1511 1590 5.24%
PdfJS 2016 2949 46.21%
Mandreel 604 656 8.61%
MandreelLatency 4803 5097 6.13%
Gameboy 5191 5644 8.72%
CodeLoad 8015 13343 66.20%
Box2D 1905 2632 38.03%
Typescript 7088 9413 32.75%
Score (version 9) 1430 2060 43.98%
thecodrr commented 9 months ago

That's a very significant difference. Another allocator worth trying out is jemalloc.

RobLoach commented 9 months ago

Perhaps as an optional dependency? Or an allocator system that lets you add a define to switch which malloc you're using?

bnoordhuis commented 9 months ago

Already the case:

Embedders can replace the default allocator.

ammarahm-ed commented 9 months ago

Even though you can replace the default allocator yourself which i have done already, making it default positions quickjs at a much better place in terms of performance for everyone and in various benchmarks. That's why i opened up this issue.

saghul commented 9 months ago

While having good benchmark results it would be disingenuous to do that if it turns out it's tricky to use that allocator in all situations.

That said, once I put the docs together, we can have a section where we mention things like this.

saghul commented 1 month ago

@ammarahm-ed I want to benchmark something semi-related to this (using calloc vs malloc + memset to be precise). How did you get the results table from your first post?