krausest / js-framework-benchmark

A comparison of the performance of a few popular javascript frameworks
https://krausest.github.io/js-framework-benchmark/
Apache License 2.0
6.52k stars 811 forks source link

Make vanillajs score fastest on chrome 124 (again) #1661

Closed krausest closed 1 week ago

krausest commented 2 weeks ago

With chrome 124 the vanillajs and vanillajs-1 implementation don't score fastest. Currently it's not clear whether actually some change favors other frameworks or it's a bug with chrome 124 traces. Here's the current result:

I'd believe chrome 124 results if we could create a new vanillajs implementation that becomes the fastest implementation again. The first goal would be an implementation that runs fastest in chrome 124 for create 1,000 rows, replace all rows and create 10,000 rows. It would be really appreciated to get vanillajs PRs. I promise to run them in a timely fashion. Thanks.

antonmak1 commented 2 weeks ago

1633 Here is written about an assumption to increase the speed of work. I think that after the vanilla-2 js version I will need to close this issue.

antonmak1 commented 2 weeks ago

I still don't understand the meaning of e.preventDefault() for each button. It seems to me that stopPropagation() will be much more effective in this sense (At least I checked it, so I think so).

krausest commented 2 weeks ago

Thanks. I think I applied the suggestions for vanillajs-2:

Ivi is still faster for create 1k rows, it's a tie for replace rows and vanillajs-2 is faster for create 10k rows.

krausest commented 2 weeks ago
syduki commented 2 weeks ago

You can try some more things:

  1. Inline appendRows function. The reason behind this is that ivi is doing a lot of inlining so theoretically it could get some optimization from that.
  2. Append all rows in one batch by building the tbody beforehand (don't blame me, the jar was already open 😅)

Everything put together:

// top level inline function
const appendInline = (function () {
    var rows = this.rows, s_data = this.store.data, data = this.data, tbody = this.tbody;
    const empty = !tbody.firstChild;
    empty && tbody.remove();
    for(let i=rows.length;i<s_data.length; i++) {
        let tr = this.createRow(s_data[i]);
        rows[i] = tr;
        data[i] = s_data[i];
        tbody.appendChild(tr);
    }
    empty && this.table.insertBefore(tbody, null);
});
    appendRows() {
        appendInline.call(this);
    }
krausest commented 2 weeks ago

Here's vanillajs with appendInline and the other optimizations applied: Screenshot 2024-05-03 at 22 25 03 Good improvement for create 10k rows. Sadly create 1k rows is stuck and ivi is still ahed.

mksunny1 commented 2 weeks ago

Sorry, I have had my hands full the whole time. My implementation in Eventiveness uses the StringBuilder pattern from Java to greatly improve the performance for adding rows, especially for the 10K rows. It's like this:

  1. Compose an array containing the outerHTML of all the rows. I would have preferred a generator here if there was a standalone join function.
  2. Join the array at once. This is faster than building up the string incrementally.
  3. Create a fragment from the joined string
  4. Append the fragment to the tbody.

It is hard to imagine another implementation should be faster than this. Most of the heavy lifting is left to the engine. I will add that faster for 10K rows is always better than for 1k rows. It suggests better scalability.

mksunny1 commented 2 weeks ago

it is less performant to store the element template in a 'global' string than just inlining it directly into the 'for' loop used to generate the item markups. I reworked my 'framework' to perform similarly to the inline case.

mksunny1 commented 2 weeks ago

Also please don't use multiple calls to push when composing the array. Use a generator and destructure into an array with [...myGenerator]. I forgot this detail earlier.

localvoid commented 2 weeks ago

Another difference that comes to my mind is that ivi uses direct calls to DOM methods to avoid megamorphic callsites [1][2] , but vanillajs implementation doesn't have megamorphic callsites, so switching to direct calls should make vanilla slower.

Also, ivi doesn't use any type of event delegation and there are 2 addEventListener() calls per row.

  1. https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L41-L80
  2. https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L589-L593
krausest commented 2 weeks ago

Thanks so far. I think I can show my concerns with chrome 124 in the following two charts: This is the duration for rendering. Chrome 124 reports about 2.5 msecs faster rendering duration (and this can indeed be validated in the traces).

That wasn't the case with chrome 123:

Why is chrome 124 rendering for ivi faster than the others? (Or as for my concerns: Why is chrome reporting faster rendering durations for ivi? I'm still not convinced that those are real.)

The difference in script duration isn't what makes ivi fastest:

krausest commented 2 weeks ago

One example trace for create 1k rows for ivi: Screenshot 2024-05-04 at 10 28 13 Total duration: 34.91 msecs Script duraton: 3.18 msecs Rendering: 30 msesc (9.71 msecs recalculate style, 16.87 msecs layout, 3.22 msecs Pre-Paint, 1.11 msecs Paint)

One example trace for create 1k for vanillajs: Screenshot 2024-05-04 at 10 28 36 Total duration: 38.29 msecs Script duraton: 3.08 msecs Rendering: 34 msesc (10.03 msecs recalculate style, 19.66 msecs layout, 3.45 msecs Pre-Paint, 1.04 msecs Paint)

@localvoid Any ideas why chrome would perform layout faster for ivi?

localvoid commented 2 weeks ago

Trying to find any differences in the DOM output. Maybe because <a> elements has classes lbl and remove, but there aren't any css rules for this classes, so it is highly unlikely. https://github.com/krausest/js-framework-benchmark/blob/f3e82b3ed4cb47edfab85d072adc81f274318239/frameworks/keyed/vanillajs-2/src/Main.js#L9

krausest commented 1 week ago

@localvoid Good catch, but it's not causing that effect. This is create 1k rows for ivi with the lbl and remove classes (I'll commit that too) and vanillajs without:

Overall a bad run, but ivi still renders 2.5 msecs faster (says chrome 124)

localvoid commented 1 week ago

I'll commit that too

I think that this classes (lbl, remove) are used in vanilla because of event delegation. Originally, all implementations like React, etc didn't have this classes: https://github.com/krausest/js-framework-benchmark/blob/970b34f7704b6dbd496533fa25314c694cb3d05f/frameworks/keyed/react-classes/src/main.jsx#L57-L68

antonmak1 commented 1 week ago

@krausest The bubbling process of selectRow and deleteRow can be stopped on tr by stopPropagation. That is, do not do stopPropagation immediately on a click, but do it only after reaching the 'parentNode of the function' on tr. I don’t know how to do it in vanilla yet. This will speed up the work.

antonmak1 commented 1 week ago

@krausest I don't know, you can just take tr and set the cancelBubble property to true. But this is not done this way - you need to use stopPropagation somehow.

krausest commented 1 week ago

@localvoid You're right. I reverted ivi and removed the classes from the vanillajs* implementations. The result (total duration) still looks like that:

Now what's interesting is the results for create 1k rows without warmup (i.e. initBenchmark just checking the existence of #run, no create / clear cycle):

And even more interesting is the rendering duration without warmup:

A flatline, which really makes sense (here...).

mksunny1 commented 1 week ago

@krausest I like this last chart. I think you are looking for a more consistent measurement scheme, that can weed out the noise and measure the most important things about how the frameworks build DOMs. Will a vanillajs implementation that scores faster now still suffice for the long run?

mksunny1 commented 1 week ago

It could be worth it to make the measurements more fine-grained here. Some aspects may then be more susceptible to browser implementation changes than others. Many will remain consistent, especially over a short period.

robbiespeed commented 1 week ago

@krausest the delta in the number of nodes in the trace you showed above don't seem to match. Vanilla is 487 starting and 11488 total (11001 new nodes) vs ivi with 40102 starting and 50103 total (10001 new nodes). That's 1000 less new nodes for ivi, there's also the issue of why ivi was starting with so many nodes. Maybe GC didn't collect the nodes from the warm-up, not sure if that would have an impact, but it might.

krausest commented 1 week ago

@robbiespeed good catch, indeed ivi stands out that the dom nodes created by initBenchmark are still reported by chrome (in contrast to vanillajs, doohtml and solid). @localvoid is that intentional?

robbiespeed commented 1 week ago

May have figured out the reason for the different node deltas. There's a bit of a hint if you look at the graph somewhere during layout for VanillaJS chrome will create 1k new nodes for the ::before pseudo element of each row for the remove glyph (removing the glyph classes will make the node count a consistent 10 nodes per row). My guess chrome skips creating new nodes for the pseudo elements if there were old ones still around to "recycle".

It would be interesting to see how the benchmark fairs when removing the css that creates the pseudo elements.

localvoid commented 1 week ago

@krausest No, I don't have a clue why this is happening. ivi doesn't have any type of DOM or js object recycling.

robbiespeed commented 1 week ago

Ran vanillajs against ivi with and without the glyphs. Removing the glyphs didn't help close the perf gap.

krausest commented 1 week ago

@robbiespeed Maybe we're really on to something Seems like forceGC() which called window.gc() a few times didn't clean the dom nodes for ivi. If we run window.gc() and HeapProfiler.collectGarbage" the node count goes down: Screenshot 2024-05-07 at 20 58 40

And even more, the performance is closer to what we'd expect:

And performance doesn't change wildly when a 500 msec delay is added:

Now I'll run the benchmarks above in a rather clean run with the modified forceGC call. I'll report back.

krausest commented 1 week ago

Here's the result with the modified forceGC method:

Looks much more reasonable to me:

krausest commented 1 week ago

With 1.000 msecs delay the graph looks pretty similar, which is really good news!

krausest commented 1 week ago

Conclusion:

@paulirish Whenever it comes to GC we need your help. Running windows.gc() in a loop was your recommendation the last time. Is this still what you'd recommend? Any idea why HeapProfiler.collectGarbage collects more nodes for ivi than windows.gc() alone?

Update: Just checked two things:

Chrome 123 with ivi and 7x window.gc. Node count drops after the gc calls from ~30k to 66. The next create rows call adds ~10k nodes. Screenshot 2024-05-07 at 23 10 43

Chrome 124 with ivi and 14x window.gc. Node count stays at 30k nodes and rises to 40k rows when create rows is performed: Screenshot 2024-05-07 at 23 13 03

Chrome 124 with 7x window.gc and HeapProfiler.collectGarbage. Node count starts with 66 (!) and rises to 10k rows: Screenshot 2024-05-07 at 23 16 00 What's still odd here is that the trace starts with 66 nodes and not with something like 30k to 50k nodes from initBenchmark. I checked all 15 traces for one benchmark run and every trace started with 66 nodes 🤯 . I don't understand that, but 10k at the end makes sense.

robbiespeed commented 1 week ago

@krausest try window.gc({type:'major',execution:'sync',flavor:'last-resort'}).

This is what the results I get when doing 7 gc with default params ({type:'major',execution:'sync'}): Screenshot from 2024-05-07 22-21-54

vs {type:'major',execution:'sync',flavor:'last-resort'}: Screenshot from 2024-05-07 22-19-38 notice vanilla and ivi swapped position and colour

I'm not entirely sure of the differences of last-resort GC, but it seems to be a more thorough GC pass meant for when the process has reached it's memory limits. I suspect HeapProfiler.collectGarbage is doing a last-resort GC internally.

gc options are documented in v8 source

robbiespeed commented 1 week ago

Also tried only 1 "last-resort" GC and it appears to be sufficient without multiple GC calls from JS. Judging by the code path for last-resort GC it seems to do multiple passes internally anyway, with a max of 7 passes. Code for deciding type of GC here, code for the last resort GC here.

krausest commented 1 week ago

@robbiespeed I'll add that for chrome 125. I just finished chrome 124 for keyed results with window.gc and HeapProfiler.collectGarbage

Issue resolved: We made vanillajs fastest again by slowing down ivi 😄 . (Though not entirely true: We also tweaked vanillajs a bit)