krausest / js-framework-benchmark

A comparison of the performance of a few popular javascript frameworks
https://krausest.github.io/js-framework-benchmark/
Apache License 2.0

Showing JS-only, Browser-only, and JS+Browser measurements #1233

Closed fabiospampinato closed 4 months ago

fabiospampinato commented 1 year ago

As I understand it, tests like "create many rows" measure the entire change: not just how long it took for the javascript to execute, but also how long it took for style recalculation, layout, and painting.

If that's the case, it'd be interesting to be able to see results under 3 different filters: browser-only measurements (to check if the implementation causes abnormal layout recalculation, for example), javascript-only measurements (to check how much work the framework is actually doing, which, assuming everything else is normal, is largely what actually matters in the benchmark), and a combined view where everything is taken into account.

The more practical motivation is that I think I might know how to halve the amount of javascript work that is needed in some unkeyed tests (maybe in some keyed tests too, depending on the exact definition one is using), but if the number displayed is dwarfed by style/layout/paint calculations it will seem like not much, even though it's actually pretty significant.

leeoniya commented 1 year ago

i've been championing this for a while, along with reducing the 16x slowdown to maybe 4x, and reducing the DOM size. so far there's been little enthusiasm.

even very heavy apps like Slack consistently stay well below 10k dom nodes, while this bench generates 88k nodes for the "create 10k rows + append 1k" metric.

see https://github.com/krausest/js-framework-benchmark/issues/403#issuecomment-393156221, https://github.com/krausest/js-framework-benchmark/issues/403#issuecomment-394882997

fabiospampinato commented 1 year ago

Potentially, if the purpose of the slowdown is to make some tests weigh more than they otherwise would in the average, the slowdown could just be deleted and the result multiplied by some constant; I guess that should achieve the same effect, and perhaps also lower the amount of time it takes to run the benchmark. I'm also not exactly sure how the slowdown is implemented in Chrome, so it may be more reliable to just multiply the normal result by some constant.

80k+ nodes is kind of absurd, though I think the point of that test is more to see how the framework scales, and perhaps to magnify small problems that might exist at lower numbers but go unnoticed. We could probably just switch to a 100-row base case and a 1,000-row "worst" case without losing too much information. Maybe that'd be good if it makes the benchmark significantly cheaper to run.

unmeimusu commented 1 year ago

Hello, may I ask about Imba's framework benchmark test? Since Imba uses a compiler (to simplify the code syntax) that outputs JS, the same as some frameworks like Svelte do, please help me get a new test result here. Thanks.

krausest commented 1 year ago

I investigated measuring JS script duration only. The idea is to compute it as the delta of page.metrics().ScriptDuration for puppeteer and of the CDP Performance.getMetrics command (https://github.com/krausest/js-framework-benchmark/blob/13f27779220c2577420da056c8ef9a1317405161/webdriver-ts/src/forkedBenchmarkRunnerPuppeteer.ts#L133), since I haven't found a way to extract this directly from the traces.
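
For illustration, here is a minimal sketch of that delta idea with puppeteer; the URL, selector, and function name are made up, and this is not the actual forkedBenchmarkRunnerPuppeteer code.

```ts
import puppeteer from "puppeteer";

// Sketch: ScriptDuration comes from page.metrics() (backed by CDP Performance.getMetrics)
// and is reported in seconds; the JS-only cost of an action is the before/after delta.
async function scriptDurationForClick(url: string, selector: string): Promise<number> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle0" });

  const before = await page.metrics();
  await page.click(selector); // e.g. the "create 1,000 rows" button
  // A real measurement would also wait for the work triggered by the click to finish.
  const after = await page.metrics();
  await browser.close();

  // Convert the delta from seconds to milliseconds.
  return ((after.ScriptDuration ?? 0) - (before.ScriptDuration ?? 0)) * 1000;
}

// Hypothetical usage:
// scriptDurationForClick("http://localhost:8080/frameworks/keyed/alpine/index.html", "#run")
//   .then((ms) => console.log(`JS script duration: ${ms.toFixed(1)} msecs`));
```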

Here's an example for alpine and create rows. The total duration is 99 msecs and the JS script duration is 59 msecs. The test driver reports 104 msecs and 62.5 msecs respectively, which sounds plausible.

(screenshot: alpine)

Same for voby: an example with a total duration of 43.15 msecs and a JS script duration of 5 msecs. The test driver reports 41.4 msecs for the total duration and 4.9 msecs for the script duration, which also sounds good.

(screenshot: voby)

Here's a comparison for some frameworks whose values seem plausible.

(Screenshot 2023-05-05, 9:17:55 PM)

I did this for all frameworks during the chrome 113 run, but some results are just too good to believe and must be investigated:

(Screenshot 2023-05-05, 9:20:57 PM)

(Values reported as 0.0 are zero due to rounding, not because they actually have a duration of 0.)

krausest commented 1 year ago

For ember I'd actually expect ~28 msecs:

(Screenshot 2023-05-05, 9:29:34 PM)

Hmmm. Even if I add a large sleep after running the benchmark, both puppeteer and playwright report something like ScriptDuration = 0.027602 before runBenchmark and 0.028274 after, which yields a duration of 0.672 msecs. The timestamps show that the values are about 1 sec apart (due to the wait). Not good.

leeoniya commented 1 year ago

great to see some progress!

i would also make sure to include any gc cost in this if it's not part of "script" already. i've seen it under "system" in chrome's profiler summary (especially forced gc at the end of a run).

ClassicOldSong commented 1 year ago

One opinion: how the script manipulates the DOM still counts. One framework might take much less time scripting but perform many more duplicated DOM operations, while another may take somewhat more time scripting but spend significantly less time on DOM operations. These values can be measured separately, but they must be considered together.

fabiospampinato commented 1 year ago

To give a higher cost to inefficient DOM operations one could just add 10 mutation observers to the page, or something like that; those things are slow.

The problem is that ~everybody uses .appendChild because it's faster when measured in isolation, for no good reason, while .append is faster if there are any mutation observers on the page, potentially faster by a huge amount.
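
For concreteness, a small sketch of the setup being described; the table selector and row construction are made up for illustration.

```ts
// Attach several MutationObservers so every DOM mutation must generate
// mutation records, making inefficient DOM manipulation patterns cost more.
for (let i = 0; i < 10; i++) {
  new MutationObserver(() => {}).observe(document.body, {
    childList: true,
    subtree: true,
  });
}

const tbody = document.querySelector("tbody")!; // assumed table body
const rows = Array.from({ length: 1000 }, (_, i) => {
  const tr = document.createElement("tr");
  tr.textContent = `row ${i}`;
  return tr;
});

// Variant A: one append() call inserts all nodes in a single operation,
// producing a single mutation record with all added nodes.
tbody.append(...rows);

// Variant B: a loop of appendChild() calls, one mutation record per row.
// for (const row of rows) tbody.appendChild(row);
```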

ClassicOldSong commented 1 year ago

That's still not a very good way to simulate slow repaints/reflows. Moving empty text nodes around costs almost nothing if there are no mutation observers, but with multiple observers added, it adds unrealistic costs to these originally cheap operations.

leeoniya commented 1 year ago

One opinion: how the script manipulates the DOM still counts.

agreed. this whole js/gc-only measurement exercise only makes sense for frameworks that already do near-optimal DOM ops with identical repaint/reflow costs.

however, there could be a different approach here.

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that will hopefully cover all cases, including frameworks that do duplicate/inefficient dom ops.
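
A rough sketch of that idea (field names are assumed; this is not existing benchmark code): per benchmark, take the smallest observed restyle+reflow+paint time as the baseline and subtract it from each implementation's total.

```ts
interface Result {
  framework: string;
  benchmark: string;
  total: number;  // msecs, click start to paint end
  render: number; // msecs, restyle + reflow + paint
}

// For each benchmark, use the fastest render time of any implementation as the
// baseline and subtract it from every total, leaving script + "excess" DOM cost.
function subtractFastestRender(results: Result[]): Map<string, number> {
  const byBenchmark = new Map<string, Result[]>();
  for (const r of results) {
    const group = byBenchmark.get(r.benchmark) ?? [];
    group.push(r);
    byBenchmark.set(r.benchmark, group);
  }

  const adjusted = new Map<string, number>();
  for (const group of byBenchmark.values()) {
    const baseline = Math.min(...group.map((r) => r.render));
    for (const r of group) {
      adjusted.set(`${r.framework}/${r.benchmark}`, r.total - baseline);
    }
  }
  return adjusted;
}
```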

leeoniya commented 1 year ago

here's another interesting project: https://github.com/yamiteru/isitfast

ClassicOldSong commented 1 year ago

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that will hopefully cover all cases, including frameworks that do duplicate/inefficient dom ops.

That's an interesting idea, but it's hard to determine what the fastest possible time is. We have multiple different implementations with different pros and cons, and different approaches may have different limitations. It's very hard to find an even ground for these measurements.

leeoniya commented 1 year ago

We have multiple different implementations with different pros and cons

this benchmark does not attempt to pick the fastest framework in all categories. there will likely never be an implementation that has 1.00 across the board. each metric is ranked in isolation, which makes the proposed approach consistent with how it already works.

krausest commented 1 year ago

I implemented a first version that tries to compute script duration from the trace files, since I couldn't get reasonable values from Performance.getMetrics:

It seems to work for some nasty cases like ui5-webcomponents:

(Screenshot 2023-05-08, 9:47:30 PM)

The script duration is the sum of the two yellow top-level boxes and yields 13.082, which corresponds to Chrome's script duration. In the preview you can choose the duration measurement mode in the dropdown area: total duration, which means the time elapsed between the start of the click event and the end of repainting, and script duration, which should roughly equal the sum of all yellow boxes (except garbage collection) at the highest level: https://krausest.github.io/js-framework-benchmark/current.html (you may have to clear the cache to get the latest version).

Looking forward to your feedback. I haven't checked enough values yet to be confident that all values are correct.

leeoniya commented 1 year ago

i'm not sure if script-only is a good metric. i do think that "everything except baseline reflow+restyle+repaint" is what we need. the difference between swap rows total and only-js does not capture the extra ~110ms of DOM/GC inefficiency in the latter:

total:

(screenshot)

js-only:

(screenshot)

ClassicOldSong commented 1 year ago

As the author of ef.js, I lost my first place in swap rows in only-js; that's not good 😈

Kidding, but I agree that GC time should be taken into account in scripting.

And still, what baseline should we take for reflow+restyle+repaint?

leeoniya commented 1 year ago

And still, what baseline should we take for reflow+restyle+repaint?

whatever implementation is fastest in reflow+restyle+repaint for each metric.

krausest commented 6 months ago

I came back to this issue. We have an established way to compute total duration (end of paint - start of click). I created a way to measure JS duration (the sum of the durations of all events that are "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall", "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", or "V8.Execute"). This gives the table above, and it seems to be close to what Chrome displays as scripting.
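
A naive sketch of that kind of summation over a Chrome trace file (not the actual webdriver-ts code, which also has to cope with nested/overlapping events rather than just adding durations; the trace is assumed to have the `{ traceEvents: [...] }` shape):

```ts
import { readFileSync } from "node:fs";

interface TraceEvent {
  name: string;
  ph: string;   // phase; "X" marks complete events that carry a duration
  dur?: number; // duration in microseconds
}

const SCRIPT_EVENTS = new Set([
  "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall",
  "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", "V8.Execute",
]);

// Sum the durations (in msecs) of all complete trace events whose name is in `names`.
function sumEventDurations(traceFile: string, names: Set<string>): number {
  const { traceEvents } = JSON.parse(readFileSync(traceFile, "utf8")) as {
    traceEvents: TraceEvent[];
  };
  const micros = traceEvents
    .filter((e) => e.ph === "X" && names.has(e.name))
    .reduce((sum, e) => sum + (e.dur ?? 0), 0);
  return micros / 1000;
}

// Hypothetical usage; the trace file name is made up:
// console.log(sumEventDurations("traces/alpine_01_run1k_1.json", SCRIPT_EVENTS));
```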

If browser-only meant total duration minus JS duration, the results get odd. Miso is fastest for create 1k with 56 msecs total duration and 23 msecs JS duration (giving 33 msecs browser-only), whilst vanillajs has 39 msecs total duration and 2 msecs JS duration, which gives 37 msecs browser-only. Sorted by create 1k the table looks like this: (Screenshot 2023-11-25, 10:38:55 AM)

It seems to make little sense for create 1,000 rows. Some benchmarks are a little more interesting, like remove row: (Screenshot 2023-11-25, 6:47:15 PM)

Maybe it makes more sense to compute the lengths of all painting and layout intervals as a third factor?

The current results table lets one play with the three options.

krausest commented 6 months ago

I added an "only render duration" selection. It computes the duration as the sum of all intervals for the "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", and "PrePaint" events.
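
With the same naive trace-summing sketch from the earlier comment, this would correspond to summing a different event set (again an assumption about the trace contents, not the repo's actual code):

```ts
const RENDER_EVENTS = new Set([
  "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", "PrePaint",
]);

// Hypothetical usage with the sumEventDurations sketch from above;
// the trace file name is made up:
// console.log(sumEventDurations("traces/alpine_01_run1k_1.json", RENDER_EVENTS));
```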

I haven't had the time to do a big quality check yet, so please take the results with a grain of salt.

krausest commented 6 months ago

I ran a check that total time >= script time + paint time. That assertion holds for most runs, except for 21 traces (openui causes 15 of those for replace rows; all other cases are below 1 msec difference, so they don't matter much). The trace looks like this: (Screenshot 2023-11-26, 11:20:45 AM)

My interval logic computes: total duration = 45.898, script duration = 23.182, paint duration = 32.722. Those numbers look right, but the sum of script + paint is misleading. The issue seems to be that script evaluation and recalculate style happen in parallel (or recalculate style is called from script evaluation).
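
The check itself could look something like this sketch (field names are assumed, with a small tolerance for sub-millisecond rounding differences):

```ts
interface RunDurations {
  framework: string;
  benchmark: string;
  total: number;  // msecs
  script: number; // msecs
  paint: number;  // msecs
}

// Flag runs where script + paint exceeds total by more than the tolerance,
// which points at overlapping intervals (e.g. style recalc triggered from script).
function findOverlappingRuns(runs: RunDurations[], toleranceMs = 1): RunDurations[] {
  return runs.filter((r) => r.script + r.paint > r.total + toleranceMs);
}
```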

krausest commented 4 months ago

The results table allows one to choose between total duration, JS-only duration, and render duration.