krausest / js-framework-benchmark

A comparison of the performance of a few popular javascript frameworks
https://krausest.github.io/js-framework-benchmark/
Apache License 2.0

Running tests on an M1 Mac/ARM builds of Chrome #885

Closed brainkim closed 2 years ago

brainkim commented 3 years ago

Has anyone tried running the benchmarks on an M1 Mac yet? What I'm seeing is that 16x slowdowns are disproportionately slow compared to Intel, such that it almost seems like the tests are just hanging. Even on VanillaJS, the first runs of benchmarks 03 and 04 seem to take forever.

MacBook Pro (13-inch, M1, 2020) macOS Big Sur Version 11.2.2 Google Chrome Version 90.0.4430.93 (Official Build) (arm64)

ryansolid commented 3 years ago

Yeah I have one as well. MacBook Air M1. Exact same problem. All the slowdowns are bad. They lead to a progressive performance drop-off: you can see even with 2x or 4x slowdown that each subsequent run gets slower. At 16x it is just insane, like 10+ seconds per test run. I was hoping this was just a chrome driver thing that would go away, but I had this problem in 89 and in 90. I have just been turning off slowdowns completely locally so I can get consistent-ish results, even if technical subframe results are a bit variable.

So yes very much real.

krausest commented 3 years ago

I also couldn't resist and bought a MacBook Air M1. Just as a rant: It's really a lovely machine, though I started to realize how much I should have loved my linux based Razer Blade 15 after I found that the MacBook isn't perfect either (even its great touchpad). All the possibilities in Linux make me aim for perfection but only leave me disappointed. But I really enjoy working with the Razer much more since I started to use both 😄

I saw the same effect. The CPU slowdown isn't comparable and is much slower than on linux. But I even saw that when switching between linux and windows on the Razer Blade 15. I'm a bit afraid of CPU throttling due to the fact that the MacBook has no fan (for anything but benchmarks this is of course a great feature). @ryansolid Did you see CPU throttling effects for the benchmark?

I'm a bit hesitant to publish results for the MacBook since the results won't be comparable. A possibility would be to temporarily remove all CPU slowdowns to allow a better comparison and add other slowdowns when we decide that we migrate back to a Mac.

krausest commented 3 years ago

If anyone wants to try: The build currently doesn't work with arm64 node builds (e.g. dojo fails when it tries to install electron-v7.1.14-darwin-arm64.zip). It worked when I used an intel node version (I used v14.15.5). I can't remember exactly how I installed the Intel version of node, maybe I used the "rosetta terminal" trick from https://dev.to/courier/tips-and-tricks-to-setup-your-apple-m1-for-development-547g.
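
Before starting the build it can help to check which kind of node build is active. The snippet below is only an illustration based on `process.platform` and `process.arch`; the helper name `describeNodeBuild` and the message texts are made up here and are not part of the benchmark.

```javascript
// Hypothetical helper: classify the running node build on macOS.
// process.arch is "arm64" for a native Apple silicon build and "x64" for an
// Intel build (on an M1 the latter runs through Rosetta).
function describeNodeBuild(platform = process.platform, arch = process.arch) {
  if (platform !== "darwin") return `${platform}/${arch}`;
  if (arch === "arm64") return "macOS arm64 node (the build failed with this one)";
  if (arch === "x64") return "macOS x64 node (the build worked with this one)";
  return `macOS ${arch} node`;
}

console.log(describeNodeBuild());
```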

krausest commented 3 years ago

I put a preview on https://krausest.github.io/js-framework-benchmark/2021/table_chrome_90_osx.html and https://krausest.github.io/js-framework-benchmark/2021/table_chrome_90_linux.html (incr_dom is missing on OSX since it didn't build)

ryansolid commented 3 years ago

I tend to not run the whole test suite as I'm always focused on specific implementations. I have not seen throttling but I only run maybe 3-4 frameworks at a time.

No slowdown feels too variable for the results. Having Vanilla out front but with a 1.06 score is strange compared to what we've been looking at the last 2 years. While maybe our current scores are a bit artificial due to slowdown, there is a clear order of things. Selection especially.

krausest commented 3 years ago

CPU throttling must be taken care of. My current hypothesis is that I can check if some throttling happens by watching:

    sudo powermetrics --samplers cpu_power,thermal --show-plimits -i 1000 | grep "P-Cluster frequency limited by FASTTEMP"

Can anyone confirm this?
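
If that hypothesis holds, the check could be automated by counting matching lines in captured `powermetrics` output. A minimal sketch, assuming the marker string from the grep expression really shows up in throttled samples (which is exactly the unconfirmed part); `countThrottledSamples` is a hypothetical helper name.

```javascript
const fs = require("node:fs");

// Marker taken from the grep expression quoted above.
const MARKER = "P-Cluster frequency limited by FASTTEMP";

// Hypothetical helper: count lines of captured powermetrics output that contain the marker.
function countThrottledSamples(text, marker = MARKER) {
  return text.split("\n").filter((line) => line.includes(marker)).length;
}

// Usage: node check-throttle.js powermetrics.log
const logPath = process.argv[2];
if (logPath && fs.existsSync(logPath)) {
  const hits = countThrottledSamples(fs.readFileSync(logPath, "utf8"));
  console.log(hits === 0 ? "no matching lines" : `${hits} matching lines`);
}
```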

krausest commented 3 years ago

And it looks like the battery and ac powered performance is the same (take that AMD).

krausest commented 3 years ago

I think I made a mistake. Chromedriver is by default still x64, which also opens Chrome in x64 mode. There are already apple silicon builds for it: https://chromedriver.storage.googleapis.com/index.html?path=90.0.4430.24/

So far I only got it to work when I:

  1. Download the chromedriver_mac64_m1.zip
  2. Extract and right click with option key pressed down to allow the executable to be run
  3. Copy the executable to webdriver-ts/node_modules/chromedriver/lib/chromedriver/chromedriver
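
Step 3 can be scripted. The following sketch assumes the zip has already been extracted and approved as in steps 1 and 2; `installChromedriver` and its parameters are hypothetical names, only the target path comes from the list above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Target path relative to the repository root, as given in step 3.
const TARGET = "webdriver-ts/node_modules/chromedriver/lib/chromedriver/chromedriver";

// Hypothetical helper: copy an extracted chromedriver binary over the one
// that the chromedriver npm package installed, and keep it executable.
function installChromedriver(extractedBinary, repoRoot) {
  const target = path.join(repoRoot, TARGET);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.copyFileSync(extractedBinary, target);
  fs.chmodSync(target, 0o755);
  return target;
}

// Usage: node install-chromedriver.js ./chromedriver /path/to/js-framework-benchmark
const [binary, root] = process.argv.slice(2);
if (binary && root) console.log("copied to", installChromedriver(binary, root));
```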

I'll report back with apple silicon chrome numbers.

krausest commented 3 years ago

The benchmark is currently not very stable. I often see cases with a "soundness check failed". It turns out that for 12 runs of a benchmark the performance log contains only 11 initBenchmark or finishedBenchmark logs. No idea why that happens. (If I add an additional log entry when starting and ending a run, I see both of them, so the log isn't cut off at the beginning or end, but sometimes there are only 11 initBenchmark, runBenchmark, finishedBenchmark and afterBenchmark timeline entries. On the console I can clearly see that the benchmark was run 12 times, as expected.)
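
The kind of check that fails here can be pictured as counting the marker entries per type and comparing them with the number of runs. This is only a sketch under assumptions: entries are taken to be objects with a `type` field, and `checkSoundness` is a hypothetical name, not the benchmark's actual implementation.

```javascript
// Marker types named in the comment above.
const MARKERS = ["initBenchmark", "runBenchmark", "finishedBenchmark", "afterBenchmark"];

// Hypothetical check: every marker type must occur exactly once per run.
function checkSoundness(entries, expectedRuns) {
  const counts = new Map(MARKERS.map((m) => [m, 0]));
  for (const e of entries) {
    if (counts.has(e.type)) counts.set(e.type, counts.get(e.type) + 1);
  }
  const failed = MARKERS.filter((m) => counts.get(m) !== expectedRuns);
  return { ok: failed.length === 0, failed };
}

// Synthetic example: 12 runs were executed, but one run left no entries.
const entries = [];
for (let i = 0; i < 11; i++) {
  for (const type of MARKERS) entries.push({ type, ts: i });
}
console.log(checkSoundness(entries, 12).ok); // false
```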

krausest commented 3 years ago

This might be one of the oddest things I've seen with webdriver yet (and I've seen some...) I've taken a deeper look and added as the first statement in the for loop in runCPUbenchmark:

    await driver.executeScript("console.timeStamp('initDriver')");
    for (let i = 0; i < benchmarkOptions.batchSize; i++) {
      await driver.executeScript("console.timeStamp('driverLoop')");
      await driver.executeScript(`console.timeStamp('loop${i}')`);
      console.log("runbenchmark ", i);

Of course in the terminal I can see that runbenchmark is logged for all numbers from 0 to 11. But the perf log is missing loop1! When I count all events for this benchmark I see 11 loops and also 88 click events. The benchmark that failed this time was 05_, which uses one click on run, 6 clicks on swap for warmup and 1 for the benchmark, i.e. 8 clicks per run, so 88 clicks match 11 runs. So it really looks like all perf log entries are missing for index 1.
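
Which run vanished can be found by looking for gaps in the `loop${i}` markers. A small sketch, assuming the perf log has already been reduced to a list of marker names; `missingLoops` is a hypothetical helper.

```javascript
// Hypothetical helper: report loop indices without a `loop<i>` marker.
function missingLoops(markerNames, batchSize) {
  const seen = new Set(markerNames);
  const missing = [];
  for (let i = 0; i < batchSize; i++) {
    if (!seen.has(`loop${i}`)) missing.push(i);
  }
  return missing;
}

// Synthetic marker list mirroring the case above: 12 runs, loop1 absent.
const names = [];
for (let i = 0; i < 12; i++) {
  if (i !== 1) names.push(`loop${i}`);
}

console.log(missingLoops(names, 12)); // [ 1 ]
```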

krausest commented 3 years ago

Looks like I can avoid this issue when I disable site-isolation with the following flag for chrome "--disable-features=IsolateOrigins,site-per-process". I have no more soundness check failed errors any more.
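
A sketch of how the flag could be added only where it is needed. `chromeArgs` is a hypothetical helper, and checking node's own `process.arch` is a simplification made for this example (node and Chrome do not have to share an architecture).

```javascript
// Flag quoted in the comment above.
const DISABLE_SITE_ISOLATION = "--disable-features=IsolateOrigins,site-per-process";

// Hypothetical helper: build the Chrome argument list, adding the flag only
// on Apple silicon Macs, where the missing log entries were observed.
function chromeArgs(baseArgs, platform = process.platform, arch = process.arch) {
  const args = [...baseArgs];
  if (platform === "darwin" && arch === "arm64") args.push(DISABLE_SITE_ISOLATION);
  return args;
}

console.log(chromeArgs(["--window-size=1200,800"], "darwin", "arm64"));
```

With selenium-webdriver the resulting list could be handed to `new chrome.Options().addArguments(...args)`.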

krausest commented 3 years ago

There are two other errors that sometimes occur:

  1. Missing click event. Happens e.g. for hyperapp benchmark 09 pretty often. I expect one click event (click on #clear) between runBenchmark and finishedBenchmark, but there's just a single (expected) click event after initBenchmark (click on #run):
    {"type":"driverLoop","ts":747590152833,"dur":0,"end":747590152833}
    {"type":"loop5","ts":747590154791,"dur":0,"end":747590154791}
    {"type":"navigationStart","ts":747590179666,"dur":0,"end":747590179666}
    {"type":"paint","ts":747590193250,"dur":41,"end":747590193291,"evt":"{\"method\":\"Tracing.dataCollected\",\"params\":{\"args\":{\"data\":{\"clip\":[0,0,1200,0,1200,721,0,721],\"frame\":\"919D5AAE77A7808C8CBC32B6987EDDBF\",\"layerId\":28,\"nodeId\":79}},\"cat\":\"devtools.timeline,rail\",\"dur\":41,\"name\":\"Paint\",\"ph\":\"X\",\"pid\":51111,\"tdur\":68,\"tid\":259,\"ts\":747590193250,\"tts\":589192}}"}
    {"type":"initBenchmark","ts":747590196291,"dur":0,"end":747590196291}
    {"type":"paint","ts":747590207250,"dur":41,"end":747590207291,"evt":"{\"method\":\"Tracing.dataCollected\",\"params\":{\"args\":{\"data\":{\"clip\":[0,0,1200,0,1200,721,0,721],\"frame\":\"919D5AAE77A7808C8CBC32B6987EDDBF\",\"layerId\":28,\"nodeId\":79}},\"cat\":\"devtools.timeline,rail\",\"dur\":41,\"name\":\"Paint\",\"ph\":\"X\",\"pid\":51111,\"tdur\":37,\"tid\":259,\"ts\":747590207250,\"tts\":593980}}"}
    {"type":"click","ts":747590219625,"dur":458,"end":747590220083}
    {"type":"paint","ts":747590273458,"dur":42,"end":747590273500,"evt":"{\"method\":\"Tracing.dataCollected\",\"params\":{\"args\":{\"data\":{\"clip\":[0,0,1200,0,1200,721,0,721],\"frame\":\"919D5AAE77A7808C8CBC32B6987EDDBF\",\"layerId\":27,\"nodeId\":79}},\"cat\":\"devtools.timeline,rail\",\"dur\":42,\"name\":\"Paint\",\"ph\":\"X\",\"pid\":51111,\"tdur\":9,\"tid\":259,\"ts\":747590273458,\"tts\":651675}}"}
    {"type":"paint","ts":747590273500,"dur":1000,"end":747590274500,"evt":"{\"method\":\"Tracing.dataCollected\",\"params\":{\"args\":{\"data\":{\"clip\":[0,0,1200,0,1200,4721,0,4721],\"frame\":\"919D5AAE77A7808C8CBC32B6987EDDBF\",\"layerId\":28,\"nodeId\":79}},\"cat\":\"devtools.timeline,rail\",\"dur\":1000,\"name\":\"Paint\",\"ph\":\"X\",\"pid\":51111,\"tdur\":1006,\"tid\":259,\"ts\":747590273500,\"tts\":651693}}"}
    {"type":"runBenchmark","ts":747590292125,"dur":0,"end":747590292125}
    {"type":"paint","ts":747590319625,"dur":83,"end":747590319708,"evt":"{\"method\":\"Tracing.dataCollected\",\"params\":{\"args\":{\"data\":{\"clip\":[0,0,1200,0,1200,721,0,721],\"frame\":\"919D5AAE77A7808C8CBC32B6987EDDBF\",\"layerId\":28,\"nodeId\":79}},\"cat\":\"devtools.timeline,rail\",\"dur\":83,\"name\":\"Paint\",\"ph\":\"X\",\"pid\":51111,\"tdur\":65,\"tid\":259,\"ts\":747590319625,\"tts\":677614}}"}
    {"type":"finishedBenchmark","ts":747590331041,"dur":0,"end":747590331041}
    {"type":"afterBenchmark","ts":747590332625,"dur":0,"end":747590332625}
  2. imba can't be run. It works if I add a driver.sleep. (But sprinkling driver.sleep feels so wrong...)
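
The first error can be expressed as a check on the entries between `runBenchmark` and `finishedBenchmark`. A sketch under assumptions: entries have the `type` and `ts` fields seen in the excerpt, and `hasClickDuringRun` is a hypothetical helper rather than the benchmark's own code.

```javascript
// Hypothetical check: is there a click event between runBenchmark and finishedBenchmark?
function hasClickDuringRun(entries) {
  const start = entries.findIndex((e) => e.type === "runBenchmark");
  const end = entries.findIndex((e, i) => i > start && e.type === "finishedBenchmark");
  if (start === -1 || end === -1) return false;
  return entries.slice(start + 1, end).some((e) => e.type === "click");
}

// Entries from the excerpt above, reduced to type and ts (paint events omitted).
const excerpt = [
  { type: "initBenchmark", ts: 747590196291 },
  { type: "click", ts: 747590219625 },
  { type: "runBenchmark", ts: 747590292125 },
  { type: "finishedBenchmark", ts: 747590331041 },
  { type: "afterBenchmark", ts: 747590332625 },
];

console.log(hasClickDuringRun(excerpt)); // false: the click on #clear is missing
```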

krausest commented 3 years ago

With a lot of fiddling I managed to get results for m1 chromedriver and chrome. The reported values are amazing (but I have to admit that they are faster than what I see in the timeline when manually testing). Here's an example (y axis are milliseconds):

Screenshot 2021-05-22 at 15 02 17

The odd thing is that in the timeline create 1,000 rows for vanillajs looks pretty consistently like that:

image

That's almost twice the duration reported by the benchmark driver.

Please let me investigate before drawing the conclusion that M1 is so much faster.

krausest commented 3 years ago

Cross check linux: Reported median value 78.6, one observed value in chrome. Looks ok. Screenshot from 2021-05-22 19-37-53

Just as a side note - it's really important to disable all extensions (pretty sure I didn't for OSX). Here's an example for uBlock Origin and 1Password turned on: Screenshot from 2021-05-22 19-43-42

krausest commented 3 years ago

Chrome on OSX/M1 is really driving me nuts. I now have all extensions disabled. There are two ways to start recording a timeline:

  1. Reload the page and press the "record" button. I have observed durations in the range 80 ms - 100ms, no faster runs:

    Screenshot 2021-05-22 at 20 13 56

    (This picture is crazy - it takes about 15 msecs before the event handler is executed! Take a look at how late the blue add function starts within the click event)

  2. "Start profiling and reload page" - and there we have it. Pretty often I see a duration of about 45-48 msecs:

    Screenshot 2021-05-22 at 20 24 24

    (This looks much more logical since the add method starts with the click event)

Once again I've no idea why there's a difference between both, but I tend to believe # 2

krausest commented 3 years ago

Cross check linux: It's similar! When reloading the page and clicking record, sometimes the duration is much higher than expected (like 110 msecs):

Screenshot from 2021-05-22 20-32-13

But most of the time it's much closer, e.g. 88 msecs:

Screenshot from 2021-05-22 20-31-41

In both cases the execution of the event handler appears delayed!

With "Start profiling and reload page" I get durations that match the reported values from the benchmark: Screenshot from 2021-05-22 20-41-29

krausest commented 3 years ago

Conclusion for today:

  1. It looks like Chrome on arm64 is really much, much faster than my linux notebook (and of course faster than rosetta emulated Chrome x64)
  2. To compare the results from the benchmark one should use "Start profiling and reload page" in the chrome performance tab
  3. There are still a few oddities with chromedriver on M1: A few click events don't get reported, imba runs only with some sleep statements and occasional missing log entries for a whole loop run unless site-isolation is disabled.

Any opinions on that?

krausest commented 3 years ago

Building react-focal fails for node v16.2.0 (M1), but works with v14.15.5 (x64).

Stefan@MacBook-Air-von-Stefan ~ % cd Source/Javascript/js-framework-benchmark/frameworks/keyed/react-focal
stefan@MacBook-Air-von-Stefan react-focal % npm ci
npm ERR! code ERR_SOCKET_TIMEOUT
npm ERR! errno ERR_SOCKET_TIMEOUT
npm ERR! request to https://artifactory.grammarly.io/artifactory/api/npm/common-npm/yocto-queue/-/yocto-queue-0.1.0.tgz failed, reason: Socket timeout

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/stefan/.npm/_logs/2021-05-28T06_24_03_675Z-debug.log

krausest commented 3 years ago

Here are the results for chrome 91 on M1: https://krausest.github.io/js-framework-benchmark/2021//table_chrome_91.0.4472.77_osx_m1.html

Good news: No more obscure bugs (no soundness errors - i.e. number of events in performance log inconsistent, no missing click events).

Bad news: Seems to be significantly slower than chrome 90.

krausest commented 2 years ago

Sadly node-chromedriver still doesn't download the M1 build automatically. The following in webdriver-ts works for me such that a chrome arm64 process is used for the benchmark:

curl https://chromedriver.storage.googleapis.com/92.0.4515.107/chromedriver_mac64_m1.zip --output chromedriver_m1.zip 
unzip chromedriver_m1.zip 
npm install
npm run compile
npm run bench

(It's a bit strange - I played with CHROMEDRIVER_FILEPATH and CHROMEDRIVER_SKIP_DOWNLOAD, but they didn't seem to change the behaviour in comparison to the above). Site isolation is now disabled on an M1 Mac.
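
The download URL in the curl command follows a simple pattern, so the version can be parameterised. A sketch: `chromedriverUrl` is a hypothetical helper, and the pattern is inferred only from the single URL shown here.

```javascript
// Hypothetical helper: build the download URL for the M1 chromedriver zip.
// The pattern is inferred from the URL in the curl command above.
function chromedriverUrl(version, zipName = "chromedriver_mac64_m1.zip") {
  return `https://chromedriver.storage.googleapis.com/${version}/${zipName}`;
}

console.log(chromedriverUrl("92.0.4515.107"));
// https://chromedriver.storage.googleapis.com/92.0.4515.107/chromedriver_mac64_m1.zip
```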

krausest commented 2 years ago

As you've probably noticed, I'm currently seriously considering switching to puppeteer and I created a puppeteer branch. At first glance puppeteer runs much smoother on OSX (no issues with missing clicks with site-isolation, no memory leaks, fewer workarounds, ...). One of the best features is that it even writes the performance trace in a way that can be loaded in chrome afterwards.

That's nice, but there's still the issue mentioned above. Here's the trace from puppeteer:

image

create 1,000 rows scores about 45 msecs on my MacBook Air. (With the old hack from https://github.com/krausest/js-framework-benchmark/blob/860ea18093715bfea69ece8556085fcecb06e0c1/frameworks/keyed/vanillajs/src/Main.js it reports something like "run took 44.80000001192093" in the console.) Results from puppeteer are rather consistent, like: [45.625,46.125,47.25,46.25,45.625,46.625,46.25,45.625,45.875,46.375]
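
To put a number on "rather consistent", the median and spread of the ten values can be computed directly. The helper name `summarize` is made up for this illustration.

```javascript
// The ten puppeteer results quoted above, in milliseconds.
const results = [45.625, 46.125, 47.25, 46.25, 45.625, 46.625, 46.25, 45.625, 45.875, 46.375];

// Hypothetical helper: median, minimum and maximum of a list of durations.
function summarize(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { median, min: sorted[0], max: sorted[sorted.length - 1] };
}

console.log(summarize(results)); // { median: 46.1875, min: 45.625, max: 47.25 }
```

The values span 1.625 msecs around a median of 46.1875 msecs.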

Currently it seems that I can reproduce results in the 45 msecs range when I open the profiler and click on the "start profiling and reload page" button.

When I just load the page, open the dev tools and start tracing with the record button I often get a result that's almost twice as long:

Screenshot 2021-09-26 at 20 37 18

The hack prints then something like "run took 83.39999997615814".

When I just reload the page, keep dev tools closed and just click on the append button, the hack prints numbers in the 80 msecs range only. I have never seen a 45 msecs result on the console unless I'm using the "start profiling and reload page" button or running with the puppeteer driver.

OTOH, even when I run chrome with my default profile (started with --remote-debugging-port=9222) and let puppeteer connect to the running chrome instance via await puppeteer.connect({browserWSEndpoint: ... }), I only see those 45 msecs results, never something in the 80 msecs range.
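
For anyone trying to reproduce this: a Chrome started with --remote-debugging-port serves a small JSON document at /json/version whose `webSocketDebuggerUrl` field is the value `browserWSEndpoint` expects. A sketch; the helper names and the `CHROME_DEBUG_PORT` environment variable are made up for this example.

```javascript
// Hypothetical helper: pull the browser-level WebSocket URL out of the JSON
// that Chrome serves at http://localhost:<port>/json/version.
function endpointFromVersionInfo(jsonText) {
  const info = JSON.parse(jsonText);
  if (typeof info.webSocketDebuggerUrl !== "string") {
    throw new Error("no webSocketDebuggerUrl in /json/version response");
  }
  return info.webSocketDebuggerUrl;
}

// Fetch the endpoint from a running Chrome (needs Node 18+ for global fetch).
async function findEndpoint(port = 9222) {
  const response = await fetch(`http://localhost:${port}/json/version`);
  return endpointFromVersionInfo(await response.text());
}

// Only contacts Chrome when explicitly asked to, e.g.
// CHROME_DEBUG_PORT=9222 node find-endpoint.js
if (process.env.CHROME_DEBUG_PORT) {
  findEndpoint(Number(process.env.CHROME_DEBUG_PORT)).then((endpoint) => {
    console.log(endpoint);
    // The value could then be passed on: puppeteer.connect({ browserWSEndpoint: endpoint })
  });
}
```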

I'm not sure whether we can consider the 45 msecs range as correct. What's your opinion on that? Maybe someone with an M1 Mac can take a look at what chrome dev tools report on your machine? ( vanillajs-keyed is available on https://stefankrause.net/chrome-perf/frameworks/keyed/vanillajs/index.html )

krausest commented 2 years ago

#1020 made me implement four different benchmark drivers.

I currently favour playwright. Puppeteer sometimes reports no paint event and with full tracing variance of the results is rather low. And it avoids all those chromedriver M1 quirks. You can choose the runner by using puppeteer|playwright|webdrivercdp|webdriver as the runner argument like: npm run bench -- --framework keyed/vanillajs --runner playwright
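
How such a runner argument might be validated, as a rough sketch only. `parseRunner` is a hypothetical function and not the benchmark's actual argument handling, and the fallback used when the flag is absent is an arbitrary choice for this example.

```javascript
// Runner names listed in the comment above.
const RUNNERS = ["puppeteer", "playwright", "webdrivercdp", "webdriver"];

// Hypothetical parser: read `--runner <name>` from an argument list and
// reject anything that is not one of the four known runners.
function parseRunner(argv, fallback = "playwright") {
  const index = argv.indexOf("--runner");
  if (index === -1) return fallback;
  const value = argv[index + 1];
  if (!RUNNERS.includes(value)) {
    throw new Error(`unknown runner "${value}", expected one of ${RUNNERS.join("|")}`);
  }
  return value;
}

console.log(parseRunner(["--framework", "keyed/vanillajs", "--runner", "webdrivercdp"])); // webdrivercdp
```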

I've run the full benchmark for OSX: https://krausest.github.io/js-framework-benchmark/2022/table_chrome_101_osx.html Overall they are pretty close to the linux results (if you ignore that the fanless M1 is almost twice as fast as that i7 8750H 😄 ) Left OSX, right Linux:

Screenshot 2022-05-05 at 09 24 27

I'm closing this issue. OSX results look good enough such that I could make the switch, but I'm not sure if I want to block my MacBook regularly. The long runtime hurts less on the less loved linux laptop.