qddjs / qdd

Download JavaScript Dependencies, really fast
MIT License

Not as fast on Macs #2

Open bengl opened 6 years ago

bengl commented 6 years ago

As first reported by @zkat, and verified by others, the final number in the benchmark, which measures qdd's primed-cache speed, doesn't seem to be significantly better on Macs. While I'm getting times of 0.5s on my machine (Linux), Mac users seem to be getting closer to 4s or 5s.

I collected .cpuprofile data from my machine and a Mac, and found that around 80% of the time on Macs is being spent in (idle), leading me to believe it's simply waiting on filesystem operations the whole time. On Linux the (idle) time is closer to 20%-25%, so while this might not account for all of the overhead, it at least accounts for a huge chunk of it.

Since almost all of that operation is a recursive copy, I modified that code to use standard fs module operations (the qdd version calls the binding directly), and added perf_hooks marks to see which of the filesystem calls were taking the most time. The resulting script is in this gist and can be run from any empty directory. The test downloads a tarball, unpacks it, and measures the time it takes to copy it recursively from one directory to another. In qdd, these operations happen many, many times in parallel.
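Roughly, the copy-and-time portion of that script looks something like this. This is a simplified sketch, not the exact gist: the download/unpack step is omitted, the directory names are placeholders, and the per-call timing is accumulated with performance.now() rather than actual perf_hooks marks.

'use strict';
// Simplified sketch of the recursive-copy benchmark: copy a directory tree
// with the standard fs module and accumulate per-call durations.
const fs = require('fs');
const path = require('path');
const { promisify } = require('util');
const { performance } = require('perf_hooks');

const readdir = promisify(fs.readdir);
const stat = promisify(fs.stat);
const mkdir = promisify(fs.mkdir);
const copyFile = promisify(fs.copyFile);

const totals = { readdir: 0, stat: 0, mkdir: 0, copyfile: 0 };

async function timed(name, fn, ...args) {
  const start = performance.now();
  const result = await fn(...args);
  totals[name] += performance.now() - start;
  return result;
}

async function copyRecursive(src, dest) {
  await timed('mkdir', mkdir, dest);
  const entries = await timed('readdir', readdir, src);
  // Process entries in parallel, roughly as qdd does.
  await Promise.all(entries.map(async (entry) => {
    const from = path.join(src, entry);
    const to = path.join(dest, entry);
    const stats = await timed('stat', stat, from);
    if (stats.isDirectory()) {
      await copyRecursive(from, to);
    } else {
      await timed('copyfile', copyFile, from, to);
    }
  }));
}

(async () => {
  const start = performance.now();
  await copyRecursive('package', 'package-copy'); // placeholder directory names
  for (const name of Object.keys(totals)) console.log(name, totals[name]);
  console.log('---');
  console.log(performance.now() - start);
})();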

Here are the results (time in ms):

My Arch Linux Lenovo X1 Carbon (from 2016):

$ node copytest.js 
readdir 1.995347
stat 16.328005827783088
mkdir 7.106336499999999
copyfile 15.482173730219255
---
77.179121

A Google Compute Cloud instance (TODO put specs in here):

$ node copytest.js 
readdir 1.7026445
stat 14.900154024738319
mkdir 9.860568500000001
copyfile 15.350753629170656
---
71.24901

A macincloud.com Pay-as-you-Go instance (OS X High Sierra):

$ node copytest.js 
readdir 4.9309635
stat 83.9008870846811
mkdir 38.07782
copyfile 78.84369494852221
---
353.116088

While this is still fairly inconclusive, fs.stat and fs.copyFile seem to take considerably longer on a Mac than on Linux. All tests used node@8.9.4. For both my machine and the Google instance the filesystem is ext4; for the Mac it's HFS+.

evanlucas commented 6 years ago

Maybe compare ulimit -n? I think it is 256 on macOS by default... not sure what benchmark you are using, but that is pretty low if you are opening a bunch of files.

addaleax commented 6 years ago

Does the perf difference carry over to sync calls as well?

bengl commented 6 years ago

@evanlucas For ulimit -n I'm seeing 4096 on my Linux machine, 32768 on macOS.

@addaleax It looks like it does. I added a sync version of the test to the gist. Here are the results:

Linux:

$ node copytestsync.js 
readdir 0.44465699999999997
stat 0.01158147002854424
mkdir 0.056488000000000003
copyfile 0.02846236224976165
---
75.75296

Mac:

$ node copytestsync.js
readdir 0.9944815
stat 0.050528007611798355
mkdir 0.222148
copyfile 0.48516409151572953
---
673.984851
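
For reference, the sync variant is essentially the same loop with the blocking fs.*Sync calls swapped in. Again, this is a sketch rather than the gist itself, and the way the per-call numbers are reported differs from the output above.

'use strict';
// Sync variant of the same sketch: identical recursive copy, but using the
// blocking fs.*Sync calls, timed the same way with performance.now().
const fs = require('fs');
const path = require('path');
const { performance } = require('perf_hooks');

const totals = { readdir: 0, stat: 0, mkdir: 0, copyfile: 0 };

function timed(name, fn, ...args) {
  const start = performance.now();
  const result = fn(...args);
  totals[name] += performance.now() - start;
  return result;
}

function copyRecursiveSync(src, dest) {
  timed('mkdir', fs.mkdirSync, dest);
  for (const entry of timed('readdir', fs.readdirSync, src)) {
    const from = path.join(src, entry);
    const to = path.join(dest, entry);
    if (timed('stat', fs.statSync, from).isDirectory()) {
      copyRecursiveSync(from, to);
    } else {
      timed('copyfile', fs.copyFileSync, from, to);
    }
  }
}

const start = performance.now();
copyRecursiveSync('package', 'package-copy'); // placeholder directory names
for (const name of Object.keys(totals)) console.log(name, totals[name]);
console.log('---');
console.log(performance.now() - start);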

bengl commented 6 years ago

Since the test code here is effectively doing the same thing as a cp -r, I thought I'd try timing that in both environments. The script used is in the gist as copytest.sh.

Linux:

real    0m0.022s
user    0m0.003s
sys     0m0.019s

Mac:

real    0m0.318s
user    0m0.026s
sys     0m0.281s

addaleax commented 6 years ago

I think that rules out overhead from the threadpool mechanism (which is ultimately implemented in a very platform-dependent way). (Btw, file system writes are protected by a global lock on OS X, so they can't use the threadpool effectively there – but that seems unlikely to be the cause here, too, given that it also affects sync code and other functions.)

If you want to hear my best guess, it’s probably an actual perf difference in the OS or the file system. I guess trying to reproduce this with C code using the raw syscalls could prove or disprove that?

tlhunter commented 6 years ago

@bengl, if you spend the entire night debugging syscalls, which I suspect you will, you should take notes and turn it into a talk ;)

LarsJK commented 6 years ago

Is FileVault enabled?

bengl commented 6 years ago

@LarsJK AFAIK yes, but note also that my Linux system is using LUKS, which I'd imagine is pretty similar in terms of overhead.