Multithreaded decompression rebase

kevin-vigor commented 3 years ago

Once more, with feeling.

kevin-vigor commented 3 years ago

Well, I suppose it was too much to hope for Microsoft to support C11 in 2021. I'll either disable multithreading on Windows or figure out how to wield _InterlockedAdd. But tomorrow.
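For reference, a shim over the missing C11 atomics could look roughly like the sketch below. The names refcount_t and refcount_add are hypothetical and not taken from this PR. The MSVC branch uses _InterlockedExchangeAdd from <intrin.h> rather than _InterlockedAdd, on the assumption that it is the more widely available intrinsic across MSVC targets; only the non-MSVC branch is expected to be exercised on Linux and macOS.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
/* MSVC without <stdatomic.h>: fall back to a compiler intrinsic. */
#include <intrin.h>
typedef volatile long refcount_t;

/* Atomically add delta to *v and return the NEW value. */
static long refcount_add(refcount_t *v, long delta) {
    assert(v != NULL);
    /* _InterlockedExchangeAdd returns the value before the addition. */
    return _InterlockedExchangeAdd(v, delta) + delta;
}
#else
/* Everyone else: plain C11 atomics. */
#include <stdatomic.h>
typedef atomic_long refcount_t;

/* Atomically add delta to *v and return the NEW value. */
static long refcount_add(refcount_t *v, long delta) {
    assert(v != NULL);
    /* atomic_fetch_add also returns the value before the addition. */
    return atomic_fetch_add(v, delta) + delta;
}
#endif
```

Both branches return the post-addition value, so callers do not need to care which one was compiled in.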

kevin-vigor commented 3 years ago

Windows build succeeds, I have no ability to test it further.

Linux and MacOS builds tested and confirmed working.

I believe this is now reviewable.

vasi commented 3 years ago

Ok! I'll try to get to it this weekend, that soon enough?

kevin-vigor commented 3 years ago

> Ok! I'll try to get to it this weekend, that soon enough?

Oh yeah, no rush! Thanks!

haampie commented 3 years ago

This looks nice! I've tried this out on a big (12GB extracted) squashfs rootfs, doing a simple grep benchmark. It's significantly faster than the master branch, almost a 40% reduction in runtime. However, the reason for that does not seem to have anything to do with threading, because when I restrict squashfuse to a single core, it's faster than with multiple cores (maybe thread pinning improves caching behavior?):

$ taskset -c 0 ./pe.sh /bin/bash -c "time grep 'lib' -r /opt | wc -l" # using 1 core
66023

real    0m12.485s
user    0m3.144s
sys     0m1.189s

$ ./pe.sh /bin/bash -c "time grep 'lib' -r /opt | wc -l" # using all cores
66023

real    0m14.098s
user    0m4.446s
sys     0m1.650s

The result on master looks like this:

$ taskset -c 0 ./pe.sh /bin/bash -c "time grep 'lib' -r /opt | wc -l"
66023

real    0m19.522s
user    0m3.170s
sys     0m1.198s

$ ./pe.sh /bin/bash -c "time grep 'lib' -r /opt | wc -l"
66023

real    0m21.025s
user    0m4.523s
sys     0m1.733s

The pe.sh script runs unshare, squashfuse (without any flags) on a squashfs file compressed with zstd, and chroot .. "$@".

Is this expected behavior? Is it maybe that this PR just sets some better defaults for squashfuse?

StealthBadger747 commented 2 years ago

What is the status of this PR? It looks like #59 was merged a day after this was opened and that contained a lot of changes to the structure.

Is there a future for this PR or can someone else pick this up?

kevin-vigor commented 2 years ago

@haampie: with regard to your results above, note that squashfuse does not create parallelism; the only way to take advantage of multithreaded squashfuse is to issue multiple IO requests in parallel. "grep -r" is single-threaded and does not benefit from multithreaded squashfuse.

ripgrep, on the other hand, is parallel by default. Using ripgrep as a test case I observe a significant improvement with this changeset in multithreaded mode.

(By "significant" I mean ~40% in some quick and dirty benchmarking. I wish it scaled linearly with CPUs or something magic like that, but alas, Amdahl's Law is still a thing.)
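To put a number on the Amdahl's Law remark, here is a small sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name amdahl_speedup and the example value of p are purely illustrative; nothing here was measured on squashfuse.

```c
#include <assert.h>

/*
 * Amdahl's Law: upper bound on speedup when only a fraction of the
 * work (parallel_fraction, in [0, 1]) can be spread across workers.
 */
static double amdahl_speedup(double parallel_fraction, int workers) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1);
    /* The serial part stays as-is; only the parallel part shrinks. */
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers);
}
```

For example, if half the work were parallelizable, amdahl_speedup(0.5, 4) gives 1.6, and no number of cores could push it past 2.0.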

kevin-vigor commented 2 years ago

Replaced by new pull request #70.

vasi / squashfuse

Multithreaded decompression rebase #58