vasi / squashfuse

FUSE filesystem to mount squashfs archives

Multithreaded rebase 2 #70

Closed kevin-vigor closed 1 year ago

kevin-vigor commented 2 years ago

Third time's the charm :)

Rebased the multi-threading patches over the big ll refactor, tested.

Added a bonus changeset to improve SIGTERM handling: previously, if a squashfuse process received SIGTERM it exited immediately, leaving filesystem users to hang or crash. Now it performs a lazy unmount instead.

ErikParawell-SiFive commented 2 years ago

@vasi thoughts? Also @kevin-vigor it looks like the FreeBSD CI might have been fixed.

Sonicadvance1 commented 2 years ago

Did some benchmarking on this that might be interesting to see. I used FIO, whose data obviously doesn't compress well, but gave it an advantage by benchmarking buffered I/O instead of direct; that just makes the bench a bit noisier.

The data is a bit interesting here, showing four different ways that I have a squashfuse image mounted, plus an erofs comparison point. This PR shows pretty much linear scaling with the number of threads hitting the service. The low end of the scale is /quite/ low, though; I'm not sure if anything can be done to improve that side.

Mount type comparisons (Buffered)

I was doing this benchmarking because I'm using squashfuse and erofs as OS filesystem storage, where hundreds of processes will be hammering the storage, so multithreaded support is very important here.
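A job file along these lines could reproduce the thread-scaling test; this is a sketch, not the commenter's actual setup. The mountpoint and the assumption that pre-made test files exist inside the image are mine (fio cannot lay out files on a read-only filesystem).

```ini
; Hypothetical fio job: read scaling against a read-only squashfuse mount.
; directory= must point at the mountpoint, and the test files must already
; exist inside the squashfs image.
[global]
directory=/mnt/squash      ; squashfuse_ll mountpoint (adjust)
rw=randread
bs=128k
size=256m
direct=0                   ; buffered reads, as in the benchmark above
allow_file_create=0        ; read pre-existing files only
group_reporting=1

[readers]
numjobs=8                  ; sweep this (1, 2, 4, ...) to see thread scaling
```

Sweeping `numjobs` while watching aggregate bandwidth is what exposes the linear scaling (and the weak single-thread case) discussed above.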

shvchk commented 2 years ago

@Sonicadvance1 nice! DwarFS might be worth trying, too.

kevin-vigor commented 2 years ago

> Did some benchmarking on this that might be interesting to see. […]

I'm gratified by the nice linear behaviour of the multithreaded squashfs. But I'm less happy about the fact that it seems to behave worse than the singlethreaded version at low numbers of threads.

Can you share your test procedure? I am unable to reproduce this result: squashfuse_ll measures identical bandwidth in single- and multithreaded modes with a single IO thread for me (using the zstd compressor, FWIW). Also, erofs is equal to or worse in all cases that I have measured.

haampie commented 1 year ago

Might also be worth trying this on an AMD EPYC, where 64 cores / 128 threads is the norm (and some servers are dual-socket AMD EPYC...). If there's an easy way to set up a benchmark I'd be happy to try.

DrDaveD commented 1 year ago

I'm trying this but so far have not been successful in getting more than one thread activated. I compiled on CentOS 7 with CFLAGS=-std=c99 ./configure --enable-multithreading; config.h has #define SQFS_MULTITHREADED 1, and the generated Makefile includes the section guarded by #if MULTITHREADED in Makefile.am. It passes make check. Yet when I run squashfuse with an application that uses lots of cores reading files in parallel from the FUSE mountpoint, I see only one squashfuse thread, pegged at 100% of one core.

What am I missing?

DrDaveD commented 1 year ago

Ok, never mind: I found comments saying multithreading is only in squashfuse_ll. I got it to work after removing the -o uid= and -o gid= options my application was setting.
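Putting the two comments above together, the working recipe looks roughly like this; the image and mountpoint names are placeholders, and the flags are simply what was reported here, not official documentation.

```shell
# Build with multithreading enabled (as reported above, on CentOS 7):
CFLAGS=-std=c99 ./configure --enable-multithreading
make
grep SQFS_MULTITHREADED config.h   # expect: #define SQFS_MULTITHREADED 1

# Multithreading only applies to the low-level daemon, squashfuse_ll.
# Mount with it directly, and leave out the -o uid=/-o gid= options
# that prevented multithreading from kicking in above:
./squashfuse_ll image.squashfs /mnt/squash
```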

DrDaveD commented 1 year ago

This PR makes a huge impact for container support, as detailed in this Apptainer issue: so much so that I intend to include a patched version of squashfuse_ll in the next release. Hopefully this PR can be accepted soon; once it is distributed to EPEL/Fedora/Debian/Ubuntu I can remove it from the Apptainer distribution.

Even the current squashfuse_ll makes a huge difference, but the multithreaded implementation really makes it comparable to the fastest other ways to access files when there are multiple cores (although I haven't yet compared it to kernel squashfs).

haampie commented 1 year ago

You might want to compare to kernel squashfs; there was quite a gap previously (#73). I would be curious whether it's closed now.

reidpr commented 1 year ago

> container support

Charliecloud, another container implementation that I lead, is also interested in multi-threaded SquashFUSE (see issue above). In our case the kernel SquashFS is not an option because we are fully unprivileged.

DrDaveD commented 1 year ago

I do plan to try the kernel squashfs when I get cooperation from a system admin, but Apptainer is also moving to non-setuid by default, so good performance with unprivileged squashfs is important.

DrDaveD commented 1 year ago

I ran the benchmarks again, slightly more consistently, and this time the kernel squashfs was slightly slower, nearly identical to the multithreaded squashfuse_ll of this PR. The details are in this Apptainer issue comment.

DrDaveD commented 1 year ago

Thanks @kevin-vigor !