project-machine / puzzlefs

Slow decompression speed of the FUSE driver #117

Closed ariel-miculas closed 6 months ago

ariel-miculas commented 9 months ago

I took a 658M root filesystem and built a squashfs image and two puzzlefs images, one compressed and one uncompressed:

puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image barehost
puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image-uncompressed barehost

I then mounted all three images (two puzzlefs images and a squashfs image):

$ puzzlefs mount /tmp/puzzlefs-image-uncompressed barehost /tmp/puzzle-uncompressed
$ puzzlefs mount /tmp/puzzlefs-image barehost /tmp/puzzle-compressed
$ squashfuse_ll ~/work/cisco/test-puzzlefs/barehost.sqhs /tmp/squash

$ mount
...
/dev/fuse on /tmp/puzzle-uncompressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-compressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
squashfuse_ll on /tmp/squash type fuse.squashfuse_ll (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Reading every file with fd:

$ time fd -tf . /tmp/squash -x cat > /dev/null
fd -tf . /tmp/squash -x cat > /dev/null  11.07s user 4.74s system 433% cpu 3.645 total

$ time fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null
fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null  10.77s user 3.52s system 405% cpu 3.525 total

$ time fd -tf . /tmp/puzzle-compressed -x cat > /dev/null
fd -tf . /tmp/puzzle-compressed -x cat > /dev/null  9.50s user 2.70s system 84% cpu 14.419 total

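Reading the compressed puzzlefs image takes about four times as long as the other two (14.4s vs. roughly 3.5s of wall-clock time) while using far less CPU (84% vs. over 400%). This could be due to decompressing the same blob multiple times instead of caching the decompressed data (squashfuse does readahead).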
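A minimal sketch of the caching idea, not puzzlefs's actual code: `BLOBS`, `load_blob` and `read_file` are hypothetical names, and zlib from the Python standard library stands in for zstd. Two files stored in the same blob are served from a single decompression:

```python
import zlib
from functools import lru_cache

# Hypothetical blob store: blob id -> compressed bytes (zlib stands in for zstd).
BLOBS = {"blob-a": zlib.compress(b"A" * 4096 + b"B" * 4096)}
decompress_calls = 0


@lru_cache(maxsize=32)
def load_blob(blob_id: str) -> bytes:
    """Decompress a whole blob once and keep the result in memory."""
    global decompress_calls
    decompress_calls += 1
    return zlib.decompress(BLOBS[blob_id])


def read_file(blob_id: str, offset: int, length: int) -> bytes:
    """Serve a file stored at [offset, offset + length) inside a blob."""
    return load_blob(blob_id)[offset:offset + length]


first = read_file("blob-a", 0, 4096)      # decompresses the blob
second = read_file("blob-a", 4096, 4096)  # served from the cache
print(decompress_calls)
```

The cache bounds memory by entry count (`maxsize`), which is a simplification; a real driver would more likely bound it by bytes.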

ariel-miculas commented 8 months ago

It would be worth implementing zstd seekable compression. That way we wouldn't have to decompress the entire blob to serve one file from it; we could decompress only the blocks needed for that file.
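To illustrate the principle, here is a rough sketch that compresses fixed-size frames independently and decompresses only the frames overlapping a requested range. It is not the zstd seekable format: zlib stands in for zstd, the frames are kept in a Python list rather than located through a seek table, and `compress_frames` and `read_range` are hypothetical names.

```python
import zlib

FRAME_SIZE = 4096  # uncompressed bytes per frame; illustrative value


def compress_frames(data: bytes, frame_size: int = FRAME_SIZE) -> list[bytes]:
    """Compress each fixed-size slice of `data` as an independent frame."""
    return [
        zlib.compress(data[i:i + frame_size])
        for i in range(0, len(data), frame_size)
    ]


def read_range(frames: list[bytes], offset: int, length: int,
               frame_size: int = FRAME_SIZE) -> tuple[bytes, int]:
    """Return bytes [offset, offset + length) and the number of frames decompressed."""
    if length <= 0:
        return b"", 0
    first = offset // frame_size
    last = (offset + length - 1) // frame_size
    needed = frames[first:last + 1]
    plain = b"".join(zlib.decompress(frame) for frame in needed)
    start = offset - first * frame_size
    return plain[start:start + length], len(needed)


blob = bytes(range(256)) * 64  # 16 KiB of sample data, i.e. 4 frames
frames = compress_frames(blob)
data, touched = read_range(frames, 5000, 100)
print(len(frames), touched)
```

A 100-byte read that falls inside one frame decompresses only that frame, instead of the whole 16 KiB blob.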

ariel-miculas commented 7 months ago

Results with seekable zstd:

$ time fd -tf . /tmp/squash -x cat > /dev/null
fd -tf . /tmp/squash -x cat > /dev/null  8.04s user 2.72s system 449% cpu 2.393 total

$ time fd -tf . /tmp/erofs -x cat > /dev/null
fd -tf . /tmp/erofs -x cat > /dev/null  8.16s user 2.62s system 465% cpu 2.316 total

$ time fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null
fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null  7.88s user 2.43s system 398% cpu 2.590 total

$ time fd -tf . /tmp/puzzle-compressed -x cat > /dev/null
fd -tf . /tmp/puzzle-compressed -x cat > /dev/null  7.77s user 2.37s system 222% cpu 4.560 total

ariel-miculas commented 6 months ago

Comparison between squashfs, erofs, uncompressed puzzlefs, compressed puzzlefs, and compressed puzzlefs with zstd seekable support at different compression frame sizes.

Setup

I'm using an image called barehost, which is an Ubuntu distribution:

$ du -sh ~/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs
658M    /home/amiculas/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs

Building the images:

# squashfs
mksquashfs real_rootfs/barehost/rootfs barehost.sqhs
# erofs
~/work/erofs-utils/mkfs/mkfs.erofs ~/work/cisco/test-puzzlefs/barehost.erofs ~/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs
# uncompressed puzzlefs
target/release/puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image-uncompressed barehost
# unseekable compressed puzzlefs
./master-puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/puzzlefs-unseekable-image barehost
# seekable compressed puzzlefs
target/release/puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/puzzlefs-seekable-image barehost

Mounting the images:

# squashfs
squashfuse_ll ~/work/cisco/test-puzzlefs/barehost.sqhs /tmp/squash
# erofs
~/work/erofs-utils/fuse/erofsfuse ~/work/cisco/test-puzzlefs/barehost.erofs /tmp/erofs
# uncompressed puzzlefs
target/release/puzzlefs mount /tmp/puzzlefs-image-uncompressed barehost /tmp/puzzle-uncompressed
# unseekable compressed puzzlefs
./master-puzzlefs mount /tmp/puzzlefs-unseekable-image barehost /tmp/puzzle-unseekable
# seekable compressed puzzlefs
target/release/puzzlefs mount /tmp/puzzlefs-seekable-image barehost /tmp/puzzle-seekable

Mounts:

erofsfuse on /tmp/erofs type fuse.erofsfuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
squashfuse_ll on /tmp/squash type fuse.squashfuse_ll (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-uncompressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-unseekable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-seekable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Results

Squashfs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/squash -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/squash -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.105 s ±  0.223 s    [User: 6.798 s, System: 1.737 s]
  Range (min … max):   10.607 s … 11.410 s    10 runs

Erofs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/erofs -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/erofs -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     10.133 s ±  0.065 s    [User: 6.612 s, System: 1.572 s]
  Range (min … max):    9.971 s … 10.231 s    10 runs

Uncompressed puzzlefs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-uncompressed -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-uncompressed -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):      9.934 s ±  0.071 s    [User: 6.581 s, System: 1.613 s]
  Range (min … max):    9.850 s … 10.038 s    10 runs

Unseekable compressed puzzlefs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-unseekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-unseekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     21.396 s ±  0.414 s    [User: 6.771 s, System: 1.715 s]
  Range (min … max):   20.615 s … 21.639 s    10 runs

Seekable compressed puzzlefs (1KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     12.475 s ±  0.067 s    [User: 6.733 s, System: 1.700 s]
  Range (min … max):   12.410 s … 12.589 s    10 runs

Seekable compressed puzzlefs (2KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     12.056 s ±  0.083 s    [User: 6.700 s, System: 1.671 s]
  Range (min … max):   11.941 s … 12.169 s    10 runs

Seekable compressed puzzlefs (4KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.784 s ±  0.046 s    [User: 6.692 s, System: 1.681 s]
  Range (min … max):   11.678 s … 11.825 s    10 runs

Seekable compressed puzzlefs (8KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.657 s ±  0.038 s    [User: 6.676 s, System: 1.664 s]
  Range (min … max):   11.616 s … 11.722 s    10 runs

Seekable compressed puzzlefs (16KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.662 s ±  0.076 s    [User: 6.691 s, System: 1.668 s]
  Range (min … max):   11.533 s … 11.818 s    10 runs

Conclusion

4KB seems to be a good choice for the zstd frame size: the gains from larger frames level off after that (11.784s at 4KB vs. 11.657s at 8KB and 11.662s at 16KB), also keeping in mind that the average chunk size produced by FastCDC with our current parameters is 80KB. Seekable compression reduces the mean time to read the entire image from ~21.4s to ~11.8s, which is similar to squashfuse (11.1s). These measurements disregard any parallel operations on the filesystem. For a root filesystem of 658MB, the image grows from 259MB with non-seekable compression to 289MB with seekable compression.

$ du -sh /tmp/puzzlefs-unseekable-image
259M    /tmp/puzzlefs-unseekable-image
$ du -sh /tmp/puzzlefs-seekable-image
289M    /tmp/puzzlefs-seekable-image
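For reference, the headline ratios can be recomputed from the numbers quoted above (mean hyperfine times and `du` sizes). The variable names are only for illustration:

```python
# Mean wall-clock times (seconds) and image sizes (MB) quoted in this comment.
unseekable_s = 21.396
seekable_4k_s = 11.784
squashfuse_s = 11.105
unseekable_mb = 259
seekable_mb = 289

speedup = unseekable_s / seekable_4k_s
gap_to_squashfuse = seekable_4k_s / squashfuse_s - 1
size_overhead = seekable_mb / unseekable_mb - 1

print(f"speedup over unseekable: {speedup:.2f}x")
print(f"slower than squashfuse:  {gap_to_squashfuse:.1%}")
print(f"image size increase:     {size_overhead:.1%}")
```

That works out to roughly a 1.8x speedup over the unseekable image, about 6% slower than squashfuse, and an image about 12% larger.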

Part of the increase is the overhead of the seekable frames themselves. In addition, because each frame is compressed individually, the compression ratio probably goes down.
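The effect of per-frame compression on the ratio can be checked with a small experiment. This is only a sketch under stated assumptions: zlib stands in for zstd, the input is synthetic text-like data rather than the barehost image, and `framed_size` is a hypothetical helper, so the absolute numbers will not match puzzlefs.

```python
import random
import zlib

# Deterministic, text-like sample data (hypothetical; not the barehost image).
rng = random.Random(0)
words = [f"word{i}" for i in range(200)]
data = " ".join(rng.choice(words) for _ in range(40_000)).encode()


def framed_size(payload: bytes, frame_size: int) -> int:
    """Total compressed size when each frame is compressed on its own."""
    return sum(
        len(zlib.compress(payload[i:i + frame_size]))
        for i in range(0, len(payload), frame_size)
    )


whole = len(zlib.compress(data))
print(f"single stream: {whole} bytes")
for kib in (1, 2, 4, 8, 16):
    print(f"{kib:>2} KiB frames: {framed_size(data, kib * 1024)} bytes")
```

Each independent frame starts with no history and carries its own header, so smaller frames give a larger total; the frame size trades compression ratio against how much has to be decompressed per read.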