Closed: tych0 closed this issue 1 year ago
We have some basic support for this now with d52efd2b805885e35e089ae4b33177dff13b0b93; let's see how it goes.
For fast random access, compressed files are split into fixed-size blocks that are compressed separately. The block size can be set between 4k and 1M (the default for squashfs-tools and squashfs-tools-ng is 128K).
This is done by splitting the input data into frames, each of which is compressed independently and so can be decompressed independently. Decompression then takes advantage of a provided 'seek table', which allows the decompressor to jump directly to the desired data.
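Conceptually, the seek table is just a list of per-frame sizes; locating the frame that holds a given uncompressed offset is a walk (or binary search) over the cumulative decompressed sizes. A minimal sketch (the `FrameEntry` type and `locate` helper here are illustrative, not the actual zstd-seekable structures):

```rust
/// Illustrative seek-table entry: the sizes of one independently
/// compressed frame (hypothetical names, not the real zstd types).
struct FrameEntry {
    compressed_size: u64,
    decompressed_size: u64,
}

/// Given an uncompressed offset, return (frame index, offset of the
/// compressed frame in the file, offset within the decompressed frame).
fn locate(entries: &[FrameEntry], uncompressed_off: u64) -> Option<(usize, u64, u64)> {
    let mut c_off = 0u64; // running compressed offset
    let mut d_off = 0u64; // running decompressed offset
    for (i, e) in entries.iter().enumerate() {
        if uncompressed_off < d_off + e.decompressed_size {
            return Some((i, c_off, uncompressed_off - d_off));
        }
        c_off += e.compressed_size;
        d_off += e.decompressed_size;
    }
    None
}
```

To read at some offset, the decompressor only has to fetch and decompress the single located frame, then skip the within-frame offset.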
Depending on the chunking parameters we choose, seekable compression may not be worth it. For example, say we pick a 16KB average chunk and a 64KB maximum chunk: the maximum chunk is already smaller than the squashfs default block size of 128KB. Also, zstd seekable compression adds metadata for storing the frame offsets, which makes the compressed chunks slightly larger. Plus we'd be locked into zstd.
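To put rough numbers on the metadata overhead: if I'm reading the seekable format doc right, each seek-table entry is 8 bytes without checksums (4-byte compressed size + 4-byte decompressed size). A back-of-the-envelope calculation:

```rust
// Rough seek-table size, assuming 8 bytes per frame entry
// (no per-frame checksums) -- an assumption from my reading of
// the zstd seekable format spec.
fn seek_table_bytes(total_size: u64, avg_frame: u64) -> u64 {
    let frames = (total_size + avg_frame - 1) / avg_frame; // round up
    frames * 8
}
```

With 16KB average chunks, 1GiB of data is ~65536 frames, i.e. ~512KiB of seek table (~0.05%). So the table itself is cheap; the bigger cost is probably the worse compression ratio from compressing small frames independently.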
Could you post your deduplication performance results, or maybe just a concise summary?
IIUC, tuning the parameters showed that compression may indeed not be needed.
I still think we need compression, I'm only arguing against seekable compression.
Sounds good.
An interesting point from squashfs: it doesn't compress a block if doing so would make it larger; in that case the block is stored uncompressed. Perhaps we should implement this too. Or even this:
By checking compression ratios and storing compressed only those chunks whose compression ratios meet a certain threshold, the amount of decompression involved during data access can be reduced ... This allows for capturing most of the savings of chunk compression while eliminating the cost of decompression on the majority of chunks.
from the microsoft paper
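A sketch of that threshold policy (the names and the `compress` stand-in are hypothetical; squashfs's behavior corresponds to requiring any saving at all, i.e. a 0% threshold):

```rust
/// A chunk is stored either raw or compressed, depending on whether
/// compression actually saves enough space.
enum StoredChunk {
    Raw(Vec<u8>),
    Compressed(Vec<u8>),
}

/// Keep the compressed form only if it saves at least
/// `min_saving_percent`; otherwise store the chunk raw, so reads of
/// poorly-compressible chunks skip decompression entirely.
fn store_chunk(
    raw: &[u8],
    min_saving_percent: u64,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> StoredChunk {
    let compressed = compress(raw);
    if (compressed.len() as u64) * 100 <= (raw.len() as u64) * (100 - min_saving_percent) {
        StoredChunk::Compressed(compressed)
    } else {
        StoredChunk::Raw(raw.to_vec())
    }
}
```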
We can make compression optional, but then we need a new field in BlobRef to indicate whether the blob is compressed or not. The blob still needs to be content-addressed, so the sha256sum must be computed over the compressed blob.
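Something like this for the extended BlobRef (field names are illustrative, not the actual puzzlefs type):

```rust
/// Hypothetical shape of the extended BlobRef.
struct BlobRef {
    /// sha256 of the blob *as stored*, i.e. of the compressed bytes
    /// when `compressed` is true -- the blob stays content-addressed.
    digest: [u8; 32],
    /// offset of this chunk within the blob
    offset: u64,
    /// new field: whether the referenced blob is compressed
    compressed: bool,
}
```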
We should also process the files in parallel during puzzlefs build in order to speed up the compression.
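With scoped threads from the standard library, per-file parallelism could look roughly like this (the `compress_file` stand-in is hypothetical; in practice a bounded thread pool or rayon would be a better fit than one thread per file):

```rust
use std::thread;

/// Sketch of parallel build: compress each file on its own worker
/// thread and collect the results in input order.
fn build_parallel(files: Vec<Vec<u8>>, compress_file: fn(&[u8]) -> Vec<u8>) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        // spawn one scoped thread per file; scoped threads may borrow `files`
        let handles: Vec<_> = files
            .iter()
            .map(|f| s.spawn(move || compress_file(f)))
            .collect();
        // join in spawn order so output order matches input order
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```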
next steps:
Build and mount the filesystem:
target/release/puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/oci-barehost test
target/release/puzzlefs mount -f /tmp/oci-barehost test /tmp/puzzle
Reading an entire filesystem:
$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null 9.64s user 2.54s system 280% cpu 4.338 total
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'
Benchmark 1: fd -tf -x cat > /dev/null
Time (mean ± σ): 6.233 s ± 0.455 s [User: 12.785 s, System: 4.290 s]
Range (min … max): 5.503 s … 7.020 s 10 runs
$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null 8.59s user 2.13s system 57% cpu 18.549 total
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'
Benchmark 1: fd -tf -x cat > /dev/null
Time (mean ± σ): 20.681 s ± 1.989 s [User: 10.174 s, System: 2.866 s]
Range (min … max): 18.057 s … 24.472 s 10 runs
squashfuse -f barehost.sqhs /tmp/squash
$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null 11.50s user 3.83s system 96% cpu 15.816 total
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'
Benchmark 1: fd -tf -x cat > /dev/null
Time (mean ± σ): 13.134 s ± 1.194 s [User: 10.361 s, System: 2.903 s]
Range (min … max): 11.355 s … 16.096 s 10 runs
Right now all the metadata and file contents for puzzlefs are uncompressed. We should compress them.
The design of the puzzlefs format constrains the use of compression a bit, though: we need to be able to read data from an arbitrary offset (the compression community calls this a "seekable compression format").
People like zstd as a compression format, and there are out-of-tree implementations of a seekable format for it with rust bindings:
https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md https://crates.io/crates/zstd-seekable
which may be interesting.