systemd / casync

Content-Addressable Data Synchronization Tool

A squashfs experiment #46

Open klausenbusk opened 7 years ago

klausenbusk commented 7 years ago

Hello

I just did a little experimenting with casync and squashfs, to check the potential.

So I used the ArchLinux netboot squashfs as the base.

$ unsquashfs airootfs.sfs
Parallel unsquashfs: Using 4 processors
53900 inodes (56523 blocks) to write
[======] 56523/56523 100%

created 45748 files
created 3642 directories
created 5827 symlinks
created 0 devices
$ du -hs squashfs-root/
1,2G    squashfs-root/

Then I created 2 nearly identical squashfs files from that folder:

$ mksquashfs squashfs-root/ foo.squashfs -comp xz
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on foo.squashfs, block size 131072.
[======] 50696/50696 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
    compressed data, compressed metadata, compressed fragments, compressed xattrs
    duplicates are removed
Filesystem size 384621.17 Kbytes (375.61 Mbytes)
    35.17% of uncompressed filesystem size (1093687.08 Kbytes)
Inode table size 419016 bytes (409.20 Kbytes)
    21.86% of uncompressed inode table size (1917191 bytes)
Directory table size 498974 bytes (487.28 Kbytes)
    38.32% of uncompressed directory table size (1302032 bytes)
Number of duplicate files found 3835
Number of inodes 55217
Number of files 45748
Number of fragments 3469
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)
$ dd if=/dev/urandom of=squashfs-root/foo bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.121452 s, 86.3 MB/s
$ rm squashfs-root/etc/*.conf
$ mksquashfs squashfs-root/ foo2.squashfs -comp xz
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on foo2.squashfs, block size 131072.
[======] 50741/50741 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
    compressed data, compressed metadata, compressed fragments, compressed xattrs
    duplicates are removed
Filesystem size 394833.51 Kbytes (385.58 Mbytes)
    35.77% of uncompressed filesystem size (1103835.71 Kbytes)
Inode table size 419028 bytes (409.21 Kbytes)
    21.87% of uncompressed inode table size (1916421 bytes)
Directory table size 498894 bytes (487.20 Kbytes)
    38.33% of uncompressed directory table size (1301745 bytes)
Number of duplicate files found 3830
Number of inodes 55183
Number of files 45714
Number of fragments 3469
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)

So now I have:

$ du -h foo*
386M    foo2.squashfs
376M    foo.squashfs

Now let's create some casync indexes:

$ casync make --chunk-size=131072 foo.caibx foo.squashfs
9417639da34e16fb4fc11ff332bf7facd080240116129d716463c08d18c99f2a
$ du -hs default.castr/
393M    default.castr/
$ casync make --chunk-size=131072 foo2.caibx foo2.squashfs
f1e3f13669f2c49864c537da04ada35112b15cc2ce1f4ba08b98acb2a1b8483f
$ du -hs default.castr/
625M    default.castr/

So it is reusing (foo.squashfs + foo2.squashfs - total) = 376 + 386 - 625 = 137 MB worth of chunks, or in other words, the client can save about 35% of the traffic (137/386 * 100).
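
The same arithmetic as shell, with the sizes (in MB) taken from the du output above:

$ foo=376; foo2=386; store=625
$ echo "reused: $((foo + foo2 - store)) MB"
reused: 137 MB
$ echo "saved: ~$(( (foo + foo2 - store) * 100 / foo2 ))% of foo2's size"
saved: ~35% of foo2's size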

35% doesn't seem that high, considering the very minimal changes to the filesystem. Do you think this can be improved?

-- Kristian

zonque commented 7 years ago

mksquashfs has a bug which leads to great differences in the resulting file even with identical file systems as input. This is caused by multiple workers on different CPUs racing against each other. You can call the tool with -processors 1 to work around this for now. This is what I'm doing in my deployment scripts.

However, given that the content in the squashfs image is compressed on an inode base, the number of blocks that can be reused will never be ideal. You can experiment with the -noI, -noD, -noF and -noX options. I'd be interested in your findings :)

poettering commented 7 years ago

In addition to what @zonque just said: you need to align the squashfs block size and casync chunk sizes in some way, and I can't really tell you how to do that best, that requires some research. Note that setting the chunk size to the exact same value as the squashfs block size is not the answer: the chunk size you configure in casync is just the average chunk size, meaning that a good part of the chunks will be shorter than the configured value, and another part will be larger. But having smaller casync chunks than the squashfs block size is not useful, as any changed bit in squashfs tends to explode and change the whole block around it, and hence trying to match up parts of it via casync is not going to work.

In the blog story I indicated that this is still left for research. If you are interested in this, I'd very much welcome some more comprehensive stats on this. Specifically, it might make sense to take some suitable data set (let's say a basic Fedora install or so), compress it with various squashfs block sizes, then run them all through casync with various average chunk sizes, and draw a graph from that to figure out where the sweet spot lies.
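
For example, a rough sketch of such a sweep (directory names and the size lists are placeholders, and a real run would also need a second, slightly modified tree so that the growth of each store can be measured):

for bs in 32K 64K 128K 256K; do
    mksquashfs rootfs/ "root-$bs.squashfs" -comp xz -processors 1 -b "$bs" -noappend
    for avg in 64K 128K 256K 512K; do
        store="store-$bs-$avg.castr"
        casync make --store="$store" --chunk-size="$avg" "root-$bs-$avg.caibx" "root-$bs.squashfs"
        du -sh "$store"   # one store per combination, so the sizes stay comparable
    done
done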

Note that casync's chunking algorithm takes three parameters: the min, the average and the max chunk size. Normally it just expects you to specify the average chunk size, and will then pick the minimum chunk size as 1/4th of it, and the maximum as 4x it. You can alter those values too by using --chunk-size=MIN:AVG:MAX, but do note that the way AVG is currently processed means that setting MIN/MAX to anything other than 0.25x and 4x will skew the chunk histogram so that AVG is not actually the average chunk size anymore, if you follow what I mean. Long story short: unless you know what you are doing, don't bother with changing MIN/MAX, but do keep in mind that MIN is picked as 1/4th of AVG and that AVG is what you choose.
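
For reference, the two invocation forms described above (file names are just placeholders):

$ casync make --chunk-size=131072 image.caibx image.squashfs               # AVG only; MIN (32768) and MAX (524288) are derived
$ casync make --chunk-size=32768:131072:524288 image.caibx image.squashfs  # the same values written out as MIN:AVG:MAX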

Also, please keep in mind that large chunk sizes mean that casync is unable to recognize smaller patterns. By picking a small chunk size you hence increase the chance that casync recognizes similar data, but the metadata overhead increases.
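
To get a feel for the metadata side of that trade-off (plain arithmetic; the ~1.1 GB image size is the one from this thread):

$ image=$((1100 * 1024 * 1024))
$ for avg in 32768 65536 131072 262144; do echo "avg=$avg -> ~$((image / avg)) chunks to index and fetch"; done
avg=32768 -> ~35200 chunks to index and fetch
avg=65536 -> ~17600 chunks to index and fetch
avg=131072 -> ~8800 chunks to index and fetch
avg=262144 -> ~4400 chunks to index and fetch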

poettering commented 7 years ago

or to say all this in different words: I have the suspicion that you get best results if you pick a squashfs block size that is relatively small and that the average chunk size you then configure casync for is at least four times larger.

klausenbusk commented 7 years ago

You can call the tool with -processors 1 to work around this for now.

That made a significant difference:

$ casync make --chunk-size=131072 foo.caibx foo.squashfs
ea4de6574bd73cdd7dd1448324c97e5d4c313301f18e53c638f5ad023231dc93
$ du -hs default.castr/
393M    default.castr/
$ casync make --chunk-size=131072 foo2.caibx foo2.squashfs
1dd2e733ca1683cd8b4399b3e66cef0acc552da0942464c9041451df38d1c113
$ du -hs default.castr/
538M    default.castr/

376 + 386 - 538 = 224 MB reused, i.e. (224/386 * 100) ≈ 58% reuse. (Edit: I'm not sure about the math anymore)
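
Or, reading the same du numbers differently (they are rounded, so this is approximate): the store grew from 393 MB to 538 MB when foo2 was added, so a client that already has foo only needs the new chunks:

$ echo "new chunks added by foo2: $((538 - 393)) MB (vs. 386 MB for the full image)"
new chunks added by foo2: 145 MB (vs. 386 MB for the full image)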

You can experiment with the -noI, -noD, -noF and -noX options. I'd be interested in your findings :)

With all the options on:

$ mksquashfs squashfs-root/ foo.squashfs -comp xz -processors 1 -noI -noD -noF -noX
Parallel mksquashfs: Using 1 processor
Creating 4.0 filesystem on foo.squashfs, block size 131072.
[======] 50696/50696 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
    uncompressed data, uncompressed metadata, uncompressed fragments, uncompressed xattrs
    duplicates are removed
Filesystem size 1062789.11 Kbytes (1037.88 Mbytes)
    97.17% of uncompressed filesystem size (1093687.08 Kbytes)
Inode table size 1917191 bytes (1872.26 Kbytes)
    100.00% of uncompressed inode table size (1917191 bytes)
Directory table size 1302032 bytes (1271.52 Kbytes)
    100.00% of uncompressed directory table size (1302032 bytes)
Number of duplicate files found 3835
Number of inodes 55217
Number of files 45748
Number of fragments 3469
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)
$ dd if=/dev/urandom of=squashfs-root/foo bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.121281 s, 86.5 MB/s
$ rm squashfs-root/etc/*.conf
$ mksquashfs squashfs-root/ foo2.squashfs -comp xz -processors 1 -noI -noD -noF -noX
Parallel mksquashfs: Using 1 processor
Creating 4.0 filesystem on foo2.squashfs, block size 131072.
[======] 50741/50741 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
    uncompressed data, uncompressed metadata, uncompressed fragments, uncompressed xattrs
    duplicates are removed
Filesystem size 1072944.86 Kbytes (1047.80 Mbytes)
    97.20% of uncompressed filesystem size (1103835.71 Kbytes)
Inode table size 1916421 bytes (1871.50 Kbytes)
    100.00% of uncompressed inode table size (1916421 bytes)
Directory table size 1301745 bytes (1271.24 Kbytes)
    100.00% of uncompressed directory table size (1301745 bytes)
Number of duplicate files found 3830
Number of inodes 55183
Number of files 45714
Number of fragments 3469
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)

and now we're talking:

$ casync make --chunk-size=131072 foo.caibx foo.squashfs
fa8da5dc23cd3b0ba17f81c5f6a1e0f3cebc9543307392b93d29c7a833932f3e
$ du -hs default.castr/
411M    default.castr/
$ casync make --chunk-size=131072 foo2.caibx foo2.squashfs
f62a294140bbcae6c083e5ffa4096cf921901b07c379c573da456dcbdc02a964
$ du -hs default.castr/
467M    default.castr/

That isn't bad at all; it reused 411/467 ≈ 88% of the old chunks.

or to say all this in different words: I have the suspicion that you get best results if you pick a squashfs block size that is relatively small and that the average chunk size you then configure casync for is at least four times larger.

Hang on, and I will have some data soon.

zonque commented 7 years ago

What's the size of the squashfs image, with and without compression?

klausenbusk commented 7 years ago

What's the size of the squashfs image, with and without the compression turned on?

You can see it in the mksquashfs log, but here you go:

With xz comp:

foo.squashfs Filesystem size 384621.17 Kbytes (375.61 Mbytes)
foo2.squashfs Filesystem size 394833.51 Kbytes (385.58 Mbytes)

Without:

foo.squashfs: Filesystem size 1062789.11 Kbytes (1037.88 Mbytes)
foo2.squashfs: Filesystem size 1072944.86 Kbytes (1047.80 Mbytes)

poettering commented 7 years ago

btw, it'd be excellent if the final findings could be compiled into some document we can add to the package, since I am sure this will pop up again and again

zonque commented 7 years ago

So with compression turned on, 42% of ~380MB (~159MB) and without compression, 12% of ~1040MB (~124MB) are not reused and have to be downloaded when an update is made. So even though the reuse percentage looks better, the actual effect isn't that high.

klausenbusk commented 7 years ago

or to say all this in different words: I have the suspicion that you get best results if you pick a squashfs block size that is relatively small and that the average chunk size you then configure casync for is at least four times larger.

Hang on, and I will have some data soon.

So I created the squashfs files with -b 32K (everything else was the same), and it only made it worse.

$ casync make --chunk-size=131072 foo.caibx foo.squashfs
ffa00e9f8fea2d3c007312a2d2465b560d030fd3e19a45aa197beb2223c08379
$ du -hs default.castr/
411M    default.castr/
$ casync make --chunk-size=131072 foo2.caibx foo2.squashfs
56a8e60b88bde1750957c2c74940169c6aa8120d7f11abf6288ad3a9bb5786bb
$ du -hs default.castr/
486M    default.castr/

mksquashfs log:

$ mksquashfs squashfs-root/ foo.squashfs -comp xz -processors 1 -b 32K -noI -noD -noF -noX
Parallel mksquashfs: Using 1 processor
Creating 4.0 filesystem on foo.squashfs, block size 32768.
[======] 71566/71566 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 32768
    uncompressed data, uncompressed metadata, uncompressed fragments, uncompressed xattrs
    duplicates are removed
Filesystem size 1054723.85 Kbytes (1030.00 Mbytes)
    96.42% of uncompressed filesystem size (1093840.92 Kbytes)
Inode table size 2012790 bytes (1965.62 Kbytes)
    100.00% of uncompressed inode table size (2012790 bytes)
Directory table size 1301552 bytes (1271.05 Kbytes)
    100.00% of uncompressed directory table size (1301552 bytes)
Number of duplicate files found 3835
Number of inodes 55217
Number of files 45748
Number of fragments 7366
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)

$ mksquashfs squashfs-root/ foo2.squashfs -comp xz -processors 1 -b 32K -noI -noD -noF -noX
Parallel mksquashfs: Using 1 processor
Creating 4.0 filesystem on foo2.squashfs, block size 32768.
[======] 71851/71851 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 32768
    uncompressed data, uncompressed metadata, uncompressed fragments, uncompressed xattrs
    duplicates are removed
Filesystem size 1064879.91 Kbytes (1039.92 Mbytes)
    96.46% of uncompressed filesystem size (1103989.87 Kbytes)
Inode table size 2012993 bytes (1965.81 Kbytes)
    100.00% of uncompressed inode table size (2012993 bytes)
Directory table size 1300677 bytes (1270.19 Kbytes)
    100.00% of uncompressed directory table size (1300677 bytes)
Number of duplicate files found 3830
Number of inodes 55183
Number of files 45714
Number of fragments 7362
Number of symbolic links  5827
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 3642
Number of ids (unique uids + gids) 1
Number of uids 1
    kristian (1000)
Number of gids 1
    kristian (1000)

zonque commented 7 years ago

Did you play with casync's --chunk-size= parameter as well?

klausenbusk commented 7 years ago

So with compression turned on, 42% of ~380MB (~159MB) and without compression,

Where did you get 42% from?

12% of ~1040MB (~124MB) are not reused and have to be downloaded when an update is made. So even though the reuse percentage looks better, the actual effect isn't that high.

The ~1040 MB is the squashfs file without compression; casync compresses it down to a 411 MB chunk store, and the second squashfs file only adds 467-411=56 MB more chunks. So the client would need to download 56 MB. See https://github.com/systemd/casync/issues/46#issuecomment-311084615 (bottom).

Did you play with casync's --chunk-size= parameter as well?

It was the same as before (131072), which is 4x the squashfs block size.

zonque commented 7 years ago

Where did you get 42% from?

Ah, sorry. My bad. I'll do some tests again soon myself. Last time I did them, casync would still use Adler32 instead of buzhash, but my numbers were similar IIRC.

It was the same as before (131072), which is 4x the squashfs block size.

Yeah, but you could try and alter both block sizes (squashfs and casync).

klausenbusk commented 7 years ago

Yeah, but you could try and alter both block sizes (squashfs and casync).

mksquashfs with -b 64K:

$ casync make --chunk-size=256K foo.caibx foo.squashfs
08bfcbe52ac62383ff3d099ba57e5a4845d38899bf8d4fe7f4567f296ddc944a
$ du -hs default.castr/
377M    default.castr/
$ casync make --chunk-size=256K foo2.caibx foo2.squashfs
ea53a0b771d0945ff019e4978d5d7b8e381368fc31de1d535e8ed95dc674e22a
$ du -hs default.castr/
476M    default.castr/
-------
casync make --chunk-size=64K foo.caibx foo.squashfs
08bfcbe52ac62383ff3d099ba57e5a4845d38899bf8d4fe7f4567f296ddc944a
$ du -hs default.castr/
462M    default.castr/
$ casync make --chunk-size=64K foo2.caibx foo2.squashfs
ea53a0b771d0945ff019e4978d5d7b8e381368fc31de1d535e8ed95dc674e22a
$ du -hs default.castr/
520M    default.castr/

so....

klausenbusk commented 7 years ago

More data:

$ casync make --chunk-size=128K foo.caibx foo.squashfs
08bfcbe52ac62383ff3d099ba57e5a4845d38899bf8d4fe7f4567f296ddc944a
$ du -hs default.castr/
411M    default.castr/
$ casync make --chunk-size=128K foo2.caibx foo2.squashfs
ea53a0b771d0945ff019e4978d5d7b8e381368fc31de1d535e8ed95dc674e22a
$ du -hs default.castr/
486M    default.castr/

$ casync make --chunk-size=192K foo.caibx --store=192 foo.squashfs
08bfcbe52ac62383ff3d099ba57e5a4845d38899bf8d4fe7f4567f296ddc944a
$ du -hs default.castr/
436M    default.castr/
$ casync make --chunk-size=192K foo2.caibx --store=192 foo2.squashfs
ea53a0b771d0945ff019e4978d5d7b8e381368fc31de1d535e8ed95dc674e22a
$ du -hs default.castr/
486M    default.castr/

Spindel commented 7 years ago

I have done some experiments here to see if it's worth replacing our current VCDIFF-based updater for squashfs, and the best I end up with is an order of magnitude worse than vcdiff (56 MiB vs 4.9 MiB).

squashfs created with: -comp lzo -processors 1 for all the filesystems, and then varying the block sizes.

Lower block sizes in general seem to give much better delta compression here.

All sizes below are in bytes.

squashfs block  casync chunk       before        after        delta   vcdiff size
128k            128k            109448380    159866408     50418028       5347571
128k            64k             114279552    158378284     44098732       5347571
128k            196k            107544744    166066400     58521656       5347571
64k             196k            108565560    182307500     73741940       4942274
64k             64k             115702772    173936656     58233884       4942274
64k             128k            110695244    176925480     66230236       4942274
64k             4M              101183276    203963016    102779740       4942274
64k             32k             124410768    170925584     46514816       4942274
64k             16k             139519084    176285808     36766724       4942274
256k            16k             136321764    162653496     26331732       5401169
256k            32k             121683700    153911184     32227484       5401169
256k            48k             116118020    154677656     38559636       5401169
256k            64k             113610964    154900344     41289380       5401169
256k            96k             110534840    158229752     47694912       5401169
256k            128k            108844516    158764984     49920468       5401169
256k            160k            107556980    164247960     56690980       5401169
256k            256k            105766892    169314368     63547476       5401169
256k            8K              162973404    187133732     24160328       5401169
256k            10K             152101388    176791976     24690588       5401169
256k            12K             145522592    170632432     25109840       5401169
256k            14K             140354180    165779628     25425448       5401169
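
For context, numbers like these can be collected roughly as follows (the per-combination store and the use of xdelta3 as the VCDIFF tool are assumptions for illustration, not something stated above):

$ casync make --store=s.castr --chunk-size=64K old.caibx old.squashfs
$ before=$(du -sb s.castr | cut -f1)
$ casync make --store=s.castr --chunk-size=64K new.caibx new.squashfs
$ after=$(du -sb s.castr | cut -f1)
$ echo "delta: $((after - before)) bytes"   # new chunks an updater would have to fetch
$ xdelta3 -e -s old.squashfs new.squashfs update.vcdiff
$ echo "vcdiff: $(stat -c %s update.vcdiff) bytes"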

Spindel commented 7 years ago

The mean uncompressed file size inside the squashfs image is about 22k. The filesystems are root filesystems for ARM, designed to fit in a 128 MiB partition. I'm continuing some testing to get more numbers here.

Spindel commented 7 years ago

(attached image not preserved in this text export)

Spindel commented 7 years ago

I've done more comparisons on various block sizes of squashfs + casync. Here are all the deltas that are of "acceptable" size (<28 MiB), as a selection (all sizes in bytes):

squashfs block  casync chunk       before        after        delta
8K              2k              300193072    326454100     26261028
32K             2K              286269080    313759400     27490320
64k             2K              279804704    303120420     23315716
64k             4K              213521444    239195880     25674436
128k            2K              274465020    299353036     24888016
128k            4K              208666884    235405796     26738912
128k            8K              164161172    192140344     27979172
256k            2K              272197548    296360464     24162916
256k            4K              207236620    231661136     24424516
256k            8K              162952368    187819784     24867416
256k            12k             145523996    171457008     25933012
256k            16K             136290180    163536448     27246268
512K            2K              271066580    294758225     23691644
513K            4K              206668208    230244976     23576768
513K            8K              162075772    185663468     23587696
513K            12k             144542776    168996860     24454084
513K            16K             135811844    161168396     25356552
513K            24K             126159688    153858040     27698352

Spindel commented 7 years ago

Worth noting is that when the chunk size is < 8K or so, the space the chunk store takes up on disk becomes larger than the total size of the individual chunk files themselves.
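
One way to make that overhead visible (GNU du; purely illustrative):

$ du -sh --apparent-size default.castr/   # sum of the chunk files' own sizes
$ du -sh default.castr/                   # space actually allocated on disk; much larger with tiny chunks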

aep commented 7 years ago

I was wondering if it makes more sense to skip squashfs completely and write the chunk store directly to flash. Squashfs has never helped us much, since it compresses files individually.

Spindel commented 7 years ago

Squashfs doesn't compress files individually. It compresses blocks; a block can contain many files due to tail packing. Side note: that's why you see worse deltas between similar filesystems on squashfs when you use better compression levels; gzip/lzma cause huge deltas.
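
As an untested aside, mksquashfs has a -no-fragments option that disables tail packing entirely (at a cost in image size), which might be worth trying for delta friendliness:

$ mksquashfs rootfs/ image.squashfs -comp lzo -processors 1 -no-fragments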

eMPee584 commented 5 years ago

Btw, mksquashfs output just became reproducible: https://github.com/plougher/squashfs-tools/commit/e0d74d07bb350e24efd3100d3798f4f6d893a3d9 Maybe an opportunity to reevaluate the situation.