openSUSE / mirrorbrain

MirrorBrain
http://mirrorbrain.org/

Parametrize zsync block size for huge files #47

Closed: andrii-suse closed this 4 years ago

andrii-suse commented 4 years ago

Related to https://github.com/openSUSE/mirrorbrain/issues/22. With the fix from #46, I tested with 200G files in the local environs tests:

BIG_FILE_SIZE=200G REBUILD=1 bash -x mirrorbrain/t/docker/environs/07-zsync.sh

The second run adds the following parameters in the test script:

sed -i '/dbname = mirrorbrain/a zsync_hashes = 1\nchunk_size = 33554432\nzsync_block_size_for_1G = 1048576' mb9*/mirrorbrain.conf

chunk_size     | zsync_block_size_for_1G | RAM usage | CPU time (min:sec)
---------------+-------------------------+-----------+-------------------
default (256K) | default (4K)            | 5.864G    | 35:33
32M            | 1M                      | 100M      | 32:56

andrii-suse commented 4 years ago

I've confirmed the same with zsyncmake as well: with a non-default block size of 1M, the zsyncsums for a 200G file drop from almost 500M down to about 2M, so it is worth making this configurable for huge files. It only requires altering the zsumblocksize column in the hash table, because the current smallint type can hold at most 32K.
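The numbers above follow from simple arithmetic. This sketch counts zsync blocks for a 200G file at the default 4K block size and at the proposed 1M, and checks each block size against the range of a signed 16-bit smallint (assuming the PostgreSQL smallint type, whose maximum is 32767). The helper name `block_count` is illustrative only.

```python
# Back-of-the-envelope numbers for a 200G file split into zsync blocks.
GiB = 1024 ** 3
FILE_SIZE = 200 * GiB

# Assumption: zsumblocksize is a PostgreSQL smallint (signed 16-bit).
SMALLINT_MAX = 2 ** 15 - 1  # 32767


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks, counting a trailing partial block (ceiling division)."""
    return -(-file_size // block_size)


for block_size in (4 * 1024, 1024 * 1024):
    print(f"block size {block_size:>8}: "
          f"{block_count(FILE_SIZE, block_size):>9} blocks, "
          f"fits in smallint: {block_size <= SMALLINT_MAX}")
```

Going from 4K to 1M blocks cuts the block count by a factor of 256 (52428800 versus 204800 blocks), which matches the drop from roughly 500M to roughly 2M of checksums, while 1048576 is far outside the smallint range.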

darix commented 4 years ago

Does this have any impact on mod_mirrorbrain or on clients that actually use zsync files? Does it mean that the smallest chunk a client then needs to download when something changes is 1MB?

andrii-suse commented 4 years ago

I don't think it has any impact on mod_mirrorbrain. Below is my understanding after a few hours of research, so it may be wrong. The zsync algorithm pre-calculates a checksum for each block, and the client then needs to download only the blocks whose checksums differ. A small (default) block size only matters when small fragments of the file differ, which is probably not the case for .iso images.
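A simplified sketch of the block matching described above shows what block size means for a client. It is an illustration under stated assumptions, not zsync itself: MD5 stands in for zsync's MD4 block checksum, there is no weak rolling checksum, and only block-aligned pieces of the local file are reused. The function names `block_digests` and `blocks_to_fetch` are hypothetical.

```python
import hashlib
import os


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """Checksum each fixed-size block; MD5 stands in for zsync's MD4."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def blocks_to_fetch(old: bytes, new: bytes, block_size: int) -> list[int]:
    """Indices of blocks in `new` that the client cannot build from `old`.

    Simplification: only block-aligned pieces of `old` are considered. Real
    zsync also slides a rolling checksum over the local file, so it can reuse
    matching data at any offset.
    """
    have = set(block_digests(old, block_size))
    return [i for i, digest in enumerate(block_digests(new, block_size))
            if digest not in have]


# Demo: an 8 MiB "old image" and a copy with a single byte flipped.
old = os.urandom(8 * 1024 * 1024)
changed = bytearray(old)
changed[5_000_000] ^= 0xFF
new = bytes(changed)

for block_size in (4 * 1024, 1024 * 1024):
    missing = blocks_to_fetch(old, new, block_size)
    print(f"block size {block_size:>7}: fetch {len(missing)} block(s), "
          f"{len(missing) * block_size} bytes")
```

In this model one flipped byte costs a 4K download with the default block size but a whole 1M block with the larger one, so the bigger block only hurts when changes are small and scattered.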

Moreover, if my calculations are correct, a 200G file with the default 4K block size needs to store and send about 500M of checksums, which makes little sense (and the crash in #22 happens when it tries to store those checksums in the DB). The correct solution here is to use a much bigger block.
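The conclusion that huge files need a much bigger block could also be expressed as a rule of thumb. The sketch below is a hypothetical policy, not what this change implements (the change makes the block size a config parameter): it doubles a 4K starting block until the block count is at or below an assumed cap of 262144 (256K) blocks. The name `choose_block_size` and the cap are illustrative assumptions.

```python
GiB = 1024 ** 3


def choose_block_size(file_size: int,
                      min_block: int = 4096,
                      max_blocks: int = 256 * 1024) -> int:
    """Hypothetical policy: the smallest power-of-two multiple of min_block
    that keeps the number of blocks at or below max_blocks."""
    block = min_block
    while -(-file_size // block) > max_blocks:  # ceiling division = block count
        block *= 2
    return block


for size_gib in (1, 4, 200):
    print(f"{size_gib:>3}G -> block size {choose_block_size(size_gib * GiB)}")
```

With this cap, files up to 1G keep the 4K default and a 200G file lands on 1M, the value used in the tests above; the cap itself is the tuning knob.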