tasket / wyng-backup

Fast backups for logical volumes & disk images
GNU General Public License v3.0

Explore metadata shorthand for non-zero bit patterns #222

Open tasket opened 1 week ago

tasket commented 1 week ago

Wyng currently marks all-zero chunks in the manifest without saving a corresponding data chunk file. This could be extended to other repeating patterns that are 8, 16, or 32 bits wide.

The send-time test for such chunks could be simple: check whether the chunk's first byte(s) repeat for the remainder of the chunk. For the vast majority of data chunks, this would complete quickly, after comparing only the first few bytes.
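A rough sketch of what such a test might look like, assuming chunks arrive as `bytes` objects. The function name `repeated_pattern` and the `widths` parameter are made up for illustration and are not part of Wyng. A chunk repeats with period `w` exactly when it equals itself shifted by `w` bytes, so one slice comparison per width is enough, and a cheap check of the first two pattern-widths rejects most ordinary data right away.

```python
def repeated_pattern(chunk, widths=(1, 2, 4)):
    """Return the shortest leading pattern (1, 2 or 4 bytes) that tiles
    the whole chunk, or None if no such pattern exists.

    Hypothetical helper for illustration; not Wyng's actual code.
    """
    n = len(chunk)
    view = memoryview(chunk)  # slicing a memoryview avoids copying the chunk
    for w in widths:
        # The pattern must fit at least twice and tile the chunk exactly.
        if n < 2 * w or n % w:
            continue
        # Fast reject: the second w bytes must equal the first w bytes.
        if view[w:2 * w] != view[:w]:
            continue
        # Full test: the chunk equals itself shifted by w bytes.
        if view[w:] == view[:-w]:
            return bytes(view[:w])
    return None
```

An all-zero chunk would come back as `b'\x00'`, so the existing zero case falls out as one instance of the 8-bit pattern. Trying the widths in ascending order means a chunk of repeated `0xFF` bytes is reported as an 8-bit pattern rather than a 16- or 32-bit one.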

Before considering implementation, scan some volumes to generate histograms of the different patterns and pattern sizes, to see whether the space savings could be substantial.
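A scan along these lines could be sketched as follows. Everything here is an assumption for illustration: the function name `scan_patterns`, the 128 KiB default chunk size, and the choice to read from any binary file object. It tallies how many chunks consist entirely of one repeated 1-, 2- or 4-byte pattern, keyed by the pattern itself.

```python
import io
from collections import Counter

def scan_patterns(stream, chunk_size=128 * 1024, widths=(1, 2, 4)):
    """Count chunks made of a single repeated pattern.

    Returns (total_chunks, Counter mapping pattern bytes -> chunk count).
    Hypothetical helper for illustration; chunk_size is an assumed default.
    """
    counts = Counter()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += 1
        n = len(chunk)
        view = memoryview(chunk)
        for w in widths:
            # A chunk has period w when it equals itself shifted by w bytes.
            if n >= 2 * w and n % w == 0 and view[w:] == view[:-w]:
                counts[bytes(view[:w])] += 1
                break  # record only the shortest matching width
    return total, counts

if __name__ == "__main__":
    # Tiny in-memory demo with 8-byte chunks instead of a real volume.
    data = bytes(8) + b"\xff" * 8 + b"\xab\xcd" * 4 + bytes(range(8))
    total, counts = scan_patterns(io.BytesIO(data), chunk_size=8)
    nonzero = sum(c for p, c in counts.items() if any(p))
    print(f"{total} chunks, {nonzero} with a non-zero repeated pattern")
    for pattern, count in counts.most_common():
        print(f"{len(pattern) * 8:>2}-bit {pattern.hex()}: {count}")
```

For a real volume, the stream would be something like `open(path, "rb")` on a snapshot or image file. Since all-zero chunks are already elided, the interesting figure is the share of chunks with a non-zero pattern, which bounds the extra savings.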