jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

Stream alignment in the presence of deletions #212

Open pekon opened 8 years ago

pekon commented 8 years ago

I am getting an unexpected delta size when there are a lot of deletions between source and target. Here is a script to produce test files (random source; the target drops every other 64 KiB block):

    b=65536
    n=32
    dd if=/dev/urandom bs=$b count=$n of=source
    rm target
    for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    do
      ofs=$((i * 2))
      dd if=source bs=$b skip=$ofs seek=$i of=target count=1
    done
    xdelta3 -f -B 524288 -W $b -e -s source target delta

The comments in the code indicate that xdelta should be able to align streams in constant space, but in this case the streams get misaligned after processing one source window size of source data.
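To make the drift concrete, here is a rough sketch of the arithmetic for the test files above. It assumes a deliberately simplified model that is not taken from the xdelta3 source: when the encoder is at target offset t, the source window is taken to cover [t, t + B). The helper names `match_offset` and `in_window` are made up for this illustration.

```shell
#!/bin/sh
# Simplified model (an assumption, not xdelta3's actual window logic):
# at target offset t the source window is taken to cover [t, t + B).
b=65536      # block size used by the test script
B=524288     # source window size passed via -B

# Source offset that holds the data of target block $1
# (the target keeps every other source block).
match_offset() { echo $(( $1 * 2 * b )); }

# Prints "in" if the match for target block $1 lies inside the modeled
# window, "out" otherwise.
in_window() {
  t=$(( $1 * b ))
  m=$(match_offset "$1")
  if [ "$m" -ge "$t" ] && [ "$m" -lt $(( t + B )) ]; then
    echo in
  else
    echo out
  fi
}

# Walk over the 16 target blocks and report the drift for each one.
i=0
while [ "$i" -lt 16 ]; do
  drift=$(( $(match_offset "$i") - i * b ))
  echo "block $i: drift $drift bytes, match is $(in_window "$i") window"
  i=$(( i + 1 ))
done
```

Under this model the drift grows by one block per target block and reaches the window size B at target block B / b = 8, which is consistent with the streams losing alignment after about one source window of data.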

jmacd commented 8 years ago

Yes. This issue has been on my short list. I thought about it and troubled over how to improve the stream alignment without giving up certain properties of the implementation (i.e., fixed memory & streaming).

There is already an optional mode in the encoder that is aware of the end-of-file position when it is valid (it is not valid in streaming applications). It could be taught to increase or decrease the stream alignment (vs 1:1) when both inputs have a known size. Would you be interested in a command-line flag --speed telling the encoder to advance in the source at a different rate, relative to the target input?
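A minimal sketch of what "advancing at a different rate" could mean when both sizes are known: scale the target position by the ratio of the file sizes. This is only an illustration of the idea, not existing xdelta3 behavior; `scaled_source_pos` is a hypothetical helper, and the sizes are those of the test files from the script in this issue.

```shell
#!/bin/sh
# Hypothetical proportional alignment (illustrative only, not xdelta3 code):
# map a target position to a source position using the size ratio.
# Arguments: target_pos source_size target_size
scaled_source_pos() {
  echo $(( $1 * $2 / $3 ))
}

b=65536
source_size=$(( 32 * b ))   # 32 blocks of random data
target_size=$(( 16 * b ))   # every other block kept

# For each target block, compare 1:1 alignment with the scaled position.
i=0
while [ "$i" -lt 16 ]; do
  t=$(( i * b ))
  s=$(scaled_source_pos "$t" "$source_size" "$target_size")
  echo "target $t: 1:1 source pos $t, scaled source pos $s"
  i=$(( i + 1 ))
done
```

For these files the ratio is exactly 2, so the scaled position coincides with the source offset that actually holds each target block. Real inputs would not have evenly spread deletions, so a fixed ratio would only approximate the correct alignment.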