jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

Relative patching with xdelta3 #260

Open superbonaci opened 3 years ago

superbonaci commented 3 years ago

I would like to know whether it is possible to patch a destination file in place, writing only the bytes that differ instead of rewriting the entire file. For example, I have SOURCE and DEST, both about 512GB, and a patch about 1MB in size. The problem is that on spinning drives DEST gets zeroed and then overwritten, so it takes about 2 hours to complete, as the DEST file grows on the filesystem. Is it possible to patch only the differing bytes (so the file size stays the same)? I think Codefusion Wizard 3.0 allows this; do you know of any other utility that can do it?

What xdelta3 currently does is overwrite the destination file completely instead of patching it in place.

abolibibelot1980 commented 3 years ago

If there aren't too many differing bytes, one possibility would be sfk setbytes. I discovered sfk recently; it's a small yet very versatile CLI utility, although it has performance issues when dealing with large tasks.

Before I found out about xdelta (even more recently), I used sfk scripts for the following: I had an older and a newer version of re-authored DVDs, and I wanted to delete the older versions while keeping the ability to recreate those files, just in case, for peace of mind. What I did was compare corresponding files with WinHex and export a report listing all the differing bytes (usually a few thousand per 1GB file), then edit those reports with TED Notepad (a light yet very powerful text editor) into a list of individual sfk setbytes commands, each meant to overwrite a single byte. For instance, this line in the WinHex report:

2176473159/ 630855: 56 57

was edited into this command:

sfk setbytes ".\19900325\VIDEO_TS\VTS_01_3.VOB" 630855 0x57 -yes

This command overwrites the byte at offset 630855 of file VTS_01_3.VOB with the hexadecimal value "57".

The problem is that with more than a few hundred such commands, the process becomes very heavy and takes far longer than it should. I had scripts with up to ~60000 individual commands, which is only ~60KB of data, but the re-generation test took 20-30 minutes with high CPU usage (~13%, which corresponds to the saturation of one “virtual core” on a 4C/8T CPU), and that usage continued for almost as long after the end of the execution. I found out that it was creating a new temporary 512KB file in the “Temp” directory for every single command, which would explain the poor performance. If the “patch” file is about 1MB in your case, this may not be an efficient solution either.
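The per-command overhead described above could in principle be avoided by applying all single-byte changes through one open file handle. Here is a minimal Python sketch of that idea. It is not part of sfk, WinHex, or xdelta: the function name `apply_byte_patches` and the (offset, value) pair format are assumptions made for illustration, and turning a WinHex report into such pairs is left out.

```python
import os
from typing import Iterable, Tuple


def apply_byte_patches(path: str, patches: Iterable[Tuple[int, int]]) -> int:
    """Overwrite single bytes of an existing file in place.

    `patches` is an iterable of (offset, value) pairs (hypothetical format).
    Returns the number of bytes written. The file size never changes.
    """
    ordered = sorted(patches)  # ascending offsets keep disk seeks mostly forward
    size = os.path.getsize(path)

    # Validate everything first so a bad entry cannot leave a half-patched file.
    for offset, value in ordered:
        if not 0 <= offset < size:
            raise ValueError(f"offset {offset} is outside the file (size {size})")
        if not 0 <= value <= 0xFF:
            raise ValueError(f"value {value} is not a single byte")

    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        for offset, value in ordered:
            f.seek(offset)
            f.write(bytes([value]))
    return len(ordered)
```

With the example from the comment above, the equivalent call would be `apply_byte_patches(r".\19900325\VIDEO_TS\VTS_01_3.VOB", [(630855, 0x57)])`. Whether this is actually faster than a long sfk script on a given system would need to be measured.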

If the differing bytes are grouped in specific areas, with large areas of identical data in between, another possibility would be to create sfk partcopy commands instead. HexWorkshop's comparison module displays differences as offset intervals of matched / replaced data (instead of listing every single byte like WinHex), so a comparison report exported from HexWorkshop can be edited into a list of sfk partcopy commands. For instance, this command will overwrite 262144 bytes at offset 79691742 of "output" with 262144 bytes copied from offset 79691742 of the "input" file:

sfk partcopy "input" 79691742 262144 "output" 79691742 -yes

I haven't tested this with more than a dozen commands, so I don't know whether a script with a large number of commands runs into performance issues similar to those with “setbytes”.
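The range-copy idea can also be sketched in Python. This is only an illustration of copying an interval from one file over the same region of another without rewriting the rest. The function name `copy_range` and its parameters are hypothetical and do not describe how sfk partcopy is implemented.

```python
def copy_range(src_path: str, src_offset: int, length: int,
               dst_path: str, dst_offset: int,
               chunk_size: int = 1 << 20) -> int:
    """Copy `length` bytes from src_path at src_offset into dst_path at
    dst_offset, overwriting in place. Returns the number of bytes copied,
    which is less than `length` if the source ends early.
    """
    copied = 0
    # "r+b" updates the destination without truncating it. Note that the
    # destination grows if dst_offset + length runs past its current end.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(src_offset)
        dst.seek(dst_offset)
        while copied < length:
            block = src.read(min(chunk_size, length - copied))
            if not block:  # source exhausted
                break
            dst.write(block)
            copied += len(block)
    return copied
```

The partcopy example above would correspond to `copy_range("input", 79691742, 262144, "output", 79691742)`. Several intervals can be applied by calling the function in a loop, at the cost of reopening both files each time.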

Those are not ideal solutions, as they involve quite a bit of manual fiddling, which is always error-prone, but they may suit your purposes or give you hints toward finding a better method.

JustMyGithub commented 2 years ago

I wonder about "on spinning drives DEST gets zeroed then overwritten". If that is true, it should be possible to fix. Could you elaborate on that (e.g. why you think the file is written twice)? I think that would be a mistake in xdelta's API usage, if it is really done that way. If a file is fully overwritten anyway, it should not be allocated with all zeroes first.