jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

xdelta3 sometimes produces huge delta output #175

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
xdelta3 sometimes produces huge delta output

I am using Omni's OpenDelta delta update solution [1]. Its script uses xdelta3 as 
the diff tool, but I found that some versions produce a very large delta.

The full list is pasted here: http://paste.omnirom.org/view/9d1604e0

What steps will reproduce the problem?
======================================
the steps are a bit trivial.

1. Download the 206 and 207 OTA zips:
https://www.dropbox.com/s/npclp9auw2rkwj0/X-S01A_HIKe_L3SM_206_131008165047.zip
https://www.dropbox.com/s/efufznh9b2lpbv3/X-S01A_HIKe_L3SM_207_131013211249.zip

2. Run zipadjust (compiled from the attachment), which rewrites each zip at 
compression level 0 (stored) without extracting it.
gcc -o zipadjust zipadjust.c zipadjust_run.c -lz
zipadjust --decompress X-S01A_HIKe_L3SM_206_131008165047.zip from.zip
zipadjust --decompress X-S01A_HIKe_L3SM_207_131013211249.zip to.zip

3. Run xdelta3 on them:
xdelta3 -9evfS none -s from.zip to.zip 206_to_207.update

What is the expected output? What do you see instead?
=====================================================
The 206-to-207 delta should be about the same size as the 205-to-206 delta; 
instead it is much larger.

What version of the product are you using? On what operating system?
3.0.7, as included in OpenDelta.
I have also tried the 3.0.8 archive from the downloads, with the same result.

Please provide any additional information below.
================================================
The full list of input zip sizes and resulting delta (.update) sizes 
can be found here: http://paste.omnirom.org/view/9d1604e0
Each delta is named after its "from" version, e.g. XXX_206_XXX is the delta from 206 to 207.

Notes
=====
[1] https://github.com/omnirom/android_packages_apps_OpenDelta

Original issue reported on code.google.com by qingxianhao on 27 Feb 2014 at 12:03

GoogleCodeExporter commented 9 years ago
I will take a look at this.

The usual reason for this type of problem is that the files have been reordered 
and Xdelta is not given enough source-window memory (the -B flag) to recognize 
the reordering.

Original comment by josh.mac...@gmail.com on 3 Mar 2014 at 4:43

GoogleCodeExporter commented 9 years ago
Wow, you really helped me out.

I have tested xdelta with different buffer sizes (the suffix of each .update 
file indicates the source-window size). 
The delta size seems stable at 59MB, which is a significant reduction.
du -m out/*
59      out/X-S01A_HIKe_L3SM_206_131008165047.update1024
59      out/X-S01A_HIKe_L3SM_206_131008165047.update128
59      out/X-S01A_HIKe_L3SM_206_131008165047.update2048
59      out/X-S01A_HIKe_L3SM_206_131008165047.update256
59      out/X-S01A_HIKe_L3SM_206_131008165047.update512

I don't know much about Xdelta's internals; I have almost no idea what 
reordering or the source window means.

So what source-window size should I use?
Should I pass a buffer size of at least double the size of the largest source file?

Original comment by qingxianhao on 4 Mar 2014 at 8:05

GoogleCodeExporter commented 9 years ago
For the record, I'm looking into this but the files are downloading slowly and 
I won't actually see them until tomorrow at the earliest.

Original comment by josh.mac...@gmail.com on 11 Mar 2014 at 6:00

GoogleCodeExporter commented 9 years ago
By the way, your "zipadjust" tool sounds very useful -- but I have trouble 
building it (there is no "main" function). How do I use it?

Xdelta has support on some platforms for automatically uncompressing and 
recompressing compressed data of known formats. This would be perfect for 
handling zip files if it were commonly installed and widely available.

Original comment by josh.mac...@gmail.com on 11 Mar 2014 at 6:34

GoogleCodeExporter commented 9 years ago
Sorry, I left out the zipadjust_run.c file, which contains main().
  gcc -o zipadjust zipadjust.c zipadjust_run.c -lz
An updated archive is attached.

Also, zipadjust was written by Chainfire, an awesome guy (as are you), the 
author of OpenDelta, from the Omni Android ROM team. 

Original comment by qingxianhao on 11 Mar 2014 at 9:56

GoogleCodeExporter commented 9 years ago
Several things:

"boot.img" is relatively uncompressible. Even if I extract the two copies of 
boot.img and run xdelta3 on just those files, it doesn't compress very much. So 
that's 5MB.

Then, there are 4 new large files included in version 207 that are not part of 
206, in the data/app directory, sized 7, 6, 10, and 24MB.

Between boot.img and four new files, that's over 50MB. The rest of the data is 
fairly compressible.

To answer your question above, for best performance you want to set -B<size> so 
that <size> is a power of two larger than the source data. In this case, 512MB. 
But as you noticed, adding memory won't help with uncompressible data and/or 
new data.
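
The sizing rule above can be sketched in shell. This is only an illustration under stated assumptions: `next_pow2` is a hypothetical helper name (it is not part of xdelta3), and it assumes the value to round up is the source file's size in bytes.

```shell
# Hypothetical helper (not part of xdelta3): print the smallest power of two
# that is strictly larger than the byte count given as $1.
next_pow2() {
  n=$1
  p=1
  while [ "$p" -le "$n" ]; do
    p=$((p * 2))
  done
  echo "$p"
}
```

For a source named from.zip, the result could be passed as `-B $(next_pow2 $(wc -c < from.zip))` alongside the flags from the original report. As an example with a made-up size, a 300000000-byte source would give 536870912 (512MB).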

Thanks for reporting this, I'm glad to have learned about OpenDelta and may 
find a way to incorporate the zipadjust code, it's very handy.

Original comment by josh.mac...@gmail.com on 12 Mar 2014 at 5:58

GoogleCodeExporter commented 9 years ago
"<size> is a power of two larger than the source data."
Got that.
Thank you so much for patiently answering my question, and for the awesome 
job on xdelta.

For boot.img, Android officially uses a tool named imgdiff to diff .img files, 
if you'd like to take a look at it: 
http://androidxref.com/4.2.2_r1/search?q=imgdiff

Original comment by qingxianhao on 12 Mar 2014 at 9:35

GoogleCodeExporter commented 9 years ago
Thanks for the work on xdelta3 and finding out the problem for us, Josh.

For your reference, OpenDelta's home is here: 
https://github.com/omnirom/android_packages_apps_OpenDelta , including the 
zipadjust sources.

A word of warning: zipadjust was written specifically for the OTA ZIPs the 
Android build system produces. It strips out whole-file signatures and other 
blocks it deems irrelevant, and it supports nowhere near the entire ZIP format 
as it is in use today (which has in fact grown quite complex and convoluted). 
As such, you could use it as a starting point, but I would not apply it to 
random ZIP files without extensive testing and poring over the ZIP format docs 
again and again.

Original comment by jor...@jongma.org on 21 Mar 2014 at 10:35

Prachi2812 commented 9 years ago

Hi Josh, you said in one of the comments that 'for best performance you want to set -B<size> so that <size> is a power of two larger than the source data. But as you noticed, adding memory won't help with uncompressible data and/or new data.' I was using xdelta3 on two files where the original file is 109MB and the target file is 99.9MB. When I tried -B 134217728, performance degraded compared to the default -B 67108864. I checked the delta produced, and it doesn't contain much new data. Here are the results:

  1. Using default -B 67108864: delta file size 5131 bytes, encoding time 396 ms, applying the delta 14 s
  2. Using -B 134217728: delta file size 5194 bytes, encoding time 769 ms, applying the delta 16 s

mgrinzPlayer commented 9 years ago
"Adding memory won't help with uncompressible data and/or new data"

It's not that.

Do not use xdelta (or xdelta3) directly on compressed files such as zip, gzip, rar, or 7z, or on installers such as NSIS or Inno Setup, because those programs tend to amplify the differences between files.

Source and target must be neither compressed nor encrypted.
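
The effect described above can be observed with a small experiment. The sketch below is an illustration, not something taken from this thread: `count_diff_bytes` and `demo_compression_effect` are hypothetical names, and gzip merely stands in for any compressor. It builds two same-length files that differ in a single byte, then counts the differing byte positions before and after compression.

```shell
# Hypothetical helper: count byte positions at which two files differ.
# cmp exits non-zero when files differ, so its status is deliberately ignored.
count_diff_bytes() {
  { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Build two same-length inputs differing in one byte, then compare them
# raw and gzipped. gzip -n omits name and timestamp so headers stay equal.
demo_compression_effect() {
  dir=$(mktemp -d)
  seq 1 20000 > "$dir/a.txt"
  sed '10000s/0$/X/' "$dir/a.txt" > "$dir/b.txt"   # "10000" becomes "1000X"
  gzip -n -c "$dir/a.txt" > "$dir/a.gz"
  gzip -n -c "$dir/b.txt" > "$dir/b.gz"
  echo "raw:  $(count_diff_bytes "$dir/a.txt" "$dir/b.txt") differing bytes"
  echo "gzip: $(count_diff_bytes "$dir/a.gz" "$dir/b.gz") differing bytes"
  rm -rf "$dir"
}
```

Running `demo_compression_effect` reports one differing byte for the raw pair. The gzip count should come out well above that, though the exact number depends on the gzip build, which is why a delta tool has much less to match on once the inputs are compressed.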

Prachi2812 commented 9 years ago

But what if I want to compare two folders?

mgrinzPlayer commented 9 years ago

Use the archiver's "store" preset (every archiver I know has one), or use tar.
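
One way to read this advice for the folder case is to pack each folder into an uncompressed tar archive and diff the archives. A minimal sketch, assuming a tar that supports `-C`; `pack_dir` is a hypothetical helper name, not part of xdelta or tar.

```shell
# Hypothetical helper: pack directory $1 into the tar archive $2.
# No -z/-j/-J flag is given, so file contents are stored uncompressed.
# $2 should live outside $1 so the archive does not try to include itself.
pack_dir() {
  tar -cf "$2" -C "$1" .
}
```

After `pack_dir old_dir old.tar` and `pack_dir new_dir new.tar`, the two archives could be given to `xdelta3 -e -s old.tar new.tar out.delta`. Member order follows whatever order tar reads the directory in, so if files end up reordered between versions, the earlier remark about giving xdelta3 enough source-window memory (-B) still applies.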

Prachi2812 commented 9 years ago

Could you please elaborate? I want to use xdelta on .tpk files

mgrinzPlayer commented 9 years ago

If you can, create .tpk files without compression/encryption.