jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

Unusual Delta output file size (unusual compared to the similar 'SmartVersion' delta output file size) #203

Open HotDenim opened 9 years ago

HotDenim commented 9 years ago

Unusual Delta output file size. Using xdelta on the 'Visual Studio 2013' Retail editions .ISO images.

The .ISO files can be downloaded from here:

Ultimate Edition (2.9 GB): http://go.microsoft.com/fwlink/?LinkId=320679
Test Professional Edition (175 MB): http://download.microsoft.com/download/4/0/6/406E397F-EDBE-4437-B64F-40DF7A92A26E/VS2013_RTM_TESTPRO_ENU.iso
Premium Edition (2.9 GB): http://go.microsoft.com/fwlink/?LinkId=320676
Professional Edition (2.9 GB): http://go.microsoft.com/fwlink/?LinkId=320673

When creating a delta from Ultimate Edition to Test Professional Edition, the delta file size is 160 MB. I tried the same process with SmartVersion (www.smartversion.com) and it produces a delta file of size 12 MB (more realistic considering the differences within the .ISO file). The Test_Professional is a subset of Ultimate (Test_Professional .ISO is only 175 MB).

Can you explain? Surely xdelta should be producing something in the region of what SmartVersion produces?

(As expected, creating a delta from 'Ultimate' to 'Premium' or 'Professional' produces small deltas of approx. 600 KB.)

mgrinzPlayer commented 9 years ago

When creating a delta from Ultimate Edition to Test Professional Edition, the delta file size is 160 MB

What parameters did you use?

HotDenim commented 9 years ago

I tried it with all of them, including the defaults. Here is one set I also tried:

-e -9 -S lzma

mgrinzPlayer commented 9 years ago

In that case, you have to wait for the 64-bit hash version; that version should also support -B values greater than 2GB.

Check this one https://github.com/jmacd/xdelta-devel/tree/64bithash from time to time.

Or maybe jmacd could implement offset option. Because, in your case, Test_Professional is a subset of Ultimate, but not the first 2GB of Ultimate.

For now, you can use jojodiff (download link)

HotDenim commented 9 years ago

Can you be more explicit and comprehensive in your reply? I am not very familiar with xdelta; I have only used it for one session.

In that case, you have to wait for 64bit hash version, that version also should have -B values greater than 2GB

What is the effect of -B values greater than 2GB?

Check this one https://github.com/jmacd/xdelta-devel/tree/64bithash

What is this you are directing me to, and why?

Or maybe jmacd could implement offset option.

What will that option do?

For now, you can use jojodiff (download link)

How does jojodiff compare to xdelta, in all ways? I thought xdelta was the best.

mgrinzPlayer commented 9 years ago

What is the effect of -B values greater than 2GB

https://github.com/jmacd/xdelta/blob/wiki/TuningMemoryBudget.md

What is this you are directing me to, and Why ?

Again, read TuningMemoryBudget. Bigger -B, smaller patches in general. Currently -B can not be bigger than 2GB, you have to wait...

What will that option do ?

It could skip the beginning of source file and load the rest.

How does jojodiff compare to xdelta, in all ways? I thought xdelta was the best.

xdelta is best. JojoDiff is good too; it uses a heuristic algorithm that trades accuracy for speed. It does not always find the smallest set of differences, but it doesn't require big buffers.
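To make the point about -B more concrete, here is a toy sketch in Python. It is not xdelta3's actual matching algorithm; it only assumes that an encoder can copy target data solely from the part of the source it has in its buffer. The names `BLOCK` and `matched_fraction` and all sizes are hypothetical and chosen for illustration.

```python
import random

BLOCK = 16  # toy block size in bytes (an assumption, not an xdelta3 value)


def matched_fraction(source: bytes, target: bytes, window: int) -> float:
    """Fraction of target blocks that also occur in the first `window` bytes of source."""
    visible = source[:window]
    # Index every aligned block the encoder is allowed to "see".
    index = {visible[i:i + BLOCK] for i in range(0, len(visible) - BLOCK + 1, BLOCK)}
    blocks = [target[i:i + BLOCK] for i in range(0, len(target) - BLOCK + 1, BLOCK)]
    hits = sum(1 for block in blocks if block in index)
    return hits / len(blocks)


rng = random.Random(0)  # fixed seed so the run is repeatable
source = bytes(rng.randrange(256) for _ in range(3000 * BLOCK))
# The target is a subset of the source, but it lives near the END of the source.
target = source[2500 * BLOCK:2900 * BLOCK]

small = matched_fraction(source, target, window=2000 * BLOCK)
full = matched_fraction(source, target, window=len(source))
print(f"window covers first two thirds: {small:.0%} of target blocks matched")
print(f"window covers whole source:     {full:.0%} of target blocks matched")
```

In this toy model the small window finds nothing to copy because the shared data sits past its end, which mirrors the case of Test_Professional being a subset of Ultimate but not of its first 2GB.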

mgrinzPlayer commented 9 years ago

One extra example of what a bigger -B switch could do if jmacd implements it.

For now, we have to simulate it (it is not a perfect simulation). We can use 7zip for this. Add to archive "Ultimate" with 7zip, method: store, volume size: 1GB

You will get three files:
VS2013_RTM_ULT_ENU.7z.001 (1GB)
VS2013_RTM_ULT_ENU.7z.002 (1GB)
VS2013_RTM_ULT_ENU.7z.003 (836MB)

Then this

xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.001 VS2013_RTM_TESTPRO_ENU.iso intermediate1.xd3
xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.002 intermediate1.xd3 intermediate2.xd3
xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.003 intermediate2.xd3 final.xd3

intermediate1.xd3 will be 67.5MB
intermediate2.xd3 will be 29.7MB
final.xd3 will be 12.3MB

Note: because this is not a perfect simulation, I think final.xd3 is much bigger than the delta file that a newer xdelta3 (with big -B support) would create.

Then, you can delete intermediate1.xd3 and intermediate2.xd3 files. You will only need final.xd3 and those three parts.

To decode, do this:

xdelta3 -fvds VS2013_RTM_ULT_ENU.7z.003 final.xd3 intermediate2.xd3
xdelta3 -fvds VS2013_RTM_ULT_ENU.7z.002 intermediate2.xd3 intermediate1.xd3
xdelta3 -fvds VS2013_RTM_ULT_ENU.7z.001 intermediate1.xd3 otherVS2013_RTM_TESTPRO_ENU.iso

EDIT: Accidentally enabled 7zip compression. Now it is fixed. Reread this post again.
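The chained encode and decode steps above can be sketched as a small Python helper that only builds the command lines and does not run anything. The flags are copied from the commands in this post; the function names and the `restored.iso` output name are hypothetical.

```python
from typing import List

# Flags copied verbatim from the commands in the post above.
ENCODE_FLAGS = ["-0", "-I", "0", "-B", "1000000000", "-W", "16777216", "-fves"]
DECODE_FLAGS = ["-fvds"]


def intermediate_name(step: int) -> str:
    return f"intermediate{step}.xd3"


def chain_encode(parts: List[str], target: str, final: str) -> List[List[str]]:
    """Encode `target` against part 1, then re-encode each result against the next part."""
    commands = []
    current = target
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        out = final if is_last else intermediate_name(i + 1)
        commands.append(["xdelta3", *ENCODE_FLAGS, part, current, out])
        current = out
    return commands


def chain_decode(parts: List[str], final: str, restored: str) -> List[List[str]]:
    """Undo the chain in reverse order, starting from the last part."""
    commands = []
    current = final
    for i in range(len(parts) - 1, -1, -1):
        out = restored if i == 0 else intermediate_name(i)
        commands.append(["xdelta3", *DECODE_FLAGS, parts[i], current, out])
        current = out
    return commands


parts = [f"VS2013_RTM_ULT_ENU.7z.00{n}" for n in (1, 2, 3)]
for cmd in chain_encode(parts, "VS2013_RTM_TESTPRO_ENU.iso", "final.xd3"):
    print(" ".join(cmd))
for cmd in chain_decode(parts, "final.xd3", "restored.iso"):
    print(" ".join(cmd))
```

The point of the sketch is the ordering: decoding must walk the parts in the opposite order from encoding, because each intermediate delta was itself the input of the next encoding step.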

HotDenim commented 9 years ago

mgrinzPlayer:

A side question:

For creating the smallest Delta File, I have deduced that the following parameters are the 'best' ones:

-e -9 -S lzma -B 2147483648 -W 16777216 -I 0 -P 16777216

Am I correct, or can you suggest any other parameters/values that might give a smaller delta file?

mgrinzPlayer commented 9 years ago

It depends.

Personally, I'm using this

-9 -S none -B 2000000000 -I 0 -e -s oldfile newfile deltafile

then compress the delta file with a third-party tool such as 7zip, FreeArc or WinRAR5.

About previous simulation, I accidentally enabled compression in 7zip (mouse wheel changed 'store' to 'fastest'), Here are correct statistics:

You will get three files:
VS2013_RTM_ULT_ENU.7z.001 (1GB)
VS2013_RTM_ULT_ENU.7z.002 (1GB)
VS2013_RTM_ULT_ENU.7z.003 (836MB)

Then this

xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.001 VS2013_RTM_TESTPRO_ENU.iso intermediate1.xd3
xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.002 intermediate1.xd3 intermediate2.xd3
xdelta3 -0 -I 0 -B 1000000000 -W 16777216 -fves VS2013_RTM_ULT_ENU.7z.003 intermediate2.xd3 final.xd3

intermediate1.xd3 will be 67.5MB
intermediate2.xd3 will be 29.7MB
final.xd3 will be 12.3MB

As you see, final.xd3 is about the same as SmartVersion or JoJoDiff deltas.

HotDenim commented 9 years ago

Why does it 'depend'? Also, can you see anything wrong or inadequate with my options and their values?

mgrinzPlayer commented 9 years ago

-9 -S lzma -B 2000000000 -I 0 -e -s oldfile newfile deltafile

or

-9 -S lzma -B 2000000000 -W 16777216 -I 0 -P 16777216 -e -s oldfile newfile deltafile

Both are good for source files bigger than a few hundred megabytes.

As I said earlier, I prefer to use a third-party compression tool, so I don't use secondary compression (-S parameter).

-9 -S none -B 2000000000 -I 0 -e -s oldfile newfile deltafile

For smaller files just use

-9 -S lzma -I 0 -e -s oldfile newfile deltafile
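The choice between these option sets can be written down as a small Python sketch. It only assembles the flags listed in this comment; the function name, the `external_compressor` switch and the 300 MB threshold are assumptions made for illustration, since the comment only says "a few hundred megabytes". The -B value reflects the 2GB limit that applied when this was written.

```python
from typing import List


def suggested_args(source_size: int,
                   external_compressor: bool = False,
                   large_threshold: int = 300_000_000) -> List[str]:
    """Return the xdelta3 flags from this comment for a given source size in bytes.

    `large_threshold` is a hypothetical cut-off, not a value given in the thread.
    """
    # -S none when a third-party compressor is applied afterwards, else lzma.
    secondary = "none" if external_compressor else "lzma"
    args = ["-9", "-S", secondary]
    if source_size >= large_threshold:
        args += ["-B", "2000000000"]
    args += ["-I", "0", "-e", "-s"]
    return args


# Source size of the Ultimate ISO as reported later in this thread.
print(" ".join(suggested_args(3024457728)), "oldfile newfile deltafile")
print(" ".join(suggested_args(10_000_000)), "oldfile newfile deltafile")
```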

jmacd commented 8 years ago

mgrinzPlayer, I appreciate your support!

I've been busy at work but find myself just now beginning a leave (parental--new child) and think I'll be able to find some time to get back to the 64bit hash changes.

But it'll be a couple of weeks at least before I find any time at all.


jmacd commented 8 years ago

Hi, As noted above, the root cause of the poor performance on your test case is issue 127, the lack of support for 64bit source buffer. That's fixed now.

I was able to verify that xdelta3 on the 64bithash branch computes a 12MB delta for the test case here, when configured with lzma secondary compression. Thank you for the test case. I'm not quite ready to release the changes, but will do so soon. Josh

jmacd commented 8 years ago

Here, that is: https://github.com/jmacd/xdelta-devel/tree/64bithash

jmacd commented 8 years ago

3.1.0 is released with this fix

HotDenim commented 8 years ago

Adds support for -B values greater than 2GB, enabled by -DXD3_USE_LARGESIZET=1 variable

How do I use -B values greater than 2GB? Can you provide an example?

Also, where can I find updated documentation for the command-line options?

HotDenim commented 8 years ago

Jmacd (specifically):

Can you provide the command-line values for the -B, -W, -I and -P options (and any other options) that would create the smallest delta file (before compression)?

jmacd commented 8 years ago

I verified that this works with

./xdelta3 -B 4294967296 -vf -e -s ~/VS2013_RTM_ULT_ENU.iso ~/VS2013_RTM_TESTPRO_ENU.iso VS_ULT_TESTPRO.xdelta

xdelta3: secondary compression: lzma
xdelta3: source /volume/home/jmacd/VS2013_RTM_ULT_ENU.iso source size 2.82 GiB [3024457728] blksize 4.00 GiB window 4.00 GiB (FIFO)
xdelta3: 0: in 8.00 MiB: out 6.52 MiB: total in 8.00 MiB: out 6.52 MiB: 30 sec
xdelta3: 1: in 8.00 MiB: out 4.61 MiB: total in 16.0 MiB: out 11.1 MiB: 2.8 sec
xdelta3: 2: in 8.00 MiB: out 29.0 B: total in 24.0 MiB: out 11.1 MiB: 11 ms
xdelta3: 3: in 8.00 MiB: out 117 KiB: total in 32.0 MiB: out 11.2 MiB: 199 ms
xdelta3: 4: in 8.00 MiB: out 29.0 B: total in 40.0 MiB: out 11.2 MiB: 77 ms
xdelta3: 5: in 8.00 MiB: out 29.0 B: total in 48.0 MiB: out 11.2 MiB: 74 ms
xdelta3: 6: in 8.00 MiB: out 29.0 B: total in 56.0 MiB: out 11.2 MiB: 74 ms
xdelta3: 7: in 8.00 MiB: out 29.0 B: total in 64.0 MiB: out 11.2 MiB: 66 ms
xdelta3: 8: in 8.00 MiB: out 29.0 B: total in 72.0 MiB: out 11.2 MiB: 68 ms
xdelta3: 9: in 8.00 MiB: out 29.0 B: total in 80.0 MiB: out 11.2 MiB: 58 ms
xdelta3: 10: in 8.00 MiB: out 29.0 B: total in 88.0 MiB: out 11.2 MiB: 87 ms
xdelta3: 11: in 8.00 MiB: out 29.0 B: total in 96.0 MiB: out 11.2 MiB: 82 ms
xdelta3: 12: in 8.00 MiB: out 38.0 B: total in 104 MiB: out 11.2 MiB: 74 ms
xdelta3: 13: in 8.00 MiB: out 45.0 B: total in 112 MiB: out 11.2 MiB: 103 ms
xdelta3: 14: in 8.00 MiB: out 29.0 B: total in 120 MiB: out 11.2 MiB: 77 ms
xdelta3: 15: in 8.00 MiB: out 40.0 B: total in 128 MiB: out 11.2 MiB: 99 ms
xdelta3: 16: in 8.00 MiB: out 535 KiB: total in 136 MiB: out 11.8 MiB: 383 ms
xdelta3: 17: in 8.00 MiB: out 29.0 B: total in 144 MiB: out 11.8 MiB: 76 ms
xdelta3: 18: in 8.00 MiB: out 40.0 B: total in 152 MiB: out 11.8 MiB: 92 ms
xdelta3: 19: in 8.00 MiB: out 29.0 B: total in 160 MiB: out 11.8 MiB: 61 ms
xdelta3: 20: in 8.00 MiB: out 29.0 B: total in 168 MiB: out 11.8 MiB: 72 ms
xdelta3: 21: in 3.40 MiB: out 29.0 B: total in 171 MiB: out 11.8 MiB: 52 ms
xdelta3: finished in 54 sec; input 179724288 output 12340595 bytes (6.87%)

Note that this includes LZMA "secondary" compression, so you don't want to use an additional compression step.
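As a quick cross-check of the figures in that log, the byte counts on the "finished" and "source" lines can be converted by hand. The short Python sketch below uses only numbers printed in the log; the variable names are my own.

```python
# Byte counts copied from the xdelta3 log above.
INPUT_BYTES = 179724288     # size of the Test Professional ISO
OUTPUT_BYTES = 12340595     # size of the resulting delta
SOURCE_BYTES = 3024457728   # size of the Ultimate ISO

MIB = 1024 ** 2
GIB = 1024 ** 3

ratio_percent = 100 * OUTPUT_BYTES / INPUT_BYTES
input_mib = INPUT_BYTES / MIB
output_mib = OUTPUT_BYTES / MIB
source_gib = SOURCE_BYTES / GIB

print(f"delta is {ratio_percent:.2f}% of the input")
print(f"input {input_mib:.0f} MiB, output {output_mib:.1f} MiB, source {source_gib:.2f} GiB")
```

The results agree with the log: about 6.87% of the input, roughly 11.8 MiB of output, and a 2.82 GiB source, which is why a -B of 4 GiB is enough to hold the whole source.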

Note also that I have bug reports of this new 64-bit support not working on Windows, or possibly this case works but some other cases don't even on Linux, OSX, etc. I'm investigating.

Unfortunately, the regression test I wrote for this depends on Linux system calls, so I'll have to work through that before I can run it on my Windows box.


HotDenim commented 8 years ago

Jmacd (specifically):

Adds support for -B values greater than 2GB, enabled by -DXD3_USE_LARGESIZET=1 variable

So you just specify a larger -B value? There is no need to set the -DXD3_USE_LARGESIZET=1 variable?

When I try, I receive:

xdelta3: malloc: The access code is invalid.
xdelta3: out of memory: The access code is invalid.

Also: can you provide the command-line values for the -B, -W, -I and -P options (and any other options) that would create the smallest delta file (before compression)? Not just something that 'works'.

jmacd commented 8 years ago

The -DXD3_USE_LARGESIZET variable is set by default in the 3.1.0 distribution. That's a compiler flag.

Are you on Windows? I have to investigate the Windows issue. Which version are you testing?

Otherwise, with version 3.1.0 use a larger value of -B and it should work. There is not a great difference in compression due to the other variables, but you need -B set to 4 gigabytes for the example we're discussing, since the source file is greater than 2 gigabytes.

HotDenim commented 8 years ago

Yes, Windows 7 with Service Pack 1, 64-bit.

Testing the 3.1.0 64-bit and 32-bit versions

Setting the -B value 1 Byte past 2GB (2147483648) causes the error

xdelta3: malloc: The access code is invalid.
xdelta3: out of memory: The access code is invalid.

There is not a great difference in compression due to the other variables, but you need -B set to 4 gigabytes for the example we're discussing, since the source file is greater than 2 gigabytes.

OK, but I am asking what, theoretically, are the command-line values for the -B, -W, -I and -P options (and any other options) that would create the smallest delta file (before compression). I mean, what are the maximum values each of these parameters can have? (I ask this assuming that the maximum gives the best environment for the smallest delta, theoretically.)

jmacd commented 8 years ago

OK, so I have to diagnose several Windows issues it seems.

As for the parameters, -B is the only really important one, and we get the best compression when -B is set to the size of the source file. It will be rounded up to a power of two, which is why you see this problem when you set it one byte larger than 2GB.
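The rounding described here can be illustrated with a few lines of Python. This is only a sketch of "round up to a power of two" as stated in the comment; how xdelta3 implements the rounding internally is not shown in this thread, and the function name is hypothetical.

```python
def round_up_to_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


two_gib = 2147483648       # exactly 2^31
source_size = 3024457728   # Ultimate ISO size from the log earlier in the thread

print(round_up_to_power_of_two(two_gib))      # already a power of two, unchanged
print(round_up_to_power_of_two(two_gib + 1))  # one byte more doubles the buffer
print(round_up_to_power_of_two(source_size))  # buffer needed for the whole source
```

Under this assumption, asking for one byte more than 2^31 requests a 2^32-byte buffer, and the 2.82 GiB source also rounds up to 2^32, which matches the "blksize 4.00 GiB" line in the log.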

The -I, -P, and -W flags are not guaranteed to make better compression by arbitrarily raising their values. I recommend experimenting.

You probably shouldn't change -W, it has more to do with I/O performance than with compression.

The best compression for -I is -I=0. If you run with -vv you'll see warnings when the default setting isn't large enough.

I have some TODOs to look at the default settings for -W, -P, and -I. The defaults are roughly the same as they were 10 years ago, but files and memories are larger these days.

HotDenim commented 8 years ago

1. So having -B even 1 byte larger than the size of the source file (or, if it is rounded up to the next power of 2, one power of 2 larger) has no benefit at all?

2. With the 64-bit .EXE: even if I set -B to a power of 2, 2^32 (4294967296), I get the same error. With the 32-bit .EXE: even if I set -B to 2^32 (4294967296), I get the same error, BUT the error message is not displayed. (Tested with Windows 7 SP1 64-bit and Windows 10 Enterprise 32-bit.)

3. Has the maximum value of the -W parameter increased? I noted it as 16777216 approx. 2 years ago; now it seems to be 67108864.

4. Is the best value for -P the same as the value for -W?

5. -vv: is that an undocumented option? The help output only shows the -v option. (If so, can you update the help output?)

6. Is xdelta's delta file a custom file format specific to xdelta, or is it a general, non-xdelta file format? (If so, what is the file format called, and what is its extension?)

jmacd commented 8 years ago

On Sat, Jan 16, 2016 at 7:23 PM, HotDenim notifications@github.com wrote:

So having -B even 1 byte larger (or if it is rounded to next power of 2, then one power of 2 larger) than size of Source file has no benefit ?, at all?.

-B determines the size of a buffer that is used to read the Source file--and it needs to be a power-of-two, so yes, there is no benefit in making it any larger than the file size.

Even if I set -B to a power of 2, 2^32 (4294967296), I get same error.

There is a memory allocation problem. I forget that Windows needs a special API call to allocate > 2GB. I'll work on it, but it will be at least a week.

Has the -W parameters maximum value increased ?, I noted as having been 16777216 approx 2 years ago., now it seems it is 67108864.

Hm, I thought it was always 8MB. There are diminishing returns for larger windows.

Is the best value for -P, the same as the value for -W ?

Usually, but best to experiment.
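Since the advice for -W and -P is to experiment, one way to organise such an experiment is sketched below in Python. It only builds one command line per (-W, -P) combination and picks the smallest result from sizes you measure yourself; it does not run xdelta3. The helper names, the output file naming and the candidate values are assumptions, and a suitable -B would still have to be added for large sources.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple


def candidate_commands(source: str, target: str,
                       w_values: Iterable[int],
                       p_values: Iterable[int]) -> Dict[Tuple[int, int], List[str]]:
    """One xdelta3 command line per (-W, -P) pair, each writing its own delta file."""
    commands = {}
    for w, p in product(w_values, p_values):
        out = f"delta_W{w}_P{p}.xd3"  # hypothetical naming scheme
        commands[(w, p)] = ["xdelta3", "-9", "-S", "lzma", "-I", "0",
                            "-W", str(w), "-P", str(p),
                            "-f", "-e", "-s", source, target, out]
    return commands


def smallest(sizes: Dict[Tuple[int, int], int]) -> Tuple[int, int]:
    """Return the (-W, -P) pair whose measured delta size in bytes is smallest."""
    return min(sizes, key=sizes.get)


cmds = candidate_commands("oldfile", "newfile",
                          w_values=[8388608, 16777216],
                          p_values=[8388608, 16777216])
for cmd in cmds.values():
    print(" ".join(cmd))
```

After running the printed commands, the measured file sizes can be collected into a dictionary keyed by (W, P) and passed to `smallest` to see which combination won for that particular pair of files.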


HotDenim commented 8 years ago

jmacd commented on Jan 18

There is a memory allocation problem. I forget that Windows needs a special API call to allocate > 2GB. I'll work on it, but it will be at least a week.

It has been well over a week since the problem was mentioned. What is the status?

jmacd commented 8 years ago

I'm sorry I have no updates. In the intervening time, I worked on the (recently announced) license change and (just last evening, actually) installed Wine which I think will help me run my POSIX-only test harness against the Windows executable, and see if I can reproduce the problem.

HotDenim commented 7 years ago

...any more news on this ?... or will there be soon ?....



HotDenim commented 6 years ago

still/again:

...any more news on this ?... or will there be soon ?....

(Been waiting many years for this).

khimru commented 4 years ago

Grabbed two ISO images of Visual Studio again:
http://download.microsoft.com/download/5/7/A/57A99666-126E-42FA-8E70-862EDBADD215/vs2015.1.com_enu.iso
http://download.microsoft.com/download/F/9/7/F9775608-F90B-4586-9337-E62671AE186D/vs2015.1.com_deu.iso

Delta between them is just 140MB! That's great... except "restored" file is ALSO 140MB... which is NOT so great.

This is on Linux, BTW.
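A report like this is easy to confirm independently of xdelta by comparing the restored file with the original target. The Python sketch below is one possible check using size and SHA-256; the function names are hypothetical and the file names in the usage comment are placeholders.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so multi-gigabyte ISOs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: str, restored: str) -> bool:
    """True only if both files have the same size and the same SHA-256."""
    if os.path.getsize(original) != os.path.getsize(restored):
        return False  # a truncated restore is caught here without hashing
    return file_digest(original) == file_digest(restored)


# Example (placeholder names): same_content("target.iso", "restored.iso")
```

A size mismatch such as the one described above would make `same_content` return False immediately, before any hashing is done.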

gvollant commented 1 year ago

SmartVersion is now open source, so you'll be able to compare the delta code: https://github.com/gvollant/smartversion