jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

Are "too small copy" instructions hurting and making patches bigger? #247

Open Apollon77 opened 6 years ago

Apollon77 commented 6 years ago

We want to use xdelta3 in a project, so we did a deeper check and analysis. We are using version 3.0.11 (the official apt-get package).

While looking into it, I often saw printdelta output like this:

VCDIFF window number:         3
VCDIFF window indicator:      VCD_SOURCE VCD_ADLER32 
VCDIFF adler32 checksum:      B31ECAEE
VCDIFF delta indicator:       VCD_DATACOMP VCD_INSTCOMP VCD_ADDRCOMP 
VCDIFF window at offset:      25165824
VCDIFF copy window length:    36020918
VCDIFF copy window offset:    37716
VCDIFF delta encoding length: 4197542
VCDIFF target window length:  8388608
VCDIFF data section length:   4195146
VCDIFF inst section length:   1274
VCDIFF addr section length:   1020
  Offset Code Type1 Size1  @Addr1 + Type2 Size2 @Addr2
  25165824 035  CPY_1 2968039 S@25165824
  28133863 007  ADD        6        
  28133869 026  CPY_0     10 S@2760901
  28133879 001  ADD     4466        
  28138345 057  CPY_2      9 S@31045541
  28138354 001  ADD     9823        
  28148177 090  CPY_4     10 S@32596064
  28148187 001  ADD     7306        
  28155493 057  CPY_2      9 S@29211145
  28155502 001  ADD     4437        
  28159939 029  CPY_0     13 S@1271504
  28159952 001  ADD     2471        
  28162423 027  CPY_0     11 S@1127813
  28162434 001  ADD     6414        
  28168848 025  CPY_0      9 S@1007985
  28168857 001  ADD      752        
  28169609 059  CPY_2     11 S@31337285
  28169620 001  ADD    27319        
  28196939 041  CPY_1      9 S@30920731
  28196948 001  ADD    60600        
  28257548 074  CPY_3     10 S@30920731
  28257558 001  ADD    12838        
  28270396 043  CPY_1     11 S@27632709
  28270407 001  ADD    14398        
  28284805 022  CPY_0      6 S@1858265
  28284811 070  CPY_3      6 S@32651812
  28284817 001  ADD    24536        
  28309353 025  CPY_0      9 S@1771881
  28309362 001  ADD     3512        
  28312874 073  CPY_3      9 S@32702409
  28312883 001  ADD     6303        
  28319186 039  CPY_1      7 S@28287497
  28319193 038  CPY_1      6 S@27741851
  28319199 001  ADD     2755        
  28321954 057  CPY_2      9 S@31038611
  28321963 001  ADD    10040        
  28332003 025  CPY_0      9 S@6321249
  28332012 001  ADD     9655        
  28341667 105  CPY_5      9 S@21944423
  28341676 001  ADD    10002        
  28351678 073  CPY_3      9 S@29757211
  28351687 001  ADD     2871        
  28354558 075  CPY_3     11 S@30948233
  28354569 001  ADD    20178        
  28374747 074  CPY_3     10 S@29927226
  28374757 001  ADD     9256        
  28384013 090  CPY_4     10 S@33971623
  28384023 001  ADD     7457        
  28391480 026  CPY_0     10 S@6328307
  28391490 001  ADD    17983        
  28409473 041  CPY_1      9 S@29754509
  28409482 001  ADD     7224        
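The Code column in this output appears to match the default instruction code table from RFC 3284. In that table, index 1 is an ADD whose size is stored separately, indices 2-18 are ADDs of size 1-17, and indices 19-162 are COPYs grouped 16 per address mode (the first slot has a separate size, the others cover sizes 4-18). The sketch below decodes an index under that assumption. It is a hypothetical helper, not part of xdelta3, and it ignores the paired ADD+COPY / COPY+ADD entries (163-255).

```python
# Hypothetical helper: decode an opcode index from the "Code" column,
# ASSUMING the RFC 3284 default code table (4 near + 3 same cache slots,
# so 9 address modes). Paired entries (indices 163-255) are not handled.

def decode_opcode(code):
    """Return (kind, mode, size); size 0 means "size stored separately"."""
    if code == 0:
        return ("RUN", None, 0)          # RUN always carries an explicit size
    if 1 <= code <= 18:
        return ("ADD", None, code - 1)   # 1: explicit size, 2..18: sizes 1..17
    if 19 <= code <= 162:
        mode, slot = divmod(code - 19, 16)
        # slot 0: explicit size, slots 1..15: sizes 4..18
        return ("CPY", mode, 0 if slot == 0 else slot + 3)
    raise ValueError(f"paired or out-of-range opcode: {code}")

# A few rows from the printdelta output above: (code, kind, mode, size).
rows = [(35, "CPY", 1, 2968039), (7, "ADD", None, 6), (26, "CPY", 0, 10),
        (1, "ADD", None, 4466), (90, "CPY", 4, 10), (105, "CPY", 5, 9)]
for code, kind, mode, size in rows:
    k, m, table_size = decode_opcode(code)
    assert (k, m) == (kind, mode)
    # Either the table holds this exact size, or the size is stored separately.
    assert table_size in (0, size)
    print(f"{code:03d} -> {k} mode={m} size_in_opcode={table_size or 'no'}")
```

Under this reading, every short copy in the log (sizes 6 to 13) carries its size inside the one-byte opcode; only the 2,968,039-byte copy at code 035 needs a separate size field.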

How large is an encoded "copy data" instruction? I read somewhere that it takes 28 bytes per instruction. But even if it is smaller, is encoding a "copy 9/10/11 bytes" instruction really cheaper than simply including those bytes in the "ADD" block? (My understanding is that CPY means the source and target content are equal; is that correct?)
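To put rough numbers on this question, here is a back-of-the-envelope sketch, not xdelta3's actual cost model. It assumes the RFC 3284 default code table, ignores paired opcodes, and ignores the secondary compression this window uses (VCD_DATACOMP, VCD_INSTCOMP, VCD_ADDRCOMP), which shrinks literal data and addresses by different amounts. Under those assumptions a COPY costs one opcode byte, plus a varint size when the size falls outside 4-18, plus its address: a varint for VCD_SELF, VCD_HERE and the near-cache modes, or a single byte for the same-cache modes. All function names are hypothetical.

```python
# Back-of-the-envelope cost of one COPY vs. keeping its bytes as ADD data.
# ASSUMPTIONS (not xdelta3's real cost model): RFC 3284 default code table,
# no paired ADD+COPY opcodes, no secondary compression of any section.

S_NEAR = 4  # default table: modes 0-1 self/here, 2-5 near, 6-8 same

def varint_len(n):
    """Length of an RFC 3284 base-128 integer (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length

def add_cost(size):
    """Instruction-section bytes for one ADD (its data bytes excluded)."""
    return 1 + (0 if 1 <= size <= 17 else varint_len(size))

def copy_cost(size, mode, encoded_addr):
    """Instruction- plus address-section bytes for one COPY.

    encoded_addr is the value after the mode's transform (the raw address
    for VCD_SELF); "same" modes store a single byte instead.
    """
    inst = 1 + (0 if 4 <= size <= 18 else varint_len(size))
    addr = 1 if mode >= 2 + S_NEAR else varint_len(encoded_addr)
    return inst + addr

def add_copy_add(before, copy_size, mode, encoded_addr, after):
    """Total bytes for ADD, COPY, ADD vs. one merged ADD of the same span."""
    with_copy = (add_cost(before) + before
                 + copy_cost(copy_size, mode, encoded_addr)
                 + add_cost(after) + after)
    merged = before + copy_size + after
    return with_copy, add_cost(merged) + merged

# Row "CPY_0 10 S@2760901" between "ADD 6" and "ADD 4466" in the log.
print(add_copy_add(6, 10, 0, 2760901, 4466))
```

For that row, the address needs four varint bytes whether S@ is file-absolute or relative to the copy window offset (37716), so the copy costs about 5 bytes, and the ADD/COPY/ADD run totals 4481 bytes versus 4485 if the 10 bytes were merged into a single ADD. Under these assumptions a short copy costs a handful of bytes rather than 28, but with a far address it is close to break-even, and secondary compression can tip the balance either way.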

We experimented with -I ... and saw no effect at all. We also experimented with the compression level and saw only a small benefit.

Any ideas? Thank you!