jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

xdelta3 encoding is too slow when generating a diff between two files larger than 10 GB #269

Open nachammaichidambaram opened 11 months ago

nachammaichidambaram commented 11 months ago

xdelta3 encoding is too slow when generating a diff between two files that are each more than 10 GB in size. I tried all the options I could find: increasing the byte size with -B, changing the compression level (-1), and setting -S djw. I am using fast machines with GPU support.

mgrinzPlayer commented 11 months ago

xdelta3 does not use the GPU, and it runs on only one CPU thread. The most CPU-intensive stages are secondary compression, followed by internal compression. If you plan to compress the diff files with an external compressor such as rar, 7z, or xz, it is better to disable secondary compression with -S none.

On my old test machine, the first command takes 60 seconds and the second takes 190 seconds; the pak file is around 10 GB:

xdelta3.exe -v -e -0 -S none -I 0 -B 1073741824 -s file.pak file.pak.NEW fileTEST1.patch
xdelta3.exe -v -e -9 -S none -I 0 -B 1073741824 -s file.pak file.pak.NEW fileTEST2.patch

TEST1.patch size = ~90MB, compressed with 7z (mx9, dict 512MB, 1 thread) = 35MB
TEST2.patch size = ~38MB, compressed with 7z (mx9, dict 512MB, 1 thread) = 24.5MB
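To repeat this comparison on other files, a small wrapper along these lines might help. It is only a sketch: `xd3_encode` and the `DRY_RUN` variable are hypothetical names, `xdelta3` is assumed to be on PATH (use `xdelta3.exe` on Windows), and the flags are copied from the two commands above. By default it only prints the command; with `DRY_RUN=0` it runs the command and reports the wall-clock time in whole seconds.

```shell
#!/bin/sh
# Sketch only: xd3_encode is a hypothetical helper, not part of xdelta3.
# Arguments: compression level (0-9), source file, new file, patch file.
xd3_encode() {
    level=$1 src=$2 new=$3 patch=$4
    # Same flags as the commands above (secondary compression disabled).
    set -- xdelta3 -v -e "-$level" -S none -I 0 -B 1073741824 -s "$src" "$new" "$patch"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: show the command instead of executing it.
        echo "$*"
        return 0
    fi
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "level -$level: $((end - start)) s, exit status $status" >&2
    return "$status"
}
```

For example, `DRY_RUN=0 xd3_encode 0 file.pak file.pak.NEW fileTEST1.patch` followed by the same call with level 9 would give the two timings to compare against the resulting patch sizes.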

nachammaichidambaram commented 11 months ago

Hello Jmacd,

Thanks for your reply. I am using xdelta3 on a Linux machine to diff tar files.

I tried the xdelta3 command with the -S none option, but it still uses lzma for secondary compression. Can you help with this, please?

Also, please let me know which command-line options to use for decoding.

Thanks, Nachammai


nachammaichidambaram commented 11 months ago

Currently, decoding fails with the error "non-seekable source in decode: XD3_INTERNAL". This is the command I used for decoding:

Xdelta3 -S none -d -f -R -s source.tar patch target.tar

The version I am using is 3.1.0.

If a newer version with a fix for this is available, please let me know.

Thanks, Nachammai


mgrinzPlayer commented 11 months ago

I'm not jmacd.

> The version which Iam using is 3.1.0

From where? Did you compile it? In this repo, since this commit https://github.com/jmacd/xdelta/commit/e1a0a2538ff9c5a7350562f844c67bfd639fa3f5, LZMA is not the default secondary compression.
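One way to check what a given build actually wrote into a patch is to look at its header. The sketch below assumes that `xdelta3 printhdr` prints a line beginning with "VCDIFF secondary compressor:" (that output format is an assumption worth confirming on your build); `secondary_compressor` is a hypothetical helper name that just extracts the value from that line.

```shell
#!/bin/sh
# Sketch only: secondary_compressor is a hypothetical helper name.
# It reads `xdelta3 printhdr` output on stdin and prints the value of the
# "VCDIFF secondary compressor:" line, or nothing if no such line exists.
secondary_compressor() {
    sed -n 's/^VCDIFF secondary compressor:[[:space:]]*//p'
}

# Usage, assuming xdelta3 is on PATH and "patch" is an existing delta file:
#   xdelta3 -V                                      # show the installed version
#   xdelta3 printhdr patch | secondary_compressor   # e.g. lzma, djw, or none
```

If the helper still reports lzma for a patch created with -S none, that would suggest the option is not being applied by the binary in use.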

Anyway, for decoding: the first command should be fine for most files, and the second is for bigger files. From "--help": "-B bytes source window size". Bigger buffers can also sometimes speed up the patching process.

xdelta3.exe -v -d -s file.pak fileTEST1.patch file.pak.NEW
xdelta3.exe -v -d -B 1073741824 -s file.pak fileTEST1.patch file.pak.NEW
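The choice between those two commands could be scripted roughly as follows. This is a sketch under assumptions: `xd3_decode_cmd` is a hypothetical helper, and the 64 MiB default threshold is an arbitrary illustrative cutoff for "bigger files", not a value taken from xdelta3. The helper only prints the command it would suggest; it does not run xdelta3.

```shell
#!/bin/sh
# Sketch only: xd3_decode_cmd is a hypothetical helper name.
# Arguments: source file, patch file, output file, optional size threshold
# in bytes (default 67108864, an arbitrary cutoff chosen for illustration).
xd3_decode_cmd() {
    src=$1 patch=$2 out=$3 threshold=${4:-67108864}
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # Arithmetic expansion strips any padding that wc may add.
    size=$(( $(wc -c < "$src") ))
    if [ "$size" -gt "$threshold" ]; then
        # Larger source: add the 1 GiB source window from the second command.
        echo "xdelta3 -v -d -B 1073741824 -s $src $patch $out"
    else
        echo "xdelta3 -v -d -s $src $patch $out"
    fi
}
```

After a real decode, comparing the result against a known-good copy with `cmp` is a cheap way to confirm the patch applied correctly.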

Edit: a suggestion. Try older xdelta versions; I would try 3.0.7 and 3.0.8. Maybe those versions will be enough for your needs.