sisong / HDiffPatch

A C/C++ library and command-line tools for diff & patch between binary files or directories (folders); cross-platform; runs fast; creates small delta/differential files; supports large files and limits memory requirements when diffing and patching.

Very slow patch generation. #359

Closed BeastBurst closed 1 year ago

BeastBurst commented 1 year ago

I am running HDiffPatch on a MacBook M2 Max with these settings:

hdiff_TCompress* compressPlugin=0;
TDiffSets diffSets;
memset(&diffSets,0,sizeof(diffSets));
diffSets.patchStepMemSize = (size_t)1024*1024*1024*2; // cast so the product is not computed in 32-bit int

#if (_IS_NEED_BSDIFF)
    diffSets.isBsDiff = _kNULL_VALUE;
#endif
#if (_IS_NEED_VCDIFF)
    diffSets.isVcDiff = _kNULL_VALUE;
#endif

diffSets.isDoDiff = _kNULL_VALUE;
diffSets.isDoPatchCheck = _kNULL_VALUE;
diffSets.isDiffInMem = _kNULL_VALUE;
diffSets.isSingleCompressedDiff = _kNULL_VALUE;
diffSets.isUseBigCacheMatch = 1;
diffSets.matchBlockSize = _kNULL_SIZE;
diffSets.threadNum = 8;
diffSets.threadNumSearch_s = _THREAD_NUMBER_NULL;
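One detail worth noting in a setting like `patchStepMemSize = 1024*1024*1024*2`: a product of plain `int` literals is evaluated in `int` arithmetic, and 2^31 overflows a 32-bit signed `int` on typical platforms, so a widening cast on the first factor is needed. A minimal standalone sketch with no HDiffPatch dependency:

```cpp
#include <cstddef>
#include <cstdint>

// 1024*1024*1024*2 written as bare int literals would be computed in int
// arithmetic and overflow at 2^31 (undefined behavior on 32-bit int).
// Casting the first factor makes the whole chain evaluate as size_t.
size_t two_gib() {
    return (size_t)1024 * 1024 * 1024 * 2; // no intermediate int overflow
}
```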

This produces a 48MB patch file in about 26s, which feels slow.

Can you suggest how I can improve the speed?
sisong commented 1 year ago

To test the speed, you can try $diffz -SD-2m -p-8 -m-6 oldFile newFile outPatFile
or $diffz -SD-2m -p-8 -s-64 oldFile newFile outPatFile (note: the default patchFile is not compressed).
If feasible, it is recommended to use the command-line program directly.
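For a quick comparison of the two suggested modes, the invocations above can be timed from a shell (this assumes the hdiffz binary built from this repository is on PATH; the file names are placeholders):

```shell
# -SD-2m: single compressed diff with 2MB step memory; -p-8: 8 threads.
time hdiffz -SD-2m -p-8 -m-6  oldFile newFile outPatchFile_m6
time hdiffz -SD-2m -p-8 -s-64 oldFile newFile outPatchFile_s64

# Compare the resulting patch sizes as well as the wall-clock times.
ls -l outPatchFile_m6 outPatchFile_s64
```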

'patchStepMemSize' does not need to be 2GB; 2MB is generally sufficient.
If you call the API yourself, it is recommended to use 'create_single_compressed_diff_block(const hpatch_TStreamInput* newData,...)' and 'patch_single_stream()'.
To get an 'hpatch_TStreamInput' from a filename, you can use 'hpatch_TFileStreamInput'.

If the data is very large, you can use other APIs, which are also faster: 'create_single_compressed_diff_stream()' and 'patch_single_stream()'.
Try different 'kMatchBlockSize' values to see what happens.
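For readers unfamiliar with the stream interfaces mentioned above: HDiffPatch passes data through callback-style stream structs rather than raw buffers, which is what lets the same diff/patch code read from memory or from files. The sketch below imitates that pattern with a hypothetical, self-contained `MemStreamInput`; the real types ('hpatch_TStreamInput', 'hpatch_TFileStreamInput') live in HDiffPatch's own headers, and the names here are illustrative only:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical minimal analogue of hpatch_TStreamInput: a stream is a size
// plus a positioned read operation; consumers never see the backing store.
struct MemStreamInput {
    const uint8_t* data;
    uint64_t       size;
    // Read len bytes starting at pos into out; return true on success.
    bool read(uint64_t pos, uint8_t* out, size_t len) const {
        if (pos + len > size) return false;   // out-of-range reads fail
        std::memcpy(out, data + pos, len);
        return true;
    }
};

// A consumer written against the stream interface works unchanged whether
// the bytes come from memory or (in the real library) from a file adapter.
uint64_t sum_stream(const MemStreamInput& s) {
    uint64_t total = 0;
    uint8_t  buf[4];
    for (uint64_t pos = 0; pos < s.size; pos += sizeof(buf)) {
        uint64_t left = s.size - pos;
        size_t   len  = (size_t)(left < sizeof(buf) ? left : sizeof(buf));
        if (!s.read(pos, buf, len)) break;
        for (size_t i = 0; i < len; ++i) total += buf[i];
    }
    return total;
}
```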