sisong / HDiffPatch

A C/C++ library and command-line tools for diff & patch between binary files or directories (folders); cross-platform; runs fast; creates small deltas/differentials; supports large files and limits memory requirements during diff & patch.

The diff file generated by create_diff() cannot be used with patch_stream_with_cache() to produce the new file; what is the correct way to use it? #360

Closed Candyerer closed 11 months ago

Candyerer commented 11 months ago

Code that generates the diff:

hpatch_TFileStreamInput old_stream;
hpatch_TFileStreamInput new_stream;
hpatch_TFileStreamOutput out_diff_stream;

hpatch_TFileStreamInput_init(&old_stream);
hpatch_TFileStreamInput_init(&new_stream);
hpatch_TFileStreamOutput_init(&out_diff_stream);

CheckAndAssert(hpatch_TFileStreamInput_open(&old_stream, old_file_path.c_str()), "open old file error");
CheckAndAssert(hpatch_TFileStreamInput_open(&new_stream, new_file_path.c_str()), "open new file error");
CheckAndAssert(hpatch_TFileStreamOutput_open(&out_diff_stream, output_file_path.c_str(), (hpatch_StreamPos_t) (-1)),
               "open diff file error");
hpatch_TFileStreamOutput_setRandomOut(&out_diff_stream, true);

TByte old_data[old_stream.base.streamSize];
TByte new_data[new_stream.base.streamSize];
std::vector<TByte> out_diff_data;
out_diff_data.reserve(new_stream.base.streamSize * 2);

old_stream.base.read(&old_stream.base, 0, old_data, old_data + old_stream.base.streamSize);
new_stream.base.read(&new_stream.base, 0, new_data, new_data + new_stream.base.streamSize);

const size_t old_data_size = sizeof(old_data) / sizeof(TByte);
const size_t new_data_size = sizeof(new_data) / sizeof(TByte);

create_diff(new_data, new_data + new_data_size, old_data, old_data + old_data_size, out_diff_data);

CheckAndAssert(out_diff_stream.base.write(&out_diff_stream.base, 0, out_diff_data.data(),
                                          out_diff_data.data() + out_diff_data.size()), "write diff file error");

CheckAndAssert(check_diff(new_data, new_data + new_data_size, old_data,old_data + old_data_size, 
out_diff_data.data(),out_diff_data.data() + out_diff_data.size()), "check diff error");

Code that applies the patch:

    hpatch_TFileStreamInput old_stream;
    hpatch_TFileStreamInput diff_stream;
    hpatch_TFileStreamOutput output_stream;

    hpatch_TFileStreamInput_init(&old_stream);
    hpatch_TFileStreamInput_init(&diff_stream);
    hpatch_TFileStreamOutput_init(&output_stream);

    CheckAndAssert(hpatch_TFileStreamInput_open(&old_stream, old_file_path.c_str()), "open old file error");
    CheckAndAssert(hpatch_TFileStreamInput_open(&diff_stream, diff_file_path.c_str()), "open diff file error");
    CheckAndAssert(
            hpatch_TFileStreamOutput_open(&output_stream, output_file_path.c_str(), (hpatch_StreamPos_t) (-1)),
            "open output file error");
    hpatch_TFileStreamOutput_setRandomOut(&output_stream, true);

    TByte temp_cache[hpatch_kStreamCacheSize * 4];
    patch_stream_with_cache(&(output_stream.base), &(old_stream.base), &(diff_stream.base), temp_cache,
                            temp_cache + sizeof(temp_cache) / sizeof(TByte));

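The new file produced by the patch has size = 0, so something seems to be wrong. At the check_diff step I verified that the diff is correct by reading the full contents of the old and new files into memory, but applying it with patch_stream_with_cache to produce the new file does not work.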

sisong commented 11 months ago

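You need to trace the execution yourself and check whether the arguments are what you expect.
TByte old_data[old_stream.base.streamSize]; Won't this fail once the files get a bit larger?
Maybe these two lines are the problem? They may not return the correct size: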
const size_t old_data_size = sizeof(old_data) / sizeof(TByte);
const size_t new_data_size = sizeof(new_data) / sizeof(TByte);
Try changing them to:
const size_t old_data_size = old_stream.base.streamSize;
const size_t new_data_size = new_stream.base.streamSize;
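
The variable-length arrays in the question live on the stack, which is why larger files are a concern, and they are a compiler extension rather than standard C++. A minimal sketch of one alternative, assuming the whole file fits in memory: read it into a std::vector and take the length from size() instead of sizeof. read_whole_file is a hypothetical helper written for this illustration; it is not part of HDiffPatch.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

typedef unsigned char TByte;  // mirrors the TByte name used in the thread

// Hypothetical helper (not an HDiffPatch API): read an entire file into a
// heap-allocated buffer. Returns false if the file cannot be opened or read.
bool read_whole_file(const std::string& path, std::vector<TByte>& out_data) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff size = in.tellg();
    if (size < 0) return false;

    out_data.resize(static_cast<std::size_t>(size));
    // The buffer's size() is the byte count to pass on; no sizeof arithmetic.
    assert(out_data.size() == static_cast<std::size_t>(size));

    in.seekg(0, std::ios::beg);
    if (size > 0 &&
        !in.read(reinterpret_cast<char*>(out_data.data()),
                 static_cast<std::streamsize>(size))) {
        return false;
    }
    return true;
}
```

With buffers filled this way, the pointer pairs for create_diff() and check_diff() in the question would become old_data.data() and old_data.data() + old_data.size(), and likewise for new_data.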

As for the patched new file having size = 0: trace it and check whether the value of output_stream.out_length is correct.

I suggest replacing hpatch_kStreamCacheSize*4 with hpatch_kFileIOBufBetterSize*4.

Candyerer commented 11 months ago

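My testing shows that when calling patch_stream_with_cache(), the output_stream passed in must carry the correct expected size of the new file to be produced. If output_stream.base.streamSize = 0 or = (unsigned long long)(-1), patching cannot produce the correct new file. Is my test result correct?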

sisong commented 11 months ago

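I think that is how it was designed at the time: for safety, the caller is required to know the size of the new file before patching (for example, to check that the remaining disk space is fully sufficient, so that the write does not bring the machine down).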
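
Since the new file's size has to be known before patching, one option is to ship it together with the diff. The sketch below is only an illustration of that idea: it prepends the size as 8 little-endian bytes to the bytes produced by create_diff() and reads it back on the patching side. The helper names add_size_prefix and read_size_prefix and the 8-byte layout are assumptions made here, not an HDiffPatch format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef unsigned char TByte;  // mirrors the TByte name used in the thread

// Assumed layout for this sketch: [8-byte little-endian new size][raw diff].
const std::size_t kSizePrefixBytes = 8;

// Hypothetical helper: return a copy of diff with new_size prepended.
std::vector<TByte> add_size_prefix(std::uint64_t new_size,
                                   const std::vector<TByte>& diff) {
    std::vector<TByte> out;
    out.reserve(kSizePrefixBytes + diff.size());
    for (std::size_t i = 0; i < kSizePrefixBytes; ++i) {
        out.push_back(static_cast<TByte>((new_size >> (8 * i)) & 0xFF));
    }
    out.insert(out.end(), diff.begin(), diff.end());
    assert(out.size() == kSizePrefixBytes + diff.size());
    return out;
}

// Hypothetical helper: recover the new size and the offset at which the raw
// diff starts. Returns false if the buffer is too short to hold the prefix.
bool read_size_prefix(const std::vector<TByte>& packed,
                      std::uint64_t& out_new_size,
                      std::size_t& out_diff_offset) {
    if (packed.size() < kSizePrefixBytes) return false;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < kSizePrefixBytes; ++i) {
        value |= static_cast<std::uint64_t>(packed[i]) << (8 * i);
    }
    out_new_size = value;
    out_diff_offset = kSizePrefixBytes;
    return true;
}
```

On the patching side, the recovered size would presumably be what is passed as the third argument of hpatch_TFileStreamOutput_open() in place of (hpatch_StreamPos_t)(-1), and only the bytes after the prefix would be handed to patch_stream_with_cache() as the diff. The same value can be compared against free disk space before any output is written.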