sisong / HDiffPatch

a C\C++ library and command-line tools for Diff & Patch between binary files or directories(folder); cross-platform; runs fast; create small delta/differential; support large files and limit memory requires when diff & patch.
Other
1.56k stars 287 forks source link

TSuffixString::lower_bound函数 #299

Closed wangchz13 closed 2 years ago

wangchz13 commented 2 years ago

作者您好,可以请教下TSuffixString::lower_bound函数的大概意思吗?看了半天看不懂...

TInt TSuffixString::lower_bound(const TChar* str,const TChar* str_end)const{
    //not use any cached range table
    //return m_lower_bound(m_cached_SA_begin,m_cached_SA_end,
    //                     str,str_end,m_src_begin,m_src_end,m_cached_SA_begin,0);
sisong commented 2 years ago

old数据建立好了后缀数组[m_cached_SA_begin,m_cached_SA_end), TSuffixString::lower_bound(const TChar str,const TChar str_end)的功能就是:查询查询字符串[str,str_end)在该后缀数组中的匹配位置。

wangchz13 commented 2 years ago

明白了,谢谢! 还可以解释下getBestMatch中matchDeep的意思吗?我看求出sai后并没有立即返回,后面的又看不懂了...

const TInt matchDeep = diffLimit?kMaxMatchDeepForLimit:2;
...
TInt sai=sstring.lower_bound(newData,newData_end);
...
for (TInt mdi= 0; mdi< matchDeep; ++mdi) {
        if (mdi&1){
            if (rightOk) continue;
        } else {
            if (leftOk) continue;
        }
        TInt i = sai + (1-(mdi&1)*2) * ((mdi+1)/2);
        if ((i<0)|(i>=(src_end-src_begin))) continue;
        TInt curOldPos=sstring.SA(i);
        if (diffLimit){
            if ((curOldPos>=kLimitOldPos)&(curOldPos<kLimitOldEnd))
...
sisong commented 2 years ago

用lower_bound得到最可能的匹配位置后,再在该位置的前后尝试寻找可能更好的匹配位置。 matchDeep变量用来控制尝试的最大次数。
diffLimit一般为null,但调用者可以用来对匹配位置或匹配长度进行一些特殊目的的限制。