yanyiwu / cppjieba

The C++ version of the "Jieba" Chinese word segmentation library
MIT License
2.57k stars 690 forks

Bug in cppjieba's dynamic programming code #156

Open rookie-J opened 3 years ago

rookie-J commented 3 years ago

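Following the dynamic programming logic of the Python version, changing the computation order in the CalcDP function to run from back to front fixes it. The code below can be used as a direct replacement: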

void CalcDP(vector<Dag>& dags) const {
  size_t dagSize = dags.size();
  size_t nextPos;
  const DictUnit* p;
  double val;

  // Visit the nodes from last to first, so that dags[nextPos + 1].weight
  // is already final by the time an earlier node reads it.
  for(size_t i = 0; i < dagSize; i++) {
      const size_t pos = dagSize - i - 1;
      dags[pos].pInfo = NULL;
      dags[pos].weight = MIN_DOUBLE;
      assert(!dags[pos].nexts.empty());
      for(LocalVector<pair<size_t, const DictUnit*> >::const_iterator it = dags[pos].nexts.begin();
          it != dags[pos].nexts.end(); it++) {
          nextPos = it->first;
          p = it->second;
          val = 0.0;
          // Add the best score for the rest of the sentence, if any remains.
          if(nextPos + 1 < dags.size()) {
              val += dags[nextPos + 1].weight;
          }

          // Words missing from the dictionary fall back to the minimum weight.
          if(p) {
              val += p->weight;
          } else {
              val += dictTrie_->GetMinWeight();
          }
          // Keep the highest-scoring candidate for this position.
          if(val > dags[pos].weight) {
              dags[pos].pInfo = p;
              dags[pos].weight = val;
          }
      }
  }
}

moyu505 commented 3 years ago

Could you give a few examples of the bug this fixes?

dawnranger commented 1 year ago

The author's code already iterates from back to front (rbegin/rend). See the code: https://github.com/yanyiwu/cppjieba/blob/master/include/cppjieba/MPSegment.hpp#L85
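To check this kind of claim, here is a minimal standalone sketch that runs the same dynamic programming recurrence two ways: with a back-to-front index loop (as in the replacement proposed above) and with reverse iterators (rbegin/rend). It does not reproduce cppjieba's code. `ToyUnit`, `ToyDag`, `kMinWeight`, the helper names, and the weights are hypothetical stand-ins chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for cppjieba's DictUnit and Dag.
struct ToyUnit {
  double weight;  // log-probability-like score of a word
};

struct ToyDag {
  // Each entry: (index of the word's last character, dictionary entry or nullptr).
  std::vector<std::pair<std::size_t, const ToyUnit*>> nexts;
  const ToyUnit* pInfo = nullptr;
  double weight = 0.0;
};

// Assumed fallback score for words that are not in the dictionary.
const double kMinWeight = -20.0;

// Compute the best candidate for one position, assuming every position
// to its right has already been computed.
void Relax(std::vector<ToyDag>& dags, std::size_t pos) {
  ToyDag& cur = dags[pos];
  cur.pInfo = nullptr;
  cur.weight = std::numeric_limits<double>::lowest();
  assert(!cur.nexts.empty());
  for (const auto& next : cur.nexts) {
    double val = next.second ? next.second->weight : kMinWeight;
    if (next.first + 1 < dags.size()) {
      val += dags[next.first + 1].weight;
    }
    if (val > cur.weight) {
      cur.pInfo = next.second;
      cur.weight = val;
    }
  }
}

// Back-to-front traversal using an index.
void CalcDPByIndex(std::vector<ToyDag>& dags) {
  for (std::size_t i = 0; i < dags.size(); i++) {
    Relax(dags, dags.size() - i - 1);
  }
}

// Back-to-front traversal using reverse iterators.
void CalcDPByReverseIterator(std::vector<ToyDag>& dags) {
  for (auto rit = dags.rbegin(); rit != dags.rend(); ++rit) {
    Relax(dags, static_cast<std::size_t>(dags.rend() - rit) - 1);
  }
}

// Toy 3-character input "abc" with words "a", "b", "c", "ab", and "bc".
const ToyUnit kA = {-3.0}, kB = {-3.0}, kC = {-3.0}, kAB = {-4.0}, kBC = {-2.0};

std::vector<ToyDag> MakeToyDags() {
  std::vector<ToyDag> dags(3);
  dags[0].nexts = {{0, &kA}, {1, &kAB}};
  dags[1].nexts = {{1, &kB}, {2, &kBC}};
  dags[2].nexts = {{2, &kC}};
  return dags;
}
```

On the toy input, both traversals visit positions 2, 1, 0 in that order and choose "a" followed by "bc" with a total score of -5.0. Under these assumptions, swapping one back-to-front loop for the other does not change the result, so a concrete failing input would be needed to show a difference.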