shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.
https://www.mulanai.com/product/corrector/
Apache License 2.0
5.61k stars, 1.1k forks

Tokenizer class implementation issue #488

Closed: liwb1219 closed this 2 weeks ago

liwb1219 commented 7 months ago

On line 169, the check `if i + j > tokens_len:` prevents the scan from reaching the n-gram at the end of the sentence. Should it be changed to `if j > tokens_len:`?

shibing624 commented 7 months ago

Which file?

liwb1219 commented 7 months ago

pycorrector/pycorrector/utils/tokenizer.py
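
To make the reported behavior concrete, here is a minimal sketch of an n-gram scan that can switch between the two conditions quoted above. It is a hypothetical reconstruction for illustration, not the code from `tokenizer.py`: the function name `ngram_spans`, the `use_quoted_check` flag, and the assumption that `j` is an absolute token index running from `i` to `i + n - 1` are all mine, and the input is a plain token list rather than the output of a real segmenter.

```python
def ngram_spans(tokens, n=2, use_quoted_check=True):
    """Collect (gram, start, end) character spans for grams of up to n tokens.

    Hypothetical sketch, NOT the pycorrector implementation.
    use_quoted_check=True  -> break when i + j > tokens_len (condition quoted in the issue)
    use_quoted_check=False -> break when j > tokens_len     (condition proposed in the issue)
    """
    tokens_len = len(tokens)
    spans = set()
    start = 0  # character offset of tokens[i] within the joined sentence
    for i in range(tokens_len):
        # Assumption: j is an absolute index of the last token in the gram.
        for j in range(i, i + n):
            bound = i + j if use_quoted_check else j
            if bound > tokens_len:
                break
            gram = "".join(tokens[i:j + 1])  # slicing clips past the end
            spans.add((gram, start, start + len(gram)))
        start += len(tokens[i])
    # Sort by end offset, then start offset, for a stable order.
    return sorted(spans, key=lambda s: (s[2], s[1]))


if __name__ == "__main__":
    toks = ["a", "b", "c", "d"]
    print([g for g, _, _ in ngram_spans(toks, use_quoted_check=True)])
    print([g for g, _, _ in ngram_spans(toks, use_quoted_check=False)])
```

Under these assumptions, the quoted check adds `i` twice, so the bound grows faster than the actual end index and the loop stops before the final grams are collected; the proposed check lets the scan reach the end of the token list. In this sketch `j > tokens_len` still permits `j == tokens_len`, where the slice is clipped and the set removes the duplicate, so `j >= tokens_len` would be a slightly tighter alternative.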