jingfelix / EasySearch

Apache License 2.0
Parse PDF into sentences #7

Closed Leizhenpeng closed 1 year ago

Leizhenpeng commented 1 year ago

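First, read all of the text and merge it into a single string. Then use a spaCy model trained on Chinese to split that text into sentences. For the sake of completion efficiency, sentences shorter than 20 characters are merged automatically.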
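The split-and-merge steps described above could look roughly like the sketch below. It is not the code from this pull request: the function names `split_sentences` and `merge_short` are hypothetical, the model name `zh_core_web_sm` is an assumed choice of spaCy's Chinese pipeline, and the merge direction (short fragments are joined with the sentences that follow until the chunk reaches 20 characters) is an assumption, since the description does not say which neighbour a short sentence is merged into.

```python
from typing import Iterable, List

MIN_CHARS = 20  # threshold taken from the description above


def split_sentences(text: str, model: str = "zh_core_web_sm") -> List[str]:
    """Split text into sentences with a spaCy Chinese pipeline.

    The model name is an assumption; it must be installed separately.
    """
    import spacy  # imported lazily so merge_short works without spaCy

    nlp = spacy.load(model)
    return [s.text.strip() for s in nlp(text).sents if s.text.strip()]


def merge_short(sentences: Iterable[str], min_chars: int = MIN_CHARS) -> List[str]:
    """Join consecutive sentences until each chunk has at least min_chars characters.

    Assumed policy: a short fragment is merged forward into the next sentence;
    a short leftover at the end is appended to the last chunk.
    """
    merged: List[str] = []
    buffer = ""
    for sentence in sentences:
        buffer += sentence
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:
        if merged:
            merged[-1] += buffer  # attach the short tail to the previous chunk
        else:
            merged.append(buffer)  # the whole input was shorter than min_chars
    return merged
```

With the text already extracted from the PDF and merged into one string `text`, the two steps would be chained as `merge_short(split_sentences(text))`.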

jingfelix commented 1 year ago

LGTM!