hideaki-t / sqlite-fts-python

A Python binding of SQLite Full Text Search Tokenizer
MIT License

respect specified text length in FTS3 tokenizer #38

Closed hideaki-t closed 9 months ago

hideaki-t commented 9 months ago

The tokenizer parsed beyond the text length it was given, so it returned tokens that were not asked for.

e.g.

"binding" OR "あいうえお"

SQLite3 opens the tokenizer twice: once for binding and once for あいうえお, each time shifting the start pointer and limiting the input length.

"binding" OR "あいうえお"
 * from here, length 7 bytes
"binding" OR "あいうえお"
              * from here, length 15 bytes (3 bytes per char * 5 chars)

But this package parsed the entire input text, and so generated too many tokens, including OR.
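The difference can be sketched in plain Python. This is only an illustration of the byte offsets and lengths described above, not the package's actual code: `simple_tokens`, `tokenize_ignoring_length`, and `tokenize_respecting_length` are hypothetical helpers, and the regex split stands in for whatever tokenizer is really registered.

```python
import re

QUERY = '"binding" OR "あいうえお"'
RAW = QUERY.encode("utf-8")  # the byte buffer that the start pointer moves through


def simple_tokens(text):
    """Hypothetical stand-in tokenizer: every run of word characters is a token."""
    return re.findall(r"\w+", text)


def tokenize_ignoring_length(buf, start, n_bytes):
    """Old behaviour (sketch): read from `start` to the end, ignoring `n_bytes`."""
    return simple_tokens(buf[start:].decode("utf-8"))


def tokenize_respecting_length(buf, start, n_bytes):
    """Fixed behaviour (sketch): read exactly `n_bytes` bytes from `start`."""
    return simple_tokens(buf[start:start + n_bytes].decode("utf-8"))


# First call: start at byte 1 (just after the opening quote), length 7 bytes.
print(tokenize_ignoring_length(RAW, 1, 7))    # ['binding', 'OR', 'あいうえお']
print(tokenize_respecting_length(RAW, 1, 7))  # ['binding']

# Second call: start at the first byte of あ, length 15 bytes (3 bytes * 5 chars).
second_start = RAW.index("あ".encode("utf-8"))
print(second_start)                                        # 14
print(tokenize_respecting_length(RAW, second_start, 15))   # ['あいうえお']
```

Note that the offsets and lengths are counted in UTF-8 bytes, not characters, so the slice has to be taken on the encoded buffer before decoding.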

fixed GH-36