hideaki-t / sqlite-fts-python

A Python binding of SQLite Full Text Search Tokenizer
MIT License

FTS5_TOKEN_COLOCATED #25

Open andersjo opened 3 years ago

andersjo commented 3 years ago

The FTS5 tokenizer API has the concept of "colocated" tokens, where multiple tokens can occupy the same position in a sentence. The main use of this functionality is to implement synonyms (see Section 7.1.1, "Synonym Support", of the SQLite FTS5 documentation).

Is there any way to mark a token as colocated through the Python API?

macabrus commented 1 year ago

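I believe the author has already thought of this, but I haven't tested it. For reference, this is the xToken call sequence from the SQLite docs, where "1st" is reported as a synonym at the same position as "first":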

xToken(pCtx, 0, "i",                      1,  0,  1);
xToken(pCtx, 0, "won",                    3,  2,  5);
xToken(pCtx, 0, "first",                  5,  6, 11);
xToken(pCtx, FTS5_TOKEN_COLOCATED, "1st", 3,  6, 11);
xToken(pCtx, 0, "place",                  5, 12, 17);
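
To make the call sequence above concrete on the Python side, here is a minimal sketch of a tokenizer that emits the same five tokens. It does not use the sqlite-fts-python API: the function name `tokenize_with_synonyms`, the synonym dictionary, and the `(tflags, token, start, end)` tuple layout are all hypothetical and chosen only to mirror the arguments of xToken. The value `0x0001` for `FTS5_TOKEN_COLOCATED` is taken from the SQLite headers.

```python
import re

# Flag value as defined in the SQLite FTS5 header.
FTS5_TOKEN_COLOCATED = 0x0001


def tokenize_with_synonyms(text, synonyms):
    """Yield (tflags, token, start, end) tuples mirroring xToken's arguments.

    Hypothetical helper, not part of sqlite-fts-python. Offsets are byte
    offsets into the UTF-8 encoding of `text`, as FTS5 expects.
    """
    for match in re.finditer(r"\w+", text):
        token = match.group().lower()
        # Convert character offsets to UTF-8 byte offsets.
        start = len(text[:match.start()].encode("utf-8"))
        end = start + len(match.group().encode("utf-8"))
        # The original token advances the position (tflags == 0).
        yield 0, token, start, end
        # Each synonym shares the position and offsets of the token above.
        for synonym in synonyms.get(token, ()):
            yield FTS5_TOKEN_COLOCATED, synonym, start, end


tokens = list(tokenize_with_synonyms("I won first place", {"first": ["1st"]}))
for tflags, token, start, end in tokens:
    print(tflags, token, start, end)
```

A binding that supported colocated tokens would need some way to pass the first element of each tuple through to xToken instead of a constant 0; how that would look in this library is an open question in this thread.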

https://github.com/hideaki-t/sqlite-fts-python/blob/2808e9165d26e56e869fd633641fd29c2adce6f1/sqlitefts/fts5.py#L244

It should be possible, and even if it isn't yet, it shouldn't be hard to implement. I will test whether it works and open a PR if it doesn't.

EDIT: Removed the docs link. Sorry 😅, you had already linked the relevant section of the SQLite docs.