Open saaj opened 10 years ago
Thanks,
Ranking: I'll merge your implementation from the gist. I'll add some test cases for CJK and other scoring functions.
Document: Yes, I know I need to write it. I'll finish it.
Minor: You're right, sqlitefts.sqlite_tokenizer is redundant. Let me think about it. The URL was copied from another package; I totally forgot to change it...
There are some changes that I think should be addressed before the package's Cheese Shop release.
Result ranking
There's no out-of-the-box ranking function for full-text result sets in SQLite. I think it makes sense to extend the scope of the package to address ranking, as it is squarely a topic of both "sqlite" and "fts".
All the code is already out there. There's the article; even though it's about the MIT-licensed package peewee, the code can be easily extracted. Here's a gist with a module and a test case for it. Because BM25 is a general, language-independent ranking function, its presence in the package makes it more complete.
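To make the idea concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring over already-tokenized documents. It is not the code from the gist or from peewee, and it does not read SQLite's FTS data. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration.

```python
import math


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (hypothetical helper).

    `docs` is a list of token lists; `query_terms` is a list of tokens.
    Higher scores mean a better match.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            tf = doc.count(term)  # term frequency in this document
            if tf == 0:
                continue
            # Number of documents containing the term.
            df = sum(1 for d in docs if term in d)
            # Smoothed IDF that stays positive (an assumed variant).
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["sqlite", "fts", "ranking"],
    ["sqlite", "tokenizer"],
    ["cheese", "shop"],
]
scores = bm25_scores(["sqlite", "ranking"], docs)
```

Because the function only needs token lists, it works the same way whatever tokenizer produced them, which is the language-independence mentioned above.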
Minimum documentation
README should be written to overview and cover basics. I can assist with it.
Also, recipes for integration with tokenizers for major domains (CJK, Cyrillic, etc.) are a good idea.
Minor
- Underscore is undesired in a Python module name. I suggest renaming `sqlite_tokenizer.py`. The "sqlite" part is the obvious context. `tokenizer.py` is better but still not good: it's not informative, as the module doesn't provide a real tokenizer per se, but rather a binding to register one. `binding.py` may be a better name, though you can try to coin a better one.
- Make user symbols available from `__init__.py` so that `import sqlitefts` is sufficient.
- `setup.py`: url points to another package. "Operating System :: POSIX :: Linux" seems redundant with "Operating System :: OS Independent".