hideaki-t / sqlite-fts-python

A Python binding of SQLite Full Text Search Tokenizer
MIT License

Pre-release improvement #2

Open saaj opened 10 years ago

saaj commented 10 years ago

There are some changes that I think should be addressed before the package's Cheese Shop (PyPI) release.

Result ranking

There's no full-text result set ranking function out of the box in SQLite. I think it makes sense to extend the scope of the package to address ranking, as it is squarely a topic of both "sqlite" and "fts".

All the code is already out there. There's the article; even though it's about peewee, an MIT-licensed package, the code can be easily extracted. Here's a gist with a module and a test case for it.

Because BM25 is a general, language-independent ranking function, its presence would make the package more complete.
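
For illustration, here is a minimal sketch of what such a ranking function could look like. It is not the code from the gist or from peewee. It assumes the score is computed from the blob returned by FTS4's matchinfo(table, 'pcnalx'), and it uses the IDF variant with 1 added inside the logarithm so that scores stay non-negative. The function names and the defaults k1=1.2 and b=0.75 are choices made for this sketch.

```python
import math
import struct


def parse_match_info(blob):
    """Decode a matchinfo() blob into a list of unsigned 32-bit integers."""
    return list(struct.unpack("@%dI" % (len(blob) // 4), blob))


def bm25(raw_match_info, k1=1.2, b=0.75):
    """Score one row from matchinfo(table, 'pcnalx'); higher means more relevant.

    Assumed layout of the decoded integers:
      p: phrase count, c: column count, n: total rows,
      a: average token count per column (c values),
      l: token count of this row per column (c values),
      x: three values per (phrase, column) pair.
    """
    info = parse_match_info(raw_match_info)
    phrase_count, column_count, total_docs = info[0], info[1], info[2]
    avg_lengths = info[3:3 + column_count]
    doc_lengths = info[3 + column_count:3 + 2 * column_count]
    x_offset = 3 + 2 * column_count

    score = 0.0
    for phrase in range(phrase_count):
        for column in range(column_count):
            base = x_offset + 3 * (phrase * column_count + column)
            term_frequency = info[base]      # hits in this row and column
            docs_with_hits = info[base + 2]  # rows with at least one hit
            if term_frequency == 0:
                continue
            # IDF variant with +1 inside the log, so it never goes negative.
            idf = math.log(
                1.0 + (total_docs - docs_with_hits + 0.5) / (docs_with_hits + 0.5)
            )
            avg_len = avg_lengths[column] or 1  # guard against division by zero
            norm = 1.0 - b + b * doc_lengths[column] / avg_len
            score += idf * term_frequency * (k1 + 1.0) / (term_frequency + k1 * norm)
    return score
```

With the standard sqlite3 module, a function like this could be registered via conn.create_function("bm25", 1, bm25) and used as ORDER BY bm25(matchinfo(docs, 'pcnalx')) DESC, where docs is a hypothetical FTS4 table and the SQLite build has FTS4 enabled.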

Minimum documentation

A README should be written to give an overview and cover the basics. I can assist with it.

Also, recipes for integration with tokenizers for major domains (CJK, Cyrillic, etc.) would be a good idea.

Minor

An underscore is undesirable in a Python module name. I suggest renaming sqlite_tokenizer.py. The "sqlite" part is obvious from context. tokenizer.py is better but still not good, as it is not informative: the module doesn't provide a real tokenizer per se, but rather a binding to register one. binding.py may be a better name, though you can try to coin a better one.

Make user symbols available from __init__.py so that import sqlitefts is sufficient.

setup.py: url points to another package. "Operating System :: POSIX :: Linux" seems redundant with "Operating System :: OS Independent".

hideaki-t commented 10 years ago

Thanks,

Ranking: I'll merge your implementation from the gist. I'll add some test cases for CJK and other scoring functions.

Documentation: Yes, I know I need to write it. I'll finish it.

Minor: You're right, sqlitefts.sqlite_tokenizer is redundant. Let me think about it. The URL was copied from another package; I totally forgot to change it...