hideaki-t / sqlite-fts-python

A Python binding of SQLite Full Text Search Tokenizer
MIT License

Changes to bm25 #4

Closed: coleifer closed this issue 8 years ago

coleifer commented 8 years ago

Hi, I noticed that you used the bm25 implementation from peewee. I've made some improvements that you might be interested in: the function now accepts a weight for each column on the model. So if you've indexed rows with title, content and tags columns, you can specify that matches in the title are worth more than matches in the other fields.
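
To illustrate the idea of per-column weights, here is a minimal sketch of a weighted BM25 in pure Python. It is not the peewee code itself. It assumes the matchinfo blob was produced with the FTS4 format string `'pcnalx'`, and the names `parse_matchinfo` and `weighted_bm25`, the defaults `k1=1.2` and `b=0.75`, and the choice to return a positive score (higher is better) are all assumptions made for the example.

```python
import math
import struct


def parse_matchinfo(buf):
    """Unpack a matchinfo() blob into a tuple of unsigned 32-bit integers."""
    return struct.unpack('@%dI' % (len(buf) // 4), buf)


def weighted_bm25(raw_matchinfo, *weights, k1=1.2, b=0.75):
    """Hypothetical weighted BM25 over a matchinfo blob in 'pcnalx' format.

    Extra positional arguments are per-column weights, in column order.
    Columns without an explicit weight default to 1.0.
    """
    info = parse_matchinfo(raw_matchinfo)
    n_phrases, n_cols, n_docs = info[0], info[1], info[2]
    avg_off = 3                  # 'a': average token count per column
    len_off = avg_off + n_cols   # 'l': token count per column in this row
    x_off = len_off + n_cols     # 'x': 3 integers per (phrase, column)

    col_weights = [1.0] * n_cols
    for i, w in enumerate(weights[:n_cols]):
        col_weights[i] = float(w)

    score = 0.0
    for p in range(n_phrases):
        for c in range(n_cols):
            if col_weights[c] == 0:
                continue  # a zero weight excludes the column entirely
            x = x_off + 3 * (p * n_cols + c)
            tf = info[x]       # hits for this phrase in this row and column
            df = info[x + 2]   # rows where this column matches the phrase
            idf = math.log((n_docs - df + 0.5) / (df + 0.5))
            if idf <= 0:
                idf = 1e-6     # keep very common terms from going negative
            avg_len = info[avg_off + c] or 1
            ratio = info[len_off + c] / avg_len
            denom = tf + k1 * (1 - b + b * ratio)
            score += col_weights[c] * idf * tf * (k1 + 1) / denom
    return score


# Synthetic blob: 1 phrase, 2 columns (title, content), 10 rows in the table.
example = struct.pack('@13I', 1, 2, 10,  4, 20,  4, 20,  1, 3, 2,  0, 5, 4)
print(weighted_bm25(example))            # equal weights
print(weighted_bm25(example, 2.0, 1.0))  # title matches count double
```

In this synthetic row the phrase matches only the title column, so doubling the title weight doubles the score.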

You can find the code here.

I've also implemented these in Cython as a SQLite C extension, which you can find here.

I've similarly updated the simpler rank() function to accept weight values for the columns.
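
A weighted version of the simpler ranking can be sketched in the same way. This is an illustration only, assuming the default matchinfo format `'pcx'`; the name `weighted_rank` and the positive-score convention are assumptions for the example, not the peewee API.

```python
import struct


def weighted_rank(raw_matchinfo, *weights):
    """Hypothetical weighted rank over a matchinfo blob in 'pcx' format.

    Each (phrase, column) pair contributes hits_in_row / hits_in_all_rows,
    scaled by that column's weight. Missing weights default to 1.0.
    """
    info = struct.unpack('@%dI' % (len(raw_matchinfo) // 4), raw_matchinfo)
    n_phrases, n_cols = info[0], info[1]

    col_weights = [1.0] * n_cols
    for i, w in enumerate(weights[:n_cols]):
        col_weights[i] = float(w)

    score = 0.0
    for p in range(n_phrases):
        for c in range(n_cols):
            x = 2 + 3 * (p * n_cols + c)
            hits_row, hits_all = info[x], info[x + 1]
            if hits_row > 0:
                score += col_weights[c] * hits_row / hits_all
    return score


# Synthetic blob: 1 phrase, 2 columns; 1 of 3 hits in column 0, 2 of 5 in column 1.
rank_example = struct.pack('@8I', 1, 2,  1, 3, 2,  2, 5, 4)
print(weighted_rank(rank_example))            # equal weights
print(weighted_rank(rank_example, 2.0, 0.0))  # only column 0, counted double
```

With the standard library `sqlite3` module, a function like this can be registered via `conn.create_function('rank', -1, weighted_rank)`, where `-1` lets SQL callers pass any number of weight arguments after the matchinfo blob.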

hideaki-t commented 8 years ago

Thank you, it looks great! I'll have to take a look :) I was trying to stick to ctypes or cffi, but yeah, maybe it can have a Cython implementation too.

coleifer commented 8 years ago

The changes are pretty minimal so I bet you can just modify the ctypes version w/o much hassle.

hideaki-t commented 8 years ago

I think a Python-only version of the ranking code is okay for now. The Python version works fine when the load is not heavy, though I may add a C module for this later.