xhluca / bm25s

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
https://bm25s.github.io
MIT License

Thread safe search #37

Closed. okhat closed this issue 2 months ago

okhat commented 2 months ago

Amazing work on this! Works great.

Is retrieval thread-safe? At a glance, it seems like it should be, but I'm having trouble using multi-threading in a notebook. It crashes most of the time, but when it works, the results are correct.

I should add that I have trouble regardless of whether the backend is jax or numpy.

xhluca commented 2 months ago

That's interesting! I didn't run into issues when running with 4 threads on Kaggle. I haven't tried running local notebooks, though I believe Kaggle is very close to a plain notebook environment.

Would it be possible to share a notebook where it crashes, along with steps to reproduce it?
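For reference, a minimal reproduction could look roughly like the sketch below. It runs the same queries once sequentially and once through a thread pool, then compares the two result lists. Everything in it is an assumption made for illustration: `toy_search`, `check_thread_safety`, `CORPUS` and `QUERIES` are hypothetical names, and `toy_search` is a plain term-overlap ranker standing in for the real retriever call, not BM25 and not the bm25s API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in corpus; replace with the real data.
CORPUS = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
]


def toy_search(query, k=2):
    """Rank documents by number of shared terms (a placeholder, not BM25)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.split())), idx) for idx, doc in enumerate(CORPUS)]
    # Sort by descending score, then ascending index, so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def check_thread_safety(search_fn, queries, n_threads=4, repeats=50):
    """Return True if threaded results match the single-threaded baseline."""
    workload = list(queries) * repeats
    expected = [search_fn(q) for q in workload]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves input order, so results can be compared positionally.
        actual = list(pool.map(search_fn, workload))
    return actual == expected


QUERIES = ["does the cat purr", "can a bird fly", "best friend dog"]
print(check_thread_safety(toy_search, QUERIES))
```

Swapping `toy_search` for a function that wraps the actual retriever (or the evaluation step) should help narrow down which component misbehaves under threads. Note that a hard crash would terminate the process rather than return `False`, so running each suspect component separately is probably the quickest way to isolate it.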

okhat commented 2 months ago

I'm even more impressed by BM25s after discovering that the crash comes from the evaluation library I'm using and not from the search library (BM25s). I would expect making the latter thread-safe to be much harder than the former. Excellent work, @xhluca!

xhluca commented 2 months ago

Thank you for the kind words, really appreciate it! Although the multi-threaded implementation is working, I think it is possible to make it more efficient. That will definitely be important future work.