In this PR, we add support for Numba's nopython-mode JIT compilation, which gives a substantial speedup. For example, throughput on NQ went from 41 queries/s to 91.83 queries/s (see bm25-benchmark).

Changes

We added an option to use a numba backend for top-k selection when you retrieve text. Simply call retriever.retrieve(..., backend_selection="numba") to activate it.
We changed how the relevance score is computed, making it faster by default and even faster when numba is used. You can now call retriever.activate_numba_scorer() to enable the numba-compiled scorer.
New tests for numba: tests/numba/test_topk_numba.py
New example using numba: examples/retrieve_with_numba.py
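
Taken together, the two options above might be used as in the following minimal sketch. Only retrieve(..., backend_selection="numba") and activate_numba_scorer() come from this PR; the surrounding calls (bm25s.tokenize, bm25s.BM25, index) and the helper name retrieve_with_numba are assumptions about the installed bm25s version, so treat this as an illustration rather than the contents of examples/retrieve_with_numba.py.

```python
def retrieve_with_numba(corpus, query, k=2):
    """Index `corpus`, then retrieve the top-k entries for `query` via the numba paths.

    Hypothetical helper for illustration; it requires bm25s and numba to be installed.
    """
    # Imported lazily so this sketch can be loaded even when bm25s/numba are missing.
    import bm25s

    # Assumed bm25s API: tokenize the corpus and build the index.
    corpus_tokens = bm25s.tokenize(corpus)
    retriever = bm25s.BM25()
    retriever.index(corpus_tokens)

    # From this PR: switch relevance scoring to the numba-compiled scorer.
    retriever.activate_numba_scorer()

    # From this PR: use the numba backend for top-k selection.
    query_tokens = bm25s.tokenize([query])
    results, scores = retriever.retrieve(query_tokens, k=k, backend_selection="numba")
    return results, scores
```

The first call is likely to be slower than later ones, since numba compiles the functions on first use.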

Detailed notes

New scoring approaches (numba ready)

The old scoring logic is kept as _compute_relevance_from_scores_legacy in bm25s/scoring.py, so you can still see how it worked. We also added _compute_relevance_from_scores_jit_ready, an alternative to both the legacy and the default relevance scoring functions; it is slow out of the box but can be much faster once compiled with numba.njit(_compute_relevance_from_scores_jit_ready). Moreover, the default relevance scoring function is now faster than the legacy approach and has been moved directly to the main BM25 class as a staticmethod called _compute_relevance_from_scores. It can be overridden with a custom function, such as _compute_relevance_from_scores_jit_ready or _compute_relevance_from_scores_legacy.
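
The override mechanism can be illustrated with a toy stand-in class. Everything below is hypothetical: ToyRetriever, custom_scorer, the simplified argument list, and the sparse layout (data, indices, indptr) are assumptions made for the sketch and are not the actual bm25s code. It uses plain Python lists so it runs without numba; a JIT-ready variant would operate on NumPy arrays and be wrapped in numba.njit.

```python
class ToyRetriever:
    """Hypothetical stand-in for the BM25 class (not the bm25s implementation)."""

    def __init__(self, data, indices, indptr, num_docs):
        self.data = data          # precomputed score of each (token, document) pair
        self.indices = indices    # document id for each entry in `data`
        self.indptr = indptr      # token t owns the entries indptr[t]:indptr[t + 1]
        self.num_docs = num_docs

    @staticmethod
    def _compute_relevance_from_scores(data, indptr, indices, num_docs, query_token_ids):
        # Default scorer: sum the precomputed scores of every query token per document.
        scores = [0.0] * num_docs
        for t in query_token_ids:
            for pos in range(indptr[t], indptr[t + 1]):
                scores[indices[pos]] += data[pos]
        return scores

    def get_scores(self, query_token_ids):
        return self._compute_relevance_from_scores(
            self.data, self.indptr, self.indices, self.num_docs, query_token_ids
        )


def custom_scorer(data, indptr, indices, num_docs, query_token_ids):
    # Alternative scorer with the same signature, written over explicit slices.
    scores = [0.0] * num_docs
    for t in query_token_ids:
        start, end = indptr[t], indptr[t + 1]
        for doc_id, value in zip(indices[start:end], data[start:end]):
            scores[doc_id] += value
    return scores


# Token 0 appears in docs 0 and 2, token 1 in doc 1, token 2 in docs 0 and 1.
retriever = ToyRetriever(
    data=[1.0, 0.5, 2.0, 0.25, 0.75],
    indices=[0, 2, 1, 0, 1],
    indptr=[0, 2, 3, 5],
    num_docs=3,
)
default_scores = retriever.get_scores([0, 2])

# Override the scorer on this instance only; the class default stays untouched.
retriever._compute_relevance_from_scores = custom_scorer
custom_scores = retriever.get_scores([0, 2])

print(default_scores)  # [1.25, 0.75, 0.5]
print(custom_scores)   # [1.25, 0.75, 0.5]
```

Because the scorer is a staticmethod that receives everything it needs as arguments, any function with the same signature can be swapped in without touching the rest of the class.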

New selection algorithm powered by numba (`topk`)

We created a bm25s.numba.selection module, which can be imported only when numba is available. It offers a topk function that behaves mostly the same as bm25s.selection.topk; the only expected difference is that documents with the same score may be returned in a different order. It is used automatically when backend_selection="numba" is passed.
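
The tie-ordering caveat can be shown with two toy selection routines. Both functions below are hypothetical, pure-Python illustrations (they are not the bm25s or numba implementations): one does a full stable sort, the other keeps a bounded min-heap and, as an arbitrary choice for this sketch, lets later ties replace earlier ones. They return the same scores, but the document indices differ when a tie straddles the cutoff.

```python
import heapq


def topk_sorted(scores, k):
    """Reference selection: full stable sort, so tied scores keep the lower index first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:k]
    return [scores[i] for i in top], top


def topk_heap(scores, k):
    """Heap-based selection: maintain a min-heap holding the k best (score, index) pairs."""
    heap = []
    for i, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
        elif s >= heap[0][0]:
            # `>=` lets a later document evict an earlier one with an equal score.
            heapq.heapreplace(heap, (s, i))
    best = sorted(heap, key=lambda pair: pair[0], reverse=True)
    return [s for s, _ in best], [i for _, i in best]


scores = [1.0, 2.0, 1.0, 1.0]
ref_scores, ref_idx = topk_sorted(scores, k=2)
heap_scores, heap_idx = topk_heap(scores, k=2)

print(ref_scores, ref_idx)    # [2.0, 1.0] [1, 0]
print(heap_scores, heap_idx)  # [2.0, 1.0] [1, 3]
```

Tests that compare two backends should therefore compare the returned scores, or check that each returned index carries the expected score, rather than require identical index arrays.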

xhluca / bm25s

Add numba integration to allow for faster scoring and retrieval #41
