xhluca / bm25s

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
https://bm25s.github.io
MIT License
862 stars, 35 forks

Add a weight mask that is applied to scores during retrieval #50

Closed xhluca closed 2 months ago

xhluca commented 2 months ago

This PR should close #35: the new weight_mask accepts binary masks (see the new tests) as well as arbitrary floating-point masks, making it a general-purpose weighting mechanism that is applied after scoring but before top-k candidate selection. At the moment, only a single mask of shape (D,) for D documents is accepted, mainly because a (Q, D) mask covering Q queries is infeasible when D is large. Future PRs can add support for 2-D masks.
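To illustrate the idea, here is a minimal NumPy sketch of weighting scores before top-k selection. It is not the code from this PR: the helper name masked_top_k is hypothetical, and it assumes the mask is applied by elementwise multiplication, which the description above does not spell out.

```python
import numpy as np


def masked_top_k(scores, weight_mask, k):
    """Hypothetical helper: weight per-document scores, then select the top-k.

    Assumes the mask is combined with the scores by elementwise
    multiplication (an assumption, not confirmed by the PR text).
    """
    scores = np.asarray(scores, dtype=np.float64)
    weight_mask = np.asarray(weight_mask, dtype=np.float64)
    if weight_mask.shape != scores.shape:
        raise ValueError("weight_mask must have shape (D,), one weight per document")

    # Apply the mask after scoring, before candidate selection.
    weighted = scores * weight_mask

    k = min(k, weighted.shape[0])
    # argpartition finds the k largest entries without a full sort ...
    top = np.argpartition(-weighted, k - 1)[:k]
    # ... and only those k candidates are then sorted by descending score.
    top = top[np.argsort(-weighted[top], kind="stable")]
    return top, weighted[top]


# Made-up scores for D = 4 documents.
scores = np.array([1.2, 3.4, 0.5, 2.8])

# Binary mask: document 1 is excluded by zeroing its score.
binary_idx, binary_vals = masked_top_k(scores, np.array([1, 0, 1, 1]), k=2)

# Floating-point mask: documents are boosted or down-weighted instead.
float_idx, float_vals = masked_top_k(scores, np.array([1.0, 0.5, 2.0, 0.1]), k=2)
```

Under the multiplication assumption, a binary mask zeroes the score of an excluded document rather than removing it, so it can still appear in the results when k exceeds the number of unmasked documents with a positive score. As a purely illustrative figure for the 2-D case, a float32 (Q, D) mask for 1,000 queries over 10 million documents would need 1,000 × 10,000,000 × 4 bytes = 40 GB.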

xhluca commented 2 months ago

@dl423 FYI

dl423 commented 1 month ago

@xhluca Thank you! I really appreciate that you implemented this feature. Sorry I didn't get around to implementing it myself.