Open rebasedming opened 9 months ago
I think I have a use case for this, or something similar, that isn't debugging. I want to be able to retrieve the keywords BM25 identifies for a given document, to use in a recommendation system, functionally being a kind of automated search. By tracking which documents the user likes, and thus which keywords the user is interested in, a weighted list of keywords can be randomly selected from to find potentially similar documents. From what I could find there doesn't seem to be a way to actually get the keywords "out of the index", and the "more like this" feature is for single documents, not a custom list of weighted keywords.
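To make the use case concrete, here is a minimal sketch of the weighted keyword selection described above. It assumes the per-document keywords and scores are already available as a plain dict, which is exactly the piece that cannot currently be pulled out of the index. The names `update_profile` and `sample_keywords` are hypothetical and not part of any existing API.

```python
import random
from collections import Counter


def update_profile(profile, doc_keywords, weight=1.0):
    """Add a liked document's keyword scores to the user's profile.

    doc_keywords is assumed to be {keyword: positive score}.
    """
    for keyword, score in doc_keywords.items():
        profile[keyword] += weight * score
    return profile


def sample_keywords(profile, k, rng=None):
    """Pick up to k distinct keywords, each draw proportional to its weight."""
    rng = rng or random.Random()
    remaining = dict(profile)
    chosen = []
    while remaining and len(chosen) < k:
        terms = list(remaining)
        weights = [remaining[t] for t in terms]
        pick = rng.choices(terms, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


profile = Counter()
update_profile(profile, {"postgres": 2.0, "index": 1.0})
update_profile(profile, {"index": 1.5, "bm25": 0.5})
query_terms = sample_keywords(profile, 2, random.Random(0))
```

The sampled terms would then be used as the query for the next search, so frequently liked keywords come up more often without always dominating.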
We have a PR for it, but it's waiting on our bigger refactor to land. We'll let you know as soon as it's ready; it shouldn't be much longer.
No longer relevant, closing
@philippemnoel can you elaborate on why it's no longer relevant? Is there a way to achieve the use case I outlined now?
We just haven't had a need for this in the last year, really. I'm not sure if we need this. @rebasedming thoughts?
What We're getting user reports of unexpected behavior in the search results. I believe this is caused by an issue with the Tantivy index. Unfortunately, the Tantivy index is currently opaque, and its contents cannot be inspected once it has been created.
Why Debugging
How Introduce a `dump_bm25` function that takes in an index name and returns a table containing the contents of the index.
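As a rough illustration of what "a table containing the contents of the index" could look like, here is a toy sketch in Python. It is not Tantivy or the proposed `dump_bm25` implementation: `build_index` and `dump_index` are hypothetical helpers over a simple in-memory inverted index, with whitespace tokenization assumed purely for the example.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: {term: {doc_id: term_frequency}}.

    docs is assumed to be {doc_id: text}; tokenization is lowercase + split.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def dump_index(index):
    """Flatten the index into sorted (term, doc_id, term_frequency) rows."""
    rows = [
        (term, doc_id, tf)
        for term, postings in index.items()
        for doc_id, tf in postings.items()
    ]
    return sorted(rows)


docs = {1: "red shoes red", 2: "blue shoes"}
for row in dump_index(build_index(docs)):
    print(row)
```

A dump in this shape would make it possible to check which terms were actually indexed for a given document, which is the debugging need described above.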