paradedb / paradedb

Postgres for Search and Analytics
https://paradedb.com
GNU Affero General Public License v3.0

Introduce debug function to inspect Tantivy index #1499

Open rebasedming opened 9 months ago

rebasedming commented 9 months ago

What: We're getting user reports of unexpected behavior in the search results. I believe this is caused by an issue with the Tantivy index. Unfortunately, the Tantivy index is currently opaque, and its contents cannot be inspected once it is created.

Why: Debugging

How: Introduce a dump_bm25 function that takes in an index name and returns a table containing the contents of the index.

zedeus commented 9 months ago

I think I have a use case for this, or something similar, that isn't debugging. I want to be able to retrieve the keywords BM25 identifies for a given document, to use in a recommendation system that functions as a kind of automated search. By tracking which documents the user likes, and thus which keywords the user is interested in, I can build a weighted list of keywords and randomly sample from it to find potentially similar documents. From what I could find, there doesn't seem to be a way to actually get the keywords "out of the index", and the "more like this" feature works on single documents, not on a custom list of weighted keywords.
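
A minimal sketch of the sampling step described above, assuming the per-document keywords are already available (which is the part the index does not currently expose). The names `update_interest` and `sample_query_terms` are hypothetical and are not ParadeDB functions; the keyword lists are made-up illustration data.

```python
import random
from collections import Counter


def update_interest(profile, doc_keywords, weight=1.0):
    """Add the keywords of a liked document to the user's interest profile."""
    for keyword in doc_keywords:
        profile[keyword] += weight


def sample_query_terms(profile, k, rng):
    """Draw up to k distinct keywords, each pick proportional to its weight."""
    pool = dict(profile)  # copy so the profile itself is not modified
    chosen = []
    while pool and len(chosen) < k:
        terms = list(pool)
        weights = [pool[term] for term in terms]
        pick = rng.choices(terms, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


# Hypothetical keywords for two documents the user liked.
profile = Counter()
update_interest(profile, ["postgres", "search", "bm25"])
update_interest(profile, ["search", "ranking"])

# Seeded generator so repeated runs pick the same terms.
terms = sample_query_terms(profile, 2, random.Random(0))
```

The sampled terms would then be joined into an ordinary search query. Keywords that appear in several liked documents accumulate weight and are picked more often, while rarer ones still get an occasional chance.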

philippemnoel commented 8 months ago

We have a PR for it, but it's waiting on our bigger refactor to come in. We'll let you know as soon as it's ready; it shouldn't be much longer.

philippemnoel commented 1 month ago

No longer relevant, closing

zedeus commented 1 month ago

@philippemnoel can you elaborate on why it's no longer relevant? Is there a way to achieve the use case I outlined now?

philippemnoel commented 1 month ago

We just haven't had a need for this in the last year, really. I'm not sure if we need this. @rebasedming thoughts?