infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
23.1k stars, 2.26k forks

[Question]: Has the new knowledge base switched to idxnm? #3463

Open 1556900941lizerui opened 2 hours ago

1556900941lizerui commented 2 hours ago

Describe your problem

When a new knowledge base is created, does it get its own idxnm (index name)? In search.py I see that the query only applies a filter on the specified kb_ids, so all knowledge bases appear to share one index. From what I have read, when keyword similarity scores are calculated, the docFreq and termFreq statistics of the entire index are still used, not just those of the filtered documents. Doesn't that skew the scores for a single knowledge base?
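To make the concern concrete, here is a minimal sketch in plain Python. It is not RAGFlow or Elasticsearch code: the toy documents and the helper names (`bm25_idf`, `doc_freq`, `idf_for`) are made up for illustration, and the IDF formula is the one I understand Lucene's BM25 similarity to use. It compares the IDF of a term computed over a shared index with the IDF computed over one knowledge base only.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF in the Lucene style: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Toy shared index: every chunk carries its kb_id (hypothetical data).
docs = [
    ("kb_a", "invoice payment terms"),
    ("kb_a", "invoice due date"),
    ("kb_a", "invoice template"),
    ("kb_b", "neural network training"),
    ("kb_b", "graph neural network"),
    ("kb_b", "invoice parsing model"),
]

def doc_freq(rows, term: str) -> int:
    """Number of documents in rows that contain the term."""
    return sum(term in text.split() for _, text in rows)

def idf_for(rows, term: str) -> float:
    """IDF of the term using only the statistics of rows."""
    return bm25_idf(len(rows), doc_freq(rows, term))

term = "invoice"

# Statistics taken from the whole shared index (what a filter does not change).
index_wide_idf = idf_for(docs, term)

# Statistics taken from kb_b alone (what one might expect after filtering).
kb_b_rows = [d for d in docs if d[0] == "kb_b"]
kb_b_only_idf = idf_for(kb_b_rows, term)

print(f"index-wide IDF: {index_wide_idf:.4f}")
print(f"kb_b-only IDF:  {kb_b_only_idf:.4f}")
```

In this toy data "invoice" is rare inside kb_b but common across the shared index, so a query restricted to kb_b would weight the term lower than the contents of kb_b alone would suggest.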

1556900941lizerui commented 2 hours ago

Sorry, I was looking at an older release. I checked the latest code and found that it now combines idx and kb_id into a separate table, replacing the original filter-based scheme with table-level separation. Does this behave differently from the original approach, or will the scoring issue I described above still occur?
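For comparison, this is roughly how I picture the table-level scheme, again as a plain-Python sketch and not the actual RAGFlow code. The naming pattern `f"{idxnm}_{kb_id}"`, the index name `ragflow_tenant1`, and the helpers `table_name`, `build_tables`, and `term_stats` are assumptions for illustration only. If each knowledge base lives in its own table, the document count and docFreq would presumably be collected per table.

```python
from collections import defaultdict

def table_name(idxnm: str, kb_id: str) -> str:
    """Hypothetical naming scheme: one table per (index name, knowledge base)."""
    return f"{idxnm}_{kb_id}"

def build_tables(idxnm: str, docs):
    """Group chunks into one table per knowledge base."""
    tables = defaultdict(list)
    for kb_id, text in docs:
        tables[table_name(idxnm, kb_id)].append(text)
    return dict(tables)

def term_stats(rows, term: str):
    """Return (document count, docFreq of term) for a single table."""
    return len(rows), sum(term in text.split() for text in rows)

# Toy chunks tagged with their kb_id (hypothetical data).
docs = [
    ("kb_a", "invoice payment terms"),
    ("kb_a", "invoice due date"),
    ("kb_a", "invoice template"),
    ("kb_b", "neural network training"),
    ("kb_b", "graph neural network"),
    ("kb_b", "invoice parsing model"),
]

tables = build_tables("ragflow_tenant1", docs)
stats = {name: term_stats(rows, "invoice") for name, rows in tables.items()}

for name, (doc_count, df) in sorted(stats.items()):
    print(f"{name}: doc_count={doc_count}, docFreq={df}")
```

If that picture is right, the statistics for kb_b would no longer be influenced by the documents in kb_a, but I would like confirmation that the new scheme really works this way.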