infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
22.11k stars 2.17k forks source link

[Bug]: Chat retrieval issues #2362

Closed sznariOsmosis closed 1 month ago

sznariOsmosis commented 1 month ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

main

Other environment information

No response

Actual behavior

The debugging discovery is currently based on the latest 4 problems in the history to retrieve, and the debugging found that the consequence of this is that sometimes the content cannot be retrieved, but when I use the latest problem to retrieve, I can get the retrieved chunk

Expected behavior

What was the purpose of using the 4 questions to search in the first place, and will this be a hidden danger? 61726023665_ pic

Steps to reproduce

Attachments shown above

Additional information

No response

KevinHuSh commented 1 month ago

It's normal. The longer the query is, the lower the recall rate is.

sznariOsmosis commented 1 month ago

It's normal. The longer the query is, the lower the recall rate is.

What is the purpose of this strategy, after testing, it uses the 4 questions of the history to retrieve chunks, which is easy to cause false detections and missed detections,Because I noticed that the code defaults to a new question composed of 3 questions from the current problem and the historical question, which are concatenated by strings

KevinHuSh commented 1 month ago

It's related to multi-turn conversations. We're gona refine it soon.