Open muzhi1991 opened 2 months ago
First, vector search is computationally costly; that's what the filter is for. Second, we noticed that vector search is not precise enough, which I guess is why Google primarily uses keyword search instead of vector search.
Thanks for your reply. In most current RAG solutions, hybrid search generally combines traditional keyword search with vector retrieval (ANN): the two are run side by side, and their results are merged with a fusion algorithm such as Reciprocal Rank Fusion (RRF). Using keyword filtering as a precondition does not seem optimal, especially when the wording of the user's query differs a lot from the wording of the document.
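For reference, here is a minimal sketch of RRF in plain Python. It is not taken from the project; the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration only. Each ranked list contributes `1 / (k + rank)` to a document's score, so a document that the keyword search misses entirely can still be ranked from its vector hit alone.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) to a document's score, with rank starting at 1.
    `k` is a smoothing constant; 60 is a commonly used value (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers, best hit first.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_d", "doc_a", "doc_b"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)

# With no keyword hits at all, the vector ranking still comes through unchanged.
vector_only = reciprocal_rank_fusion([[], vector_hits])
print(vector_only)
```

Because the two retrievers only contribute ranks, their raw scores (BM25 vs. cosine similarity) never need to be put on a common scale.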
Describe your problem
I noticed that the project uses Elasticsearch (es) as its hybrid search solution (combining keyword search and vector search). In a simple test case, I found that es often failed to recall any results. Reading the code, I found that the vector search here is run with a query filter (a prerequisite of 60% keyword hits), which seems to weaken the effect of the vector search. Why did you choose this design? Or did I misunderstand?
https://github.com/infiniflow/ragflow/blob/fdd5b1b8cf58e3808cb3d47fd0731be40fc32d97/rag/nlp/search.py#L132
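To make the concern concrete, here is a toy sketch in plain Python. It is not the project's code and does not model how Elasticsearch evaluates the filter; the documents, vectors, and helper names are all made up for illustration, and only the 60% threshold is taken from the description above. It shows how a keyword pre-filter can leave the vector search with nothing to rank when the query and the document use different words.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_hit_ratio(query_terms, doc_terms):
    """Fraction of query terms that appear in the document (toy definition)."""
    if not query_terms:
        return 0.0
    return sum(1 for term in query_terms if term in doc_terms) / len(query_terms)


def filtered_vector_search(query_terms, query_vec, docs, min_ratio=0.6):
    """Rank by vector similarity, but only among docs that pass the keyword filter."""
    candidates = [
        doc for doc in docs
        if keyword_hit_ratio(query_terms, doc["terms"]) >= min_ratio
    ]
    return sorted(candidates, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)


# Hypothetical corpus: doc_1 is about the same topic as the query but shares no words with it.
docs = [
    {"id": "doc_1", "terms": {"automobile", "maintenance", "schedule"}, "vec": [0.9, 0.1]},
    {"id": "doc_2", "terms": {"banana", "bread", "recipe"}, "vec": [0.1, 0.9]},
]
query_terms = ["car", "servicing", "intervals"]
query_vec = [0.8, 0.2]

filtered = filtered_vector_search(query_terms, query_vec, docs)
unfiltered = filtered_vector_search(query_terms, query_vec, docs, min_ratio=0.0)

print([doc["id"] for doc in filtered])
print([doc["id"] for doc in unfiltered])
```

In this toy setup the filtered search returns nothing, while the unfiltered search ranks the semantically closest document first.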