infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: Answering across multiple documents #2400

Status: Open. CiaraRichmond opened this issue 2 months ago

CiaraRichmond commented 2 months ago

Describe your problem

I am looking for a way to query across multiple document chunks. I have a sample CSV (employee_data.csv) that contains one row per employee, detailing the employee and their office location. When this is embedded using the table format, I can ask the Q&A bot about the office location of individual employees, but it struggles to pull back the relevant information when a question is asked at the office level. Below you can see that it retrieves accurate information about some of the employees at the London office:

[screenshot: chatbot answer listing some of the employees at the London office]

But actually we have over 300 employees at that office:

[screenshot: the source data shows over 300 employees at the London office]

Is there currently a best practice for handling this type of query, which needs a larger subset of the document chunks to be answered fully?
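For reference, a minimal sketch of the setup described above. The column names and values here are hypothetical, not the actual contents of employee_data.csv; the point is that row-wise parsing turns each employee into its own chunk, so an office-level question needs every matching row.

```python
import csv
import io

# Hypothetical sample resembling employee_data.csv (actual columns unknown).
sample_csv = """employee_id,name,office_location
1001,Alice Smith,London
1002,Bob Jones,Manchester
1003,Carol White,London
"""

# Row-per-employee chunking: each row becomes one chunk, so a question about
# a single employee maps to one chunk, while a question about an office
# requires retrieving every matching row.
reader = csv.DictReader(io.StringIO(sample_csv))
chunks = [", ".join(f"{key}: {value}" for key, value in row.items()) for row in reader]
for chunk in chunks:
    print(chunk)
```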

JinHai-CN commented 2 months ago

Which LLM are you using?

CiaraRichmond commented 2 months ago

Hi, this assistant is currently using llama3.1

JinHai-CN commented 2 months ago

We recommend using GPT-4 to generate the answers in this case. We have tried other LLMs before, and the results were the same as in this scenario: a lot of content was lost.

CiaraRichmond commented 2 months ago

OK, I was under the impression that this was a failure of the retrieval rather than of the LLM that generates the answer. If my CSV is chunked into rows, is there any way the retrieval can bring back over 300 relevant chunks (one chunk per employee)?
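To make the concern concrete: a retriever normally returns only the top-N chunks by similarity, so with one chunk per employee an office-level question can only surface N of the 300+ matching rows. The sketch below is illustrative only; the `top_n` cap and keyword scoring are hypothetical, not RAGFlow's actual retrieval code.

```python
from typing import List, Tuple


def retrieve(query_terms: set, chunks: List[str], top_n: int = 8) -> List[Tuple[int, str]]:
    """Toy keyword retriever: score each chunk by term overlap and keep top_n.

    The point is the cap, not the scoring: even if 300 chunks mention
    'London', only top_n of them ever reach the LLM's context window.
    """
    scored = [(sum(term.lower() in chunk.lower() for term in query_terms), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, chunk) for score, chunk in scored[:top_n] if score > 0]

# With 300+ London rows chunked individually, a top_n of 8 silently drops
# most of the relevant employees from the answer.
```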

KevinHuSh commented 2 months ago

An Excel sheet is parsed into an HTML table, so theoretically the entire table will be retrieved. You can check this by clicking the bulb icon next to the answer. Otherwise, you can try parsing it with the 'Table' method.
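As an illustration of the behavior described above (a sketch only, not RAGFlow's actual parser), a sheet rendered as a single HTML table keeps all rows together in one retrievable unit, so pulling that chunk back returns every employee at once. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical employee data; the real employee_data.csv columns are unknown.
df = pd.DataFrame(
    {
        "name": ["Alice Smith", "Bob Jones", "Carol White"],
        "office_location": ["London", "Manchester", "London"],
    }
)

# Rendering the whole sheet as one HTML table means a single chunk carries
# every row. Retrieving that chunk brings back all employees at once; the
# trade-off is that the chunk can grow very large for big sheets.
html_table = df.to_html(index=False)
print(html_table)
```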