CiaraRichmond opened this issue 2 months ago (status: Open)
Which LLM are you using?
Hi, this assistant is currently using Llama 3.1.
We recommend using GPT-4 to generate the answers in this case. We have tried other LLMs before, but the results were the same as in this scenario, with a lot of content lost.
OK, I was under the impression that this was a failure of the retrieval, not of the LLM that generates the answer. If my CSV is chunked into rows, is there any way the retrieval can bring back over 300 relevant chunks (one chunk per employee)?
The Excel sheet will be parsed as an HTML table, so theoretically retrieval will bring the entire table back. You can check this by clicking the bulb icon. Otherwise, you can try parsing it with the 'Table' method.
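To make the distinction concrete, here is a minimal sketch of the two chunking strategies being discussed: one chunk per row (where an office-level question would need the retriever to return hundreds of chunks) versus the whole sheet serialized as a single HTML table (where one retrieved chunk carries every row). The sample data and function names are made up for illustration; this is not RAGFlow's actual parser code.

```python
import csv
import io

# Hypothetical sample standing in for employee_data.csv (names/offices invented).
SAMPLE_CSV = """name,office
Alice,London
Bob,Manchester
Carol,London
"""

def rows_as_chunks(csv_text):
    """Row-level chunking: one chunk per employee, so an office-level
    question depends on the retriever returning one chunk per matching row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [", ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]

def table_as_single_chunk(csv_text):
    """Table-style parsing: the whole sheet becomes one HTML table,
    so a single retrieved chunk carries every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    head = "".join(f"<th>{c}</th>" for c in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{c}</td>" for c in r) + "</tr>" for r in rows[1:]
    )
    return f"<table><tr>{head}</tr>{body}</table>"

print(len(rows_as_chunks(SAMPLE_CSV)))                   # one chunk per employee
print(table_as_single_chunk(SAMPLE_CSV).count("<tr>"))   # all rows in one chunk
```

With row-level chunks the retriever's top-k cap bounds how many employees the LLM can ever see; with a single-table chunk the whole sheet arrives together (at the cost of context length for large sheets).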
Describe your problem
I am looking for a way to query across multiple document chunks. I have a sample CSV (employee_data.csv) with one row per employee, detailing the employee and their office location. When this is embedded using the table format, I can ask the Q&A bot about the office location of individual employees, but it struggles to pull back the relevant information when a question is asked at the office level. Below you can see that it retrieves accurate information about some of the employees at the London office:
But actually we have over 300 employees at that office:
Is there currently a best practice for handling this type of query, which requires a larger subset of the document chunks to fully answer the question?