infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: Issue while using csv with 1000 rows and 12 columns #2382

Open saineshwar opened 1 month ago

saineshwar commented 1 month ago

Describe your problem

Output

image

Data in .csv file

image

Download file to test - customers-1000.csv

KevinHuSh commented 1 month ago

And the issue is ...?

saineshwar commented 1 month ago

> And the issue is ...?

  1. The answer shows the headers of the CSV file.
  2. The data is not formatted.

image

yingfeng commented 1 month ago
Screenshot 2024-09-12 18 27 29

I chose Table as the parsing template and got the result above. Ignore the Chinese characters produced by the LLM.

One issue is that the embedding process takes a long time: the CSV file has 1000 lines, and each line is sent to the embedding model separately. We'll improve this strategy by enlarging the batch size for table-like data.
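To illustrate the batching idea mentioned above, here is a minimal sketch of grouping rows before sending them to an embedding model. It is not RAGFlow's implementation: `embed_rows`, `batched`, and `fake_embed_batch` are hypothetical names, and the stand-in embedding function exists only so the example runs without a real model.

```python
from typing import Callable, Iterator, List, Sequence


def batched(rows: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


def embed_rows(
    rows: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Embed all rows, calling the (hypothetical) model once per batch, not once per row."""
    vectors: List[List[float]] = []
    for batch in batched(rows, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding call: one tiny vector per input text."""
    return [[float(len(text))] for text in texts]


if __name__ == "__main__":
    rows = [f"row {i}" for i in range(1000)]
    vectors = embed_rows(rows, fake_embed_batch, batch_size=64)
    print(len(vectors))
```

With 1000 rows and a batch size of 64, the model would be called 16 times rather than 1000 times. The best batch size depends on the embedding service's limits, which this sketch does not model.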

saineshwar commented 1 month ago

> Screenshot 2024-09-12 18 27 29
>
> I chose Table as the parsing template and got the result above. Ignore the Chinese characters produced by the LLM.
>
> One issue is that the embedding process takes a long time: the CSV file has 1000 lines, and each line is sent to the embedding model separately. We'll improve this strategy by enlarging the batch size for table-like data.

Which embedding model and LLM did you use for testing?

yingfeng commented 1 month ago

It is not closely related to the LLM or the embedding model. The key point is that you should choose Table instead of the default General as the file parser template.

saineshwar commented 1 month ago

> It is not closely related to the LLM or the embedding model. The key point is that you should choose Table instead of the default General as the file parser template.

I have chosen Table only. image image

yingfeng commented 1 month ago

You could try adjusting the prompt. By default, the demo uses DeepSeek.
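As one possible shape for such a prompt adjustment, here is a small sketch that builds a system prompt asking the model to list column names once and present records as a table. The wording is illustrative and has not been tested against DeepSeek or Llama 3.1. The `{knowledge}` placeholder and the `build_system_prompt` helper are assumptions made for this example.

```python
# Hypothetical system prompt; the {knowledge} placeholder is an assumption
# standing in for wherever the retrieved rows are injected.
SYSTEM_PROMPT_TEMPLATE = """You are an assistant that answers questions about tabular customer data.
Use only the rows in the knowledge base below.

Formatting rules:
- Present matching records as a Markdown table, one record per row.
- Write the column names once, in the table header; do not repeat them inside each row.
- If no row answers the question, say that the knowledge base has no matching record.

Knowledge base:
{knowledge}
"""


def build_system_prompt(knowledge: str) -> str:
    """Insert retrieved rows into the template.

    str.replace is used instead of str.format so that braces inside the
    retrieved data cannot break the substitution.
    """
    return SYSTEM_PROMPT_TEMPLATE.replace("{knowledge}", knowledge)


if __name__ == "__main__":
    sample_rows = "First Name: Alice; City: Pune\nFirst Name: Bob; City: Oslo"
    print(build_system_prompt(sample_rows))
```

Whether the model follows these rules depends on the LLM in use, so the instructions may need rewording per model.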

saineshwar commented 1 month ago

> You could try adjusting the prompt. By default, the demo uses DeepSeek.

@yingfeng could you please share an example? I am using Llama 3.1 as the LLM.