infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: Content disorder may occur during the parsing of PDF files. #744

Open johnmartinolli opened 6 months ago

johnmartinolli commented 6 months ago

Is there an existing issue for the same bug?

Branch name

v0.5.0

Commit ID

48607c3

Other environment information

No response

Actual behavior

[Image] PMBOK第6版-中文.pdf

Expected behavior

The same as the original text content.

Steps to reproduce

1. Create a new knowledge base and upload the PDF document.
2. Wait for the parsing to be completed.
3. View the list of chunks.
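
The check in step 3 can also be done programmatically. The following is a minimal sketch, not part of RAGFlow: it assumes the chunk texts have been copied out of the chunk list in the order shown, and that a trusted plain-text extraction of the original PDF is available as a reference. The helper name `find_out_of_order` and the 40-character probe length are hypothetical choices made for illustration.

```python
def find_out_of_order(reference: str, chunks: list[str], probe_len: int = 40) -> list[int]:
    """Return indices of chunks that appear earlier in `reference`
    than a chunk listed before them (i.e. likely reordered)."""
    out_of_order = []
    last_pos = -1
    for i, chunk in enumerate(chunks):
        # Use a short prefix of the chunk as a search probe.
        probe = chunk.strip()[:probe_len]
        if not probe:
            continue  # empty chunk: nothing to locate
        pos = reference.find(probe)
        if pos == -1:
            continue  # probe not found verbatim: cannot judge this chunk
        if pos < last_pos:
            out_of_order.append(i)
        else:
            last_pos = pos
    return out_of_order


if __name__ == "__main__":
    reference_text = "alpha beta gamma delta"
    parsed_chunks = ["alpha", "gamma", "beta", "delta"]
    print(find_out_of_order(reference_text, parsed_chunks))
```

An empty result means no reordering was detected among the chunks that could be located. Chunks whose prefix does not occur verbatim in the reference are skipped, so this only catches ordering problems, not altered text.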

Additional information

No response

KevinHuSh commented 6 months ago

We're going to refine it.