langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Change the default PDF parser to Unstructured PDF Partitioner #8751

Open taowang1993 opened 4 hours ago

taowang1993 commented 4 hours ago

1. Is this request related to a challenge you're experiencing? Tell me about your story.

The Unstructured PDF partitioner is a more advanced approach to parsing PDFs than the current default parser.

It can extract tables and images from PDFs.

I propose changing the default PDF parser to the Unstructured partitioner.

https://docs.unstructured.io/open-source/core-functionality/partitioning#partition-pdf
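
For reference, here is a rough sketch of what calling the partitioner looks like. It assumes the open-source `unstructured` package is installed with its PDF extras. The function names `partition_with_unstructured` and `group_by_category` are made up for illustration and are not part of Dify or Unstructured. How this would plug into Dify's extractor code is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def partition_with_unstructured(path: str) -> list:
    """Parse a PDF into a list of Unstructured elements (hypothetical helper)."""
    # Imported lazily so the grouping helper below works without the dependency.
    from unstructured.partition.pdf import partition_pdf

    # "hi_res" runs layout detection; infer_table_structure asks the library
    # to attach an HTML rendering of each table to the element's metadata.
    return partition_pdf(
        filename=path,
        strategy="hi_res",
        infer_table_structure=True,
    )


def group_by_category(elements: Iterable[Any]) -> Dict[str, List[str]]:
    """Group element text by category, e.g. "Title", "NarrativeText", "Table"."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        grouped[element.category].append(element.text)
    return dict(grouped)
```

Something like `group_by_category(partition_with_unstructured("report.pdf"))` would then make it easy to see which tables the parser picked up, where `report.pdf` is a placeholder path.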

2. Additional context or comments

No response

3. Can you help us with this feature?

crazywoola commented 4 hours ago

Link #8695