langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Add Upstage Layout Analysis for parsing documents such as HTML, PDF, JPG... #6955

Open JuHyung-Son opened 1 month ago

JuHyung-Son commented 1 month ago


1. Is this request related to a challenge you're experiencing? Tell me about your story.

It is hard to index unstructured documents such as HTML and PDF files. If an unstructured document parsing API were available, it would be a very powerful tool for Dify.

Upstage Layout Analysis is a world-leading unstructured document parsing API. You can check its performance here:

https://en.content.upstage.ai/blog/business/introducing-layout-analysis
https://developers.upstage.ai/docs/apis/layout-analysis

Also, this API is going to be open-sourced.

2. Additional context or comments

No response

3. Can you help us with this feature?

JuHyung-Son commented 1 month ago

@crazywoola

I'm not sure how to add this feature. Since Layout Analysis is an API, I thought it should go in model_runtime, but according to the documentation, model_runtime only supports models that perform the following tasks:

- LLM: LLM text completion, dialogue, pre-computed tokens capability
- Text Embedding Model: text embedding, pre-computed tokens capability
- Rerank Model: segment rerank capability
- Speech-to-text Model: speech to text capability
- Text-to-speech Model: text to speech capability
- Moderation: moderation capability

crazywoola commented 1 week ago

@JuHyung-Son

Hello, sorry for the late response.

I think you are looking for this, right?

https://github.com/langgenius/dify/blob/85fc0fdb51e38c1c9efae0d3556393b098d64853/docker/.env.example#L415-L422

And if you would like to contribute, you can implement an extractor similar to the one below.

https://github.com/langgenius/dify/blob/e4f686deb71e59b8d36a6c31a5480a676f522a34/api/core/rag/extractor/unstructured/unstructured_xml_extractor.py
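
For illustration only, here is a minimal sketch of what such an extractor might look like. It is not taken from the Dify codebase: `Document` is a stand-in for Dify's document model, `LayoutAnalysisExtractor` and `LayoutClient` are hypothetical names, and the response shape (an `elements` list whose items carry `text`, `category`, and `page`) is an assumption that should be checked against the Upstage documentation. The HTTP call is injected as a callable so the sketch runs without network access or an API key.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Document:
    """Stand-in for Dify's RAG document model (assumed: text plus metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical transport: takes a file path and returns the parsed JSON
# response of a layout-analysis API as a dict.
LayoutClient = Callable[[str], dict[str, Any]]


class LayoutAnalysisExtractor:
    """Hypothetical extractor that maps layout elements to documents."""

    def __init__(self, file_path: str, client: LayoutClient):
        self._file_path = file_path
        self._client = client

    def extract(self) -> list[Document]:
        response = self._client(self._file_path)
        documents = []
        # Assumed response shape:
        # {"elements": [{"text": ..., "category": ..., "page": ...}, ...]}
        for element in response.get("elements", []):
            text = (element.get("text") or "").strip()
            if not text:
                continue  # skip elements without text, e.g. figures
            documents.append(
                Document(
                    page_content=text,
                    metadata={
                        "source": self._file_path,
                        "category": element.get("category"),
                        "page": element.get("page"),
                    },
                )
            )
        return documents


def fake_client(path: str) -> dict[str, Any]:
    """Canned response used in place of a real API call."""
    return {
        "elements": [
            {"text": " Quarterly Report ", "category": "heading1", "page": 1},
            {"text": "", "category": "figure", "page": 1},
            {"text": "Revenue grew.", "category": "paragraph", "page": 2},
        ]
    }


docs = LayoutAnalysisExtractor("report.pdf", fake_client).extract()
for doc in docs:
    print(doc.metadata["page"], doc.metadata["category"], doc.page_content)
```

In a real contribution, the injected client would be replaced by an authenticated request to the layout-analysis endpoint, and the class would presumably follow the same base class and registration path as the linked unstructured extractor.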