langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Adding upstage layout analysis for parsing documents such as html, pdf, jpg... #6955

Open JuHyung-Son opened 3 months ago

JuHyung-Son commented 3 months ago

Self Checks

1. Is this request related to a challenge you're experiencing? Tell me about your story.

It is hard to index unstructured documents such as HTML and PDF files. If an unstructured-document parsing API were available, it would be a very powerful tool for Dify.

Upstage Layout Analysis is a world-leading unstructured-document parsing API. You can check its performance here:

https://en.content.upstage.ai/blog/business/introducing-layout-analysis
https://developers.upstage.ai/docs/apis/layout-analysis

Also, this API is going to be opensourced.

2. Additional context or comments

No response

3. Can you help us with this feature?

JuHyung-Son commented 3 months ago

@crazywoola

I'm not sure how to add this feature. I thought that since Layout Analysis is an API it should go in model_runtime, but according to the documentation, model_runtime only supports models that can perform the following tasks:

LLM - LLM text completion, dialogue, pre-computed tokens capability
Text Embedding Model - Text Embedding, pre-computed tokens capability
Rerank Model - Segment Rerank capability
Speech-to-text Model - Speech to text capability
Text-to-speech Model - Text to speech capability
Moderation - Moderation capability

crazywoola commented 2 months ago

@JuHyung-Son

Hello, sorry for the late response.

I think you are looking for this, right?

https://github.com/langgenius/dify/blob/85fc0fdb51e38c1c9efae0d3556393b098d64853/docker/.env.example#L415-L422

And if you would like to contribute, you can implement an extractor along the lines of the one below.

https://github.com/langgenius/dify/blob/e4f686deb71e59b8d36a6c31a5480a676f522a34/api/core/rag/extractor/unstructured/unstructured_xml_extractor.py
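For illustration, a new extractor in that style might look roughly like the sketch below. This is not code from the Dify repository: `UpstageLayoutExtractor`, `call_upstage`, and the `Document` stand-in are hypothetical names, and the endpoint URL, the `document` form field, and the `elements` / `text` / `category` / `page` response fields are assumptions that should be checked against the Upstage API docs linked above. The only thing taken from the linked file is the general shape: a class built from a file path whose `extract()` method returns a list of documents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    """Stand-in for Dify's document model: text plus free-form metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumed endpoint; verify against the Upstage Layout Analysis docs.
UPSTAGE_URL = "https://api.upstage.ai/v1/document-ai/layout-analysis"


def call_upstage(file_path: str, api_key: str) -> dict:
    """Upload the file to the (assumed) endpoint and return the parsed JSON."""
    import requests  # third-party; imported lazily so parsing works without it

    with open(file_path, "rb") as f:
        response = requests.post(
            UPSTAGE_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"document": f},  # assumed form field name
            timeout=60,
        )
    response.raise_for_status()
    return response.json()


class UpstageLayoutExtractor:
    """Hypothetical extractor exposing extract() -> list[Document]."""

    def __init__(
        self,
        file_path: str,
        api_key: str,
        fetch: Callable[[str, str], dict] = call_upstage,
    ):
        self._file_path = file_path
        self._api_key = api_key
        self._fetch = fetch  # injectable so the parser can be tested offline

    def extract(self) -> list[Document]:
        payload = self._fetch(self._file_path, self._api_key)
        documents = []
        # Assumed response shape: {"elements": [{"text", "category", "page"}, ...]}
        for element in payload.get("elements", []):
            text = (element.get("text") or "").strip()
            if not text:
                continue  # skip elements with no text, e.g. figures
            documents.append(
                Document(
                    page_content=text,
                    metadata={
                        "source": self._file_path,
                        "category": element.get("category"),
                        "page": element.get("page"),
                    },
                )
            )
        return documents
```

Passing a fake `fetch` function makes it possible to exercise the parsing logic without an API key or network access; in a real contribution the class would presumably subclass Dify's own extractor base class and be wired into the extraction pipeline.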

seungwoos commented 1 month ago

@JuHyung-Son

Hi, I just found that Upstage's Document Parse is way better than any other tools. Is there any plan for or progress in incorporating Document Parse into Dify?

taowang1993 commented 1 month ago

opensourced

Where is the open source repo?

I can't find it in their github account:

https://github.com/orgs/UpstageAI/repositories