langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

When ETL_TYPE is set to Unstructured, a local Docker deployment reports an error when creating a knowledge base. #6872

Status: Closed (biofer closed this issue 1 month ago)

biofer commented 1 month ago

Dify version

dify-api:0.6.15

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

[Screenshot of the error message]

Run locally with Docker, with the following environment variables set in .env:

ETL_TYPE=Unstructured
UNSTRUCTURED_API_URL=https://api.unstructuredapp.io/general/v0/general
UNSTRUCTURED_API_KEY=*****

The above error occurs when creating the knowledge base and starting the setup.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

Hello @biofer! 👋 I'm here to help you with any bugs, questions, or contributions you may have regarding the repository. Let me know how I can assist you!

To resolve the error indicating that the NLTK 'punkt' resource is not found when setting ETL_TYPE to Unstructured in your local Docker instance, you need to download the 'punkt' resource. You can do this by adding the following code to your Dockerfile or by running it in your Docker container:

import nltk
nltk.download('punkt')

If you are using a Dockerfile, you can add the following lines to ensure the 'punkt' resource is downloaded during the build process:

RUN python -m nltk.downloader punkt

This will ensure that the 'punkt' tokenizer is available when you run your application in the Docker container [1].
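As a rough sketch of the same idea, the download can be made conditional so that it only runs when the resource is actually missing. The helper name `ensure_resource` and its injectable `find` / `download` parameters are illustrative assumptions, not part of Dify or NLTK; the `main` function wires them to `nltk.data.find` and `nltk.download`, assuming NLTK is installed in the container.

```python
from typing import Callable


def ensure_resource(
    resource_path: str,
    package: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `package` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False if the resource
    was already present. `find` is expected to raise LookupError when
    the resource is missing, which is how nltk.data.find behaves.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


def main() -> None:
    # Imported here so the helper above has no hard dependency on NLTK.
    import nltk

    fetched = ensure_resource(
        "tokenizers/punkt", "punkt", nltk.data.find, nltk.download
    )
    print("downloaded punkt" if fetched else "punkt already present")
```

Calling `main()` inside the running API container (or during the image build) should leave the tokenizer available, provided the container can reach the NLTK download servers.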

To continue talking to Dosu, mention @dosu.

biofer commented 1 month ago

I followed the official tutorial to start Docker. If there is something wrong with the Docker image, why don't you provide an updated one?

dosubot[bot] commented 1 month ago

This is a tough nut to crack, could really use your help on this, @takatost.


biofer commented 1 month ago

Are there any plans or details you can share, @JohnJyong? Inside the local image, running the following fails, so I am hoping for an official image with this already integrated.

import nltk
nltk.download('punkt')

biofer commented 1 month ago

Since the official docs give instructions, I would expect that following them just works smoothly. Or is the experience of domestic users considered unimportant and left untested?

crazywoola commented 1 month ago

He is currently on vacation. Here is a similar issue raised a long time ago: https://github.com/langgenius/dify/issues/4659

biofer commented 1 month ago

Are there any plans or details you can share, @JohnJyong? Inside the local image, running the following fails, so I am hoping for an official image with this already integrated.

import nltk
nltk.download('punkt')

So I am hoping for an image that works without this problem.