An error occurs when the knowledge base processes text files larger than 10 MB

sipeter commented 6 months ago

Self Checks

[X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I use xinference to serve the embedding model. When processing text files larger than 10 MB, an error message such as "HTTPConnectionPool Max retries exceeded" appears partway through: the error is not reported at the start, but during processing. When I delete the failed file and run the processing again, it continues from where it previously stopped and may fail again, but if I keep repeating this, the file is eventually processed in full and shown as successful. The same 10 MB text, embedded with the same model served through xinference, is processed normally in FastGPT.
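The retry-and-resume behavior described above can be sketched as follows. This is a hypothetical illustration, not Dify's actual indexing code: `embed_chunks`, `embed_fn`, and the checkpoint dict `done` are invented names, and `embed_fn` stands in for whatever call reaches the xinference embedding endpoint. Failed calls are retried with exponential backoff, and completed chunks are recorded so that a re-run skips them instead of starting over.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple, Type


def embed_chunks(
    chunks: Sequence[str],
    embed_fn: Callable[[str], List[float]],
    done: Dict[int, List[float]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, List[float]]:
    """Embed chunks in order, skipping indices already present in `done`.

    `done` (chunk index -> embedding) acts as a checkpoint: if a chunk still
    fails after `max_retries` retries, the exception propagates, but every
    chunk embedded so far stays in `done`, so calling again with the same
    dict resumes at the first missing chunk.
    """
    for i, chunk in enumerate(chunks):
        if i in done:
            continue  # embedded during an earlier run
        for attempt in range(max_retries + 1):
            try:
                done[i] = embed_fn(chunk)
                break  # success; move on to the next chunk
            except retry_on:
                if attempt == max_retries:
                    raise  # give up; progress so far is kept in `done`
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return done
```

Passing `sleep` as a parameter lets tests skip the real delays. The default retryable type `(OSError,)` covers built-in connection errors and, because requests' `RequestException` subclasses `IOError` (an alias of `OSError`), also the requests `ConnectionError` that carries a "Max retries exceeded" message.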

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 6 months ago

To address the errors you see when processing text files larger than 10 MB in Dify 0.6.3 with xinference, consider increasing UPLOAD_FILE_SIZE_LIMIT in your Docker environment configuration. Its default value might not accommodate the overhead of processing large files, which could explain the errors you're encountering.

Here's a quick guide to adjusting this setting:

1. Find the environment configuration section in your Dify Docker setup.
2. Set UPLOAD_FILE_SIZE_LIMIT to a higher value, such as 20 (the value is in MB), to handle larger files.
3. Check other related settings, such as storage and resource allocations, and adjust them to match the needs of processing larger files.
4. Restart your Dify Docker containers to apply the changes.

Also check any intermediary network components, such as reverse proxies, to make sure they allow larger uploads and longer processing times. For details, see docker/docker-compose.yaml and environments.md in the Dify documentation.
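Assuming UPLOAD_FILE_SIZE_LIMIT holds a whole number of megabytes (15 by default, per the maintainer's reply in this thread), the check it controls amounts to a byte comparison. The sketch below is hypothetical, not Dify's code: the helper names `upload_limit_bytes` and `fits_upload_limit` and the 1 MB = 1024 * 1024 bytes convention are assumptions.

```python
import os
from typing import Mapping, Optional

# Assumption: UPLOAD_FILE_SIZE_LIMIT holds a whole number of megabytes,
# defaulting to 15 (the maintainer mentions a 15M default in this thread).
DEFAULT_UPLOAD_LIMIT_MB = 15
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def upload_limit_bytes(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the upload size limit in bytes, read from UPLOAD_FILE_SIZE_LIMIT."""
    env = os.environ if env is None else env
    limit_mb = int(env.get("UPLOAD_FILE_SIZE_LIMIT", DEFAULT_UPLOAD_LIMIT_MB))
    return limit_mb * BYTES_PER_MB


def fits_upload_limit(size_bytes: int, env: Optional[Mapping[str, str]] = None) -> bool:
    """True if a file of `size_bytes` is within the configured upload limit."""
    return size_bytes <= upload_limit_bytes(env)


if __name__ == "__main__":
    size = int(13.9 * BYTES_PER_MB)  # a 13.9 MB file
    print(fits_upload_limit(size, env={}))  # True (default 15 MB limit)
    print(fits_upload_limit(size, env={"UPLOAD_FILE_SIZE_LIMIT": "10"}))  # False
```

Under these assumptions, a 13.9 MB file (the size reported later in this thread) fits within the 15 MB default but would be rejected with a limit of 10.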

#### Details

**Sources**
- [docker/docker-compose.yaml](https://github.com/langgenius/dify/blob/main/docker/docker-compose.yaml)
- [en/getting-started/install-self-hosted/environments.md](https://github.com/langgenius/dify-docs/blob/main/en/getting-started/install-self-hosted/environments.md)

crazywoola commented 6 months ago

I'm not sure about a 10 MB limitation; our default setting is 15 MB. Exceeding that limit would cause a different symptom: you would not be able to submit the file at all. Can you provide a screenshot or the Docker container logs?

sipeter commented 6 months ago

[Screenshot: Snipaste_2024-04-22_15-48-07] An error message like this one appears. This time the text is 13.9 MB (about 5 million characters), and the error is reported after roughly 20% of it has been processed.

sipeter commented 6 months ago

After upgrading to 0.6.4, the problem seems to be solved. I tested two texts larger than 10 MB; the previous error message did not appear, and each was processed in one go.

langgenius / dify