langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

[WinError 10053] An established connection was aborted by the software in your host machine #7716

Closed Yaghnesh closed 1 month ago

Yaghnesh commented 2 months ago

Self Checks

Dify version

0.7.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

When trying to upload a large document into the knowledge base, it throws the error [WinError 10053] An established connection was aborted by the software in your host machine.

The larger document is split into chunks and the embeddings are uploaded into the Qdrant vector DB. Only a limited number of chunks are processed before this error is thrown. In this case the whole document is split into around 383 chunks; it works fine for up to about 350 chunks, then the process halts with this error. (Screenshots: Screenshot 2024-08-27 171043, Screenshot 2024-08-27 171120)

✔️ Expected Behavior

All chunks should be uploaded into the Qdrant DB without any issue.

❌ Actual Behavior

The process halts before the document is completely uploaded.

dosubot[bot] commented 2 months ago

Hey @Yaghnesh! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's see how we can tackle this issue together.

The error [WinError 10053] An established connection was aborted by the software in your host machine typically indicates a network issue or a timeout problem. Here are some steps and configurations you can check to resolve this issue:

  1. Timeout Settings: Ensure that the timeout settings for the Qdrant client are appropriately configured. The default timeout is set to 20 seconds, but you might need to increase it for large uploads:

    from typing import Optional

    from pydantic import BaseModel

    class QdrantConfig(BaseModel):
        endpoint: str
        api_key: Optional[str] = None
        timeout: float = 20  # Increase this value
        root_path: Optional[str] = None
        grpc_port: int = 6334
        prefer_grpc: bool = False
  2. Gunicorn Timeout: Adjust the GUNICORN_TIMEOUT setting in your environment configuration. The default is 200 seconds, but it is recommended to set it to 360 seconds to support longer request handling times:

    GUNICORN_TIMEOUT=360
  3. Batch Processing: Ensure that you are using batch processing to upload data in chunks. This can help manage the load and prevent timeouts. The _generate_rest_batches method processes texts, embeddings, and metadata in batches of a specified size (default is 64); a minimal batching-and-retry sketch follows this list.

  4. Error Handling and Retries: Implement robust error handling and retry logic for operations like delete, delete_by_ids, and search_by_vector to manage unexpected responses and ensure the system remains resilient.

  5. Network Stability: Check the stability of your network connection. Unstable connections can cause interruptions during large uploads.
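
For reference, here is a minimal sketch of the batching-and-retry pattern described in steps 3 and 4. It assumes a local Qdrant instance at http://localhost:6333, precomputed embedding vectors, and a hypothetical upsert_in_batches helper; it is not Dify's actual upload code, just an illustration of the idea.

    import time
    import uuid

    from qdrant_client import QdrantClient
    from qdrant_client.http import models as rest

    # Assumed local Qdrant endpoint with a raised client timeout; adjust to your setup.
    client = QdrantClient(url="http://localhost:6333", timeout=60)

    def upsert_in_batches(collection: str, texts: list[str], vectors: list[list[float]],
                          batch_size: int = 64, retries: int = 3) -> None:
        """Upsert points in fixed-size batches, retrying each batch with backoff."""
        for start in range(0, len(texts), batch_size):
            points = [
                rest.PointStruct(id=str(uuid.uuid4()), vector=vec, payload={"text": txt})
                for txt, vec in zip(texts[start:start + batch_size],
                                    vectors[start:start + batch_size])
            ]
            for attempt in range(retries):
                try:
                    client.upsert(collection_name=collection, points=points)
                    break
                except Exception:  # in practice, narrow this to the connection errors you observe
                    if attempt == retries - 1:
                        raise
                    time.sleep(2 ** attempt)  # back off before retrying this batch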

By adjusting these settings and ensuring robust error handling, you should be able to mitigate the issue and successfully upload all chunks into the Qdrant DB [1][2].

To continue talking to Dosu, mention @dosu.

Yaghnesh commented 2 months ago

I have tried changing the timeout settings, but it is still throwing the same error. Is there any other way to resolve this issue?

dosubot[bot] commented 2 months ago

If adjusting the timeout settings didn't resolve the issue, here are a few additional steps you can take to address the [WinError 10053] error:

  1. Increase Nginx Client Max Body Size: Ensure that the NGINX_CLIENT_MAX_BODY_SIZE setting is appropriately configured to handle large uploads. The default is 15M, but you might need to increase it:

    NGINX_CLIENT_MAX_BODY_SIZE: 50M
  2. Check for Network Stability: Ensure that your network connection is stable. Unstable connections can cause interruptions during large uploads. You might want to test the network stability or try uploading from a different network.

  3. Retry Mechanism: Implement a retry mechanism for failed uploads. Although the Dify codebase does not have built-in retry mechanisms for failed uploads, you can add custom retry logic in your upload process to handle intermittent failures (a standard-library backoff sketch follows this list).

  4. Chunk Size Adjustment: Reduce the size of each chunk being uploaded. Smaller chunks may reduce the likelihood of timeouts and connection issues. You can adjust the chunk size in your document processing logic.

  5. Logging and Debugging: Enable detailed logging to diagnose the issue further. Check the logs for any specific error messages or patterns that might indicate the root cause of the problem.

  6. Gunicorn Worker Timeout: Increase the GUNICORN_TIMEOUT setting further if necessary. Sometimes, even 360 seconds might not be enough for very large uploads:

    GUNICORN_TIMEOUT=600
  7. Qdrant Configuration: Ensure that the Qdrant server is properly configured to handle large volumes of data. Check the Qdrant server logs for any errors or warnings that might indicate issues with handling large uploads.
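
As a concrete illustration of the custom retry logic from step 3, here is a minimal sketch using only the standard library. upload_batch is a hypothetical placeholder for whatever call performs the actual upsert; the exception types are the built-ins that WinError 10053 typically surfaces as (it maps to ConnectionAbortedError on Windows).

    import logging
    import time

    logger = logging.getLogger(__name__)

    def upload_with_retries(upload_batch, batch, max_attempts: int = 5, base_delay: float = 1.0):
        """Retry a single batch upload with exponential backoff."""
        for attempt in range(1, max_attempts + 1):
            try:
                return upload_batch(batch)
            except (ConnectionAbortedError, ConnectionResetError, TimeoutError) as exc:
                if attempt == max_attempts:
                    raise
                delay = base_delay * 2 ** (attempt - 1)
                logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
                time.sleep(delay)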

By combining these steps, you should be able to mitigate the issue and successfully upload all chunks into the Qdrant DB. If the problem persists, you might need to investigate further into the specific network and server configurations.

To continue talking to Dosu, mention @dosu.

Yaghnesh commented 2 months ago

I was not using the Docker setup; I was using the local setup of Dify. Now, help me solve this error: [WinError 10053] An established connection was aborted by the software in your host machine.

dosubot[bot] commented 2 months ago

Since you are using a local setup of Dify and not Docker, here are some additional steps to help resolve the [WinError 10053] error:

  1. Increase Timeout Settings in Local Setup: Ensure that the timeout settings for the Qdrant client are increased in your local setup. You can modify the QdrantConfig class to increase the timeout value:

    class QdrantConfig(BaseModel):
        endpoint: str
        api_key: Optional[str] = None
        timeout: float = 60  # Increase this value
        root_path: Optional[str] = None
        grpc_port: int = 6334
        prefer_grpc: bool = False
  2. Adjust Gunicorn Timeout: If you are using Gunicorn to serve your application, increase the timeout setting. You can do this by setting the timeout parameter in your Gunicorn command:

    gunicorn --timeout 600 your_app:app
  3. Reduce Chunk Size: Try reducing the chunk size for your uploads. Smaller chunks may help avoid timeouts and connection issues. You can adjust the chunk size in your document processing logic:

    def _generate_rest_batches(
        self,
        texts: Iterable[str],
        embeddings: list[list[float]],
        metadatas: Optional[list[dict]] = None,
        ids: Optional[Sequence[str]] = None,
        batch_size: int = 32,  # Reduce this value
        group_id: Optional[str] = None,
    ) -> Generator[tuple[list[str], list[rest.PointStruct]], None, None]:
  4. Implement Retry Logic: Add custom retry logic to handle intermittent failures. You can use a library like tenacity to implement retries:

    from tenacity import retry, stop_after_attempt, wait_fixed

    @retry(stop=stop_after_attempt(5), wait=wait_fixed(2))
    def upload_chunk(chunk):
        # Your upload logic here
        pass
  5. Check Network Stability: Ensure that your network connection is stable. Unstable connections can cause interruptions during large uploads. You might want to test the network stability or try uploading from a different network.

  6. Enable Detailed Logging: Enable detailed logging to diagnose the issue further. Set the LOG_LEVEL to DEBUG to get more detailed logs; a small timing-and-logging sketch follows this list:

    export LOG_LEVEL=DEBUG
  7. Check Qdrant Server Configuration: Ensure that the Qdrant server is properly configured to handle large volumes of data. Check the Qdrant server logs for any errors or warnings that might indicate issues with handling large uploads.
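
To make step 6 concrete, here is a small diagnostic sketch: it turns on DEBUG logging and times each batch, so you can see whether the failure around chunk ~350 correlates with elapsed time or batch size. upsert_batch and batches are hypothetical placeholders for the actual upload loop, not Dify internals.

    import logging
    import time

    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger = logging.getLogger("kb_upload")

    def upload_all(batches, upsert_batch):
        """Upload every batch, logging how long each one takes."""
        for index, batch in enumerate(batches, start=1):
            started = time.monotonic()
            upsert_batch(batch)
            logger.debug("batch %d/%d (%d points) took %.2fs",
                         index, len(batches), len(batch), time.monotonic() - started)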

By combining these steps, you should be able to mitigate the issue and successfully upload all chunks into the Qdrant DB. If the problem persists, further investigation into the specific network and server configurations may be necessary.

To continue talking to Dosu, mention @dosu.

Yaghnesh commented 2 months ago

@dosu I tried all the above-mentioned steps, but I am still facing the same issue again and again.

dosubot[bot] commented 2 months ago

This is a tough nut to crack, could really use your help on this, @takatost.

To continue talking to Dosu, mention @dosu.

Yaghnesh commented 2 months ago

Hi, I am still facing this issue even after updating the timeout variable and reducing the chunk size. Can you help me with this?

crazywoola commented 1 month ago

https://stackoverflow.com/questions/51562067/connectionabortederror-winerror-10053-an-established-connection-was-aborted-b