langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

text embedding error due to Requests rate limit exceeded #3936

Closed YanxingLiu closed 3 months ago

YanxingLiu commented 4 months ago

Self Checks

Dify version

0.3.21

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

I have added the Tongyi text embedding v1 & v2 models and am going to add a knowledge base. I am using the default settings, as the figure below shows.

[screenshot: default knowledge base settings]

✔️ Expected Behavior

The document should be embedded into vector bases.

❌ Actual Behavior

Instead, it reports an error, as the figure below shows:

[screenshot: embedding error message]

To investigate this error, I ran Dify locally. The logs show it is caused by a failed request to the Tongyi embedding service: the first embedding request succeeds, but subsequent requests hit the "Requests rate limit exceeded" error. The resulting None response is then processed and an exception is thrown. Is there a way to perform text embedding while staying within the request rate limit?

dosubot[bot] commented 4 months ago

To address the issue of exceeding the Requests rate limit and the resulting error ("'NoneType' object has no attribute 'items'"), you'll need to implement a rate-limiting mechanism and enhance error handling in your self-hosted Dify application. Here's a concise guide on how to proceed:

  1. Implement Rate Limiting: Introduce a rate limiter in your code to manage the number of requests sent to the Tongyi text embedding service. This will help avoid hitting the rate limit.

  2. Enhance Error Handling: Modify the embed_documents method to better handle exceptions, especially those related to exceeding the rate limit. This involves catching exceptions and implementing logic to deal with them, such as retrying after a delay.

Here's a simplified code snippet to illustrate these concepts:

import time
# Other imports...

class TongyiTextEmbeddingModel:
    # Existing methods...

    @staticmethod
    def embed_documents(credentials_kwargs: dict, model: str, texts: list[str]) -> tuple[list[list[float]], int]:
        embeddings = []
        embedding_used_tokens = 0
        for text in texts:
            # Rate-limiting logic here, e.g. a short delay between requests

            try:
                # Placeholder: call_tongyi_embedding is a hypothetical stand-in
                # for the actual Tongyi embedding API call
                response = call_tongyi_embedding(credentials_kwargs, model, text)
                if response and "embeddings" in response:
                    embeddings.append(response["embeddings"][0]["embedding"])
                    embedding_used_tokens += response["usage"]["total_tokens"]
                else:
                    # Handle an unexpected response or rate-limit exceedance
                    raise ValueError("Unexpected embedding response")
            except Exception:
                # Specific handling for rate-limit exceedance,
                # e.g. log the error and retry after waiting
                time.sleep(1)

        return embeddings, embedding_used_tokens

This example includes placeholders for the rate limiting logic and specific error handling. You'll need to tailor these based on the Tongyi service's API specifics and your chosen rate limiting strategy.
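One common way to fill in the retry placeholder is retry with exponential backoff: wait 1s, 2s, 4s, ... after each rate-limit failure before trying again. The sketch below is generic and self-contained; the `flaky_embed` function and the `RuntimeError` it raises are illustrative assumptions, not Dify or Tongyi internals.

```python
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff when it raises.

    A minimal sketch: the caller decides which exception types signal
    a rate limit; here any exception triggers a retry.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Give up after the last attempt
            # Wait base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))

# Example: a fake embedding call that fails twice, then succeeds
calls = {"n": 0}

def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Requests rate limit exceeded")
    return [0.1, 0.2, 0.3]

result = retry_with_backoff(flaky_embed, base_delay=0.01)
```

In the `embed_documents` loop above, the API call for each text would be wrapped in such a helper so transient rate-limit errors are retried instead of producing a None result.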

For detailed information on the Tongyi text embedding service's rate limit, refer to their official documentation or support. The Dify repository and its documentation may also offer insights into configuring third-party services and handling rate limits.

By implementing these strategies, you can prevent the error and ensure your application handles rate limits gracefully.

#### Details

**Sources**
- [api/core/model_runtime/model_providers/tongyi/text_embedding/text_embedding.py](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/tongyi/text_embedding/text_embedding.py)
- [api/app.py](https://github.com/langgenius/dify/blob/main/api/app.py)
- [api/core/app/apps/base_app_generate_response_converter.py](https://github.com/langgenius/dify/blob/main/api/core/app/apps/base_app_generate_response_converter.py)
- [api/core/model_runtime/model_providers/tongyi/tongyi.yaml](https://github.com/langgenius/dify/blob/main/api/core/model_runtime/model_providers/tongyi/tongyi.yaml)
- [en/tutorials/model-configuration/README.md](https://github.com/langgenius/dify-docs/blob/main/en/tutorials/model-configuration/README.md)
**About Dosu**
This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

CaptainDP commented 3 months ago

I also encountered the same problem. Can I set a rate limit or concurrency control in the UI to avoid triggering the rate-limit logic?

crazywoola commented 3 months ago

Nope, we have no plan for this feature at the moment.