XiaoLey closed this issue 1 month ago
To resolve the `JSONDecodeError` during a `tokenize` request to the Cohere API when using Dify version 0.6.12 (self-hosted via Docker), ensure that the JSON payload you are sending is correctly formatted. Here is an example of how to properly format the JSON payload for a `tokenize` request:
```json
{
  "name": "Dify",
  "text": "Dify means Do it for you...",
  "indexing_technique": "high_quality",
  "process_rule": {
    "rules": {
      "pre_processing_rules": [
        {
          "id": "remove_extra_spaces",
          "enabled": true
        },
        {
          "id": "remove_urls_emails",
          "enabled": true
        }
      ],
      "segmentation": {
        "separator": "###",
        "max_tokens": 500
      }
    },
    "mode": "custom"
  }
}
```
Ensure that the JSON structure is correct and that all required fields are included; a malformed or incomplete payload can lead to a `JSONDecodeError` [1].
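As a quick sanity check before sending the request, you can round-trip the payload through Python's `json` module; a malformed payload fails immediately with the same `JSONDecodeError`. The truncated string below is deliberately broken for illustration:

```python
import json

# Deliberately malformed payload: the closing brace is missing
payload = '{"name": "Dify", "indexing_technique": "high_quality"'

try:
    json.loads(payload)
    valid = True
except json.JSONDecodeError as exc:
    valid = False
    # exc.lineno / exc.colno point at where parsing failed
    print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
```

Validating locally like this separates a client-side formatting mistake from a server-side failure before any network call is made.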
Additionally, there are known issues with the Cohere API that can cause a `500 Internal Server Error` during a `tokenize` request. Specifically, `InvokeServerUnavailableError` is mapped to `cohere.errors.internal_server_error.InternalServerError`, which means an internal server error from the Cohere API surfaces as a `500 Internal Server Error` [2].
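The mapping works roughly like this. This is a self-contained sketch, not Dify's actual code: `UpstreamInternalServerError` stands in for cohere's `InternalServerError` class so the snippet runs without the SDK installed, and `invoke_with_mapping` is a hypothetical helper name:

```python
class UpstreamInternalServerError(Exception):
    """Stand-in for cohere.errors.internal_server_error.InternalServerError."""

class InvokeServerUnavailableError(Exception):
    """Provider-agnostic error, named after the one in the mapping above."""

def invoke_with_mapping(call):
    """Run a provider call, translating upstream 500s into the mapped error."""
    try:
        return call()
    except UpstreamInternalServerError as exc:
        # Re-raise as the error type that is surfaced to the caller
        raise InvokeServerUnavailableError(str(exc)) from exc

def failing_call():
    # Simulates the Cohere API returning a server-side error
    raise UpstreamInternalServerError("500 Internal Server Error")
```

The practical consequence: the 500 you see is the upstream provider's failure, re-labelled, rather than a bug in the request-mapping layer itself.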
To get more detailed information about the `tokenize` request, you can set the logging level to `DEBUG`:
```python
import logging

# Set the root logging level to DEBUG
logging.basicConfig(level=logging.DEBUG)

# Example usage
logger = logging.getLogger(__name__)
logger.debug("This is a debug message")
```
In the context of the provided code, you can add debug logging statements to the `_tokenize` method to log detailed information about the `tokenize` request:
```python
import logging

import cohere
from cohere.core import RequestOptions

logger = logging.getLogger(__name__)


def _tokenize(self, model: str, credentials: dict, text: str) -> list[str]:
    """
    Tokenize text

    :param model: model name
    :param credentials: model credentials
    :param text: text to tokenize
    :return: list of token strings
    """
    if not text:
        return []

    # initialize client
    client = cohere.Client(credentials.get('api_key'), base_url=credentials.get('base_url'))

    logger.debug(f"Tokenizing text: {text} with model: {model}")
    response = client.tokenize(
        text=text,
        model=model,
        offline=False,
        request_options=RequestOptions(max_retries=0)
    )
    logger.debug(f"Tokenize response: {response.token_strings}")

    return response.token_strings
```
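This also explains how a 500 from the API can turn into a `JSONDecodeError`: when the server returns an HTML or plain-text error page instead of JSON, decoding the response body fails. A minimal reproduction of that decoding failure (the body string is illustrative, not the actual Cohere response):

```python
import json

# A 500 response often carries a non-JSON body like this
error_body = "<html><body>500 Internal Server Error</body></html>"

try:
    json.loads(error_body)
    decoded = True
except json.JSONDecodeError as exc:
    decoded = False
    print(f"JSONDecodeError: {exc.msg} at position {exc.pos}")
```

So if the debug logs show the request going out but the response failing to parse, the root cause is the upstream 500, not the payload you sent.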
By setting the logging level to `DEBUG` and adding these logging statements, you can get more detailed information about the `tokenize` request to the Cohere API [3][2].
To continue talking to Dosu, mention @dosu.
Looks like this issue is occurring within the Cohere client. Could you provide more detailed information? For example, what content are you processing when this error is triggered?
I initiated a high-quality text segmentation and cleaning process on the content extracted from https://swagger.io/specification/ using Firecrawl. After waiting for quite some time, an exception occurred. It seems that a large portion of the content was processed successfully as I can see some segmentation results, but the results are incomplete.
@laipz8200
Self Checks
Dify version
0.6.12
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When using Dify to perform "Text Segmentation and Cleaning" on a knowledge base, a `JSONDecodeError` occurs during a `tokenize` request to the Cohere API, preventing the proper decoding of the API response. The detailed error message is as follows:

✔️ Expected Behavior
No response
❌ Actual Behavior
The Cohere API returns a `500 Internal Server Error`.