Description
For a chatbot that needs to go live in the next 24 hours, we often get the following error in the middle of a conversation: `{"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}}`.
Using `"@anthropic-ai/sdk": "^0.27.3"`
Our usual input throughput is around 5k-30k tokens per minute. We implemented a retry solution but still get the error, and we also sometimes see long delays. We make 2-3 API calls per conversation and expect 3-4 simultaneous conversations at peak. We would really appreciate urgent feedback. Also, can this be resolved by implementing prompt caching, or any other techniques?
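For reference, a minimal retry sketch in TypeScript, assuming the standard `@anthropic-ai/sdk` Messages API: `maxRetries` and `Anthropic.APIError` are real SDK features, while `createWithBackoff` and its parameters are hypothetical names for an extra application-level backoff layer on top of the SDK's built-in retries:

```ts
import Anthropic from '@anthropic-ai/sdk';

// The SDK already retries connection errors, 408/409/429 and >=500 responses
// (which includes 529 "Overloaded") with a short exponential backoff; the
// default maxRetries is 2, so raising it absorbs more transient failures.
const client = new Anthropic({ maxRetries: 5 });

// Hypothetical wrapper adding jittered exponential backoff for the cases
// where all SDK-level retries are exhausted and the call still fails.
async function createWithBackoff(
  params: Anthropic.MessageCreateParamsNonStreaming,
  attempts = 4,
): Promise<Anthropic.Message> {
  for (let i = 0; ; i++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      const status = err instanceof Anthropic.APIError ? err.status : undefined;
      // 529 = overloaded_error, 429 = rate limit; anything else is rethrown.
      if ((status !== 529 && status !== 429) || i >= attempts - 1) throw err;
      // Exponential delay capped at 30s, with jitter to avoid retry bursts.
      const delay = Math.min(30_000, 1000 * 2 ** i) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```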
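On the prompt-caching question: caching a long, stable system prompt cuts repeated input tokens and time-to-first-token, which eases rate-limit (429) pressure, but a 529 `overloaded_error` signals temporary server-side load, so retries with backoff remain the main mitigation. A minimal sketch, assuming the prompt-caching beta surface exposed by recent 0.27.x releases (`client.beta.promptCaching.messages.create`); `LONG_SYSTEM_PROMPT` and the model id are placeholder examples:

```ts
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Hypothetical placeholder: prompts must meet a minimum length
// (~1024 tokens on Sonnet-class models) before they are cached.
const LONG_SYSTEM_PROMPT = 'Your chatbot instructions, knowledge, examples...';

const response = await client.beta.promptCaching.messages.create({
  model: 'claude-3-5-sonnet-20240620',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: LONG_SYSTEM_PROMPT,
      // Marks this block as cacheable; later calls sharing the same
      // prefix read it from cache instead of re-processing it.
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'Hello' }],
});

// usage reports cache_creation_input_tokens / cache_read_input_tokens,
// which confirm whether the cache is actually being hit.
console.log(response.usage);
```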
Code example
No response
AI provider
No response
Additional context
No response