Open tzolov opened 1 year ago
We should look into an AOP-based implementation and also work out when a retry should be attempted, based on the HTTP status code or other information, since not all exceptions are transient.
Following up on what @markpollack said, if the exception indicates that the prompt exceeded the token limit, a retry should not be made. No amount of retries will ever resolve a token limit problem. But as it is right now (0.8.0-SNAPSHOT), a token limit exception seems to retry indefinitely, causing the app to hang.
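To make the transient vs. non-transient distinction concrete, here is a minimal sketch of classifying a failure by HTTP status code. The class name RetryClassifier and the particular set of retryable codes are assumptions for illustration, not what Spring AI does today.

```java
import java.util.Set;

// Hypothetical helper, not part of Spring AI: decides whether a failed
// HTTP call is worth retrying based only on its status code.
final class RetryClassifier {

    // Assumption: 408 (request timeout) and 429 (rate limit) are the only
    // 4xx codes treated as transient; every other 4xx fails fast.
    private static final Set<Integer> TRANSIENT_CLIENT_CODES = Set.of(408, 429);

    private RetryClassifier() {
    }

    static boolean isRetryable(int status) {
        // Server-side errors are assumed to be transient.
        if (status >= 500 && status <= 599) {
            return true;
        }
        return TRANSIENT_CLIENT_CODES.contains(status);
    }
}
```

Under this sketch a token limit failure would stop immediately as long as the provider reports it with a 4xx code other than 408 or 429. Which code the provider actually returns is not confirmed in this thread.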
@habuma, could you please share the token limit exception stack/context? Is the HTTP status code 429?
It might be feasible to use a user-provided ClientHttpRequestInterceptor for retry purposes for clients using RestClient in infrastructure (e.g. https://github.com/making/retryable-client-http-request-interceptor). However, the current autoconfiguration (such as for OpenAIApi or OllamaApi) uses a hardcoded RestClientBuilder. Shouldn't we consider using a pre-configured RestClientBuilder in Spring Boot to allow users to customize retries and other aspects through RestClientCustomizer?
If I'm not mistaken, the RetryTemplate is currently used only for the OpenAI client implementations. It was a sort of temporary patch, as without it the ITs were often not able to pass. We will look into this later, in the scope of 0.9.0.
FYI, as interim relief we added a retry auto-configuration with some properties (commit 1e3eaec7b9d853e399cb370dfd63e05ccee193ca). See the available retry properties here: https://docs.spring.io/spring-ai/reference/0.8-SNAPSHOT/api/clients/openai-chat.html#_retry_properties
Property | Description | Default |
---|---|---|
spring.ai.retry.max-attempts | Maximum number of retry attempts. | 10 |
spring.ai.retry.backoff.initial-interval | Initial sleep duration for the exponential backoff policy. | 2 sec. |
spring.ai.retry.backoff.multiplier | Backoff interval multiplier. | 5 |
spring.ai.retry.backoff.max-interval | Maximum backoff duration. | 3 min. |
spring.ai.retry.on-client-errors | If false, throw a NonTransientAiException, and do not attempt retry for 4xx client error codes | false |
spring.ai.retry.exclude-on-http-codes | List of HTTP status codes that should not trigger a retry (e.g. to throw NonTransientAiException). | empty |
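
To see what the defaults in the table add up to, the following sketch computes the sleep before each retry for a capped exponential backoff. The class BackoffSchedule is hypothetical, and it assumes that max-attempts counts the total number of calls (so 10 attempts means at most 9 sleeps); the actual accounting in the auto-configuration may differ.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the waits of a capped exponential backoff.
final class BackoffSchedule {

    private BackoffSchedule() {
    }

    // Returns initial, initial * multiplier, ... with each wait capped at max.
    static List<Duration> delays(int maxAttempts, Duration initial, double multiplier, Duration max) {
        List<Duration> result = new ArrayList<>();
        long maxMillis = max.toMillis();
        long currentMillis = initial.toMillis();
        // Assumption: one sleep between consecutive attempts, none after the last.
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result.add(Duration.ofMillis(Math.min(currentMillis, maxMillis)));
            currentMillis = (long) Math.min(currentMillis * multiplier, (double) maxMillis);
        }
        return result;
    }
}
```

Calling delays(10, Duration.ofSeconds(2), 5.0, Duration.ofMinutes(3)) gives waits of 2 s, 10 s, 50 s, and then 3 min for each of the remaining six retries, which is about 19 minutes of sleeping in the worst case under the assumption above.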
We should probably split this issue into its major components: vector, embedding, chat, and other clients. I think we have done a good job on many of the client classes, less so for the vector databases. Perhaps we can add this over time post 1.0 as the need arises.
Interactions between the AiClient, EmbeddingClient, or VectorStore clients and their remote service endpoints can suffer from transient errors such as a momentary network glitch or rate limiting. Often, those communication issues can be resolved by repeating the service invocation or by altering the invocation rate.
We should provide retry decorators that automatically re-invoke the failed operations according to pre-configured retry policies.
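A retry decorator of this kind could look roughly like the sketch below. It is written against plain java.util.function.Supplier rather than the actual client interfaces, and RetryingSupplier and NonTransientException are hypothetical names introduced here for illustration. The fixed backoff and the bounded attempt count are simplifying assumptions.

```java
import java.util.function.Supplier;

// Hypothetical marker for failures that no retry can fix,
// for example a prompt that exceeds the token limit.
class NonTransientException extends RuntimeException {
    NonTransientException(String message) {
        super(message);
    }
}

// Hypothetical decorator: wraps a Supplier and re-invokes it on transient failures.
final class RetryingSupplier<T> implements Supplier<T> {

    private final Supplier<T> delegate;
    private final int maxAttempts;
    private final long backoffMillis;

    RetryingSupplier(Supplier<T> delegate, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    @Override
    public T get() {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.get();
            } catch (NonTransientException e) {
                throw e; // fail fast, retrying cannot help
            } catch (RuntimeException e) {
                last = e; // assumed transient
                if (attempt < maxAttempts) {
                    sleep(backoffMillis);
                }
            }
        }
        throw last; // attempts exhausted, surface the last failure
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }
}
```

A caller would wrap an operation as new RetryingSupplier<>(() -> someClient.call(request), 3, 500L).get(), where someClient and request stand in for whatever client is being decorated. Because the attempt count is bounded and non-transient failures are rethrown at once, a token limit error cannot keep the application waiting indefinitely.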