Add invocation retry capabilities for AiClient, EmbeddingClient and VectorStore

spring-projects / spring-ai

An Application Framework for AI Engineering

https://docs.spring.io/spring-ai/reference/index.html

Apache License 2.0

Add invocation retry capabilities for AiClient, EmbeddingClient and VectorStore #123

Open tzolov opened 1 year ago

tzolov commented 1 year ago

The interactions of the AiClient, EmbeddingClient, and VectorStore clients with their remote service endpoints can suffer from transient errors, such as a momentary network glitch or a rate-limit error. Often, these communication issues can be resolved by repeating the invocation or by lowering the invocation rate.

We should provide retry decorators that automatically re-invoke failed operations according to pre-configured retry policies.
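To make the proposal concrete, here is a minimal sketch of what such a retry decorator could look like in plain Java. It is not the Spring AI implementation: the class name `RetryingCall`, the method `withRetry`, and all of its parameters are hypothetical, chosen only for illustration. It re-invokes an operation with an exponentially growing delay, and gives up as soon as the error is classified as non-transient or the attempts are exhausted.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Spring AI.
final class RetryingCall {

    private RetryingCall() {
    }

    /**
     * Invokes the operation up to maxAttempts times (the first call included).
     * Sleeps between attempts, multiplying the delay after each failure.
     * Rethrows immediately if isTransient rejects the exception.
     */
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, Duration initialDelay,
            double multiplier, Predicate<RuntimeException> isTransient) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1;; attempt++) {
            try {
                return operation.get();
            }
            catch (RuntimeException ex) {
                // Stop on the last attempt or when retrying cannot help.
                if (attempt >= maxAttempts || !isTransient.test(ex)) {
                    throw ex;
                }
                sleep(delayMillis);
                delayMillis = (long) (delayMillis * multiplier);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        }
        catch (InterruptedException ie) {
            // Preserve the interrupt flag and abort the retry loop.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

A caller would wrap a remote invocation in a `Supplier` and pass a predicate that decides which exceptions count as transient. Keeping that predicate separate from the loop lets each client supply its own classification.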

markpollack commented 11 months ago

We should look into an AOP implementation option, and also dive into when a retry should be attempted based on the HTTP status code or other information, as not all exceptions are transient in nature.

habuma commented 10 months ago

Following up on what @markpollack said, if the exception indicates that the prompt exceeded the token limit, a retry should not be made. No amount of retries will ever resolve a token limit problem. But as it is right now (0.8.0-SNAPSHOT), a token limit exception seems to retry indefinitely, causing the app to hang.

tzolov commented 10 months ago

@habuma, could you please share the token limit exception stack trace/context? Is the HTTP status code 429?

tzolov commented 10 months ago

Assuming that the OpenAI rate-limit error is reported as a client error (i.e. in the 4xx range), then this patch should prevent retrying on client errors such as Invalid Authentication, Incorrect API key provided, Rate limit reached for requests...
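The rule described in this comment can be sketched as a small status-code classifier. This is an illustration under the comment's own assumption (client errors arrive as 4xx), not the code from the patch; the class `HttpRetryClassifier` and the method `isTransient` are hypothetical names.

```java
// Hypothetical helper for illustration only; not the code from the patch.
final class HttpRetryClassifier {

    private HttpRetryClassifier() {
    }

    /** Returns true if a response with this HTTP status code is worth retrying. */
    static boolean isTransient(int statusCode) {
        if (statusCode >= 400 && statusCode < 500) {
            // Client errors (e.g. invalid authentication, incorrect API key):
            // repeating the same request will not change the outcome.
            return false;
        }
        // Server errors may be momentary, so they are treated as retryable.
        return statusCode >= 500 && statusCode < 600;
    }
}
```

Under this rule a 429 is also treated as non-retryable, because it falls in the 4xx range.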

making commented 10 months ago

It might be feasible to use a user-provided ClientHttpRequestInterceptor for retries in clients that use RestClient under the hood (e.g. https://github.com/making/retryable-client-http-request-interceptor). However, the current autoconfiguration (such as for OpenAIApi or OllamaApi) uses a hardcoded RestClientBuilder. Shouldn't we consider using the RestClientBuilder pre-configured by Spring Boot, so that users can customize retries and other aspects through a RestClientCustomizer?

tzolov commented 9 months ago

If I'm not mistaken, the RetryTemplate is currently used only by the OpenAI client implementations.
It was a sort of temporary patch, as without it the ITs were often unable to pass. We will look into this in the scope of 0.9.0.

tzolov commented 8 months ago

FYI, as interim relief we added a retry auto-configuration with some properties (commit 1e3eaec7b9d853e399cb370dfd63e05ccee193ca). See the available retry properties here: https://docs.spring.io/spring-ai/reference/0.8-SNAPSHOT/api/clients/openai-chat.html#_retry_properties

Property	Description	Default
spring.ai.retry.max-attempts	Maximum number of retry attempts.	10
spring.ai.retry.backoff.initial-interval	Initial sleep duration for the exponential backoff policy.	2 sec.
spring.ai.retry.backoff.multiplier	Backoff interval multiplier.	5
spring.ai.retry.backoff.max-interval	Maximum backoff duration.	3 min.
spring.ai.retry.on-client-errors	If false, throw a NonTransientAiException, and do not attempt retry for 4xx client error codes	false
spring.ai.retry.exclude-on-http-codes	List of HTTP status codes that should not trigger a retry (e.g. to throw NonTransientAiException).	empty
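To see what the default backoff values in the table imply, the following sketch computes the sleep durations of a capped exponential backoff. It assumes the usual scheme in which the first sleep equals the initial interval, each later sleep is the previous one times the multiplier, and no sleep exceeds the maximum interval. The class `BackoffSchedule` and its `delays` method are hypothetical and only reproduce the arithmetic; they are not taken from Spring AI.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it just reproduces the arithmetic.
final class BackoffSchedule {

    private BackoffSchedule() {
    }

    /** Returns the first 'count' sleep durations of a capped exponential backoff. */
    static List<Duration> delays(Duration initial, double multiplier, Duration max, int count) {
        List<Duration> result = new ArrayList<>();
        long capMillis = max.toMillis();
        long currentMillis = initial.toMillis();
        for (int i = 0; i < count; i++) {
            // Never sleep longer than the configured maximum.
            result.add(Duration.ofMillis(Math.min(currentMillis, capMillis)));
            // Grow the interval for the next attempt, clamped to the cap.
            currentMillis = Math.min(capMillis, (long) (currentMillis * multiplier));
        }
        return result;
    }
}
```

With the defaults above (initial interval 2 sec., multiplier 5, maximum 3 min.), `BackoffSchedule.delays(Duration.ofSeconds(2), 5, Duration.ofMinutes(3), 5)` yields 2 s, 10 s, 50 s, 3 min, 3 min. The fourth sleep would be 250 s uncapped, so the maximum interval takes over from that point on.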

markpollack commented 4 months ago

We should probably split this issue up by major component: vector stores, embedding, chat, and other clients. I think we have done a good job on many of the client classes, less so for the vector DBs. Perhaps we can add this over time post 1.0, as the need arises.