Add retry options for VertexAiGeminiChatModel

rafal-dudek commented 1 month ago

Expected Behavior

VertexAiGeminiChatModel should use retry options similar to e.g. OpenAiChatModel.

Current Behavior

VertexAiGeminiChatModel does not retry failed requests.

Context

The Gemini 1.5 Pro model sometimes returns the following error:

java.lang.RuntimeException: Failed to generate content
    at org.springframework.ai.vertexai.gemini.VertexAiGeminiChatModel.getContentResponse(VertexAiGeminiChatModel.java:532)
    at org.springframework.ai.vertexai.gemini.VertexAiGeminiChatModel.call(VertexAiGeminiChatModel.java:173)
...
Caused by: com.google.api.gax.rpc.ResourceExhaustedException: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Unable to submit request because the service is temporarily out of capacity. Try again later.

Retrying in such cases is crucial for stable application operation. More information on resource exhaustion: https://cloud.google.com/vertex-ai/generative-ai/docs/quotas#troubleshoot-dynamic-shared-quota

There is already an issue about adding the spring-ai-retry dependency (https://github.com/spring-projects/spring-ai/issues/832), but adding the dependency alone does not solve the problem, because VertexAiGeminiChatModel still does not retry.

ddobrin commented 3 weeks ago

Hi @rafal-dudek, just to clarify: are you looking for a retry solution with support for the following?

configurable exponential backoff
both pay-as-you-go and provisioned throughput modes, also configurable

rafal-dudek commented 3 weeks ago

@ddobrin

We would just like a feature similar to what other models in Spring AI offer, e.g. OpenAI Chat with its retry properties: https://docs.spring.io/spring-ai/reference/api/chat/openai-chat.html#_retry_properties

Currently we have implemented retries on top of the VertexAiGeminiChatModel invocation, but it would be nice to have this implemented in the library.

For now we do not use provisioned throughput mode, so we do not need it, but of course it is a nice feature that could be made available.
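
For readers who need a stopgap, a retry wrapper of this kind could look roughly like the sketch below. It is not the code used by the commenter and not part of Spring AI. The class name `RetrySketch`, both method names, and all parameter names are hypothetical, and matching on the `RESOURCE_EXHAUSTED` text in the cause chain is an assumption based only on the stack trace quoted above.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, plain Java only (no Spring or Google client dependencies).
final class RetrySketch {

    private RetrySketch() {
    }

    // Assumption: the transient failure can be recognised by the RESOURCE_EXHAUSTED
    // status text appearing somewhere in the exception's cause chain.
    public static boolean isResourceExhausted(Throwable error) {
        for (Throwable cause = error; cause != null; cause = cause.getCause()) {
            String message = cause.getMessage();
            if (message != null && message.contains("RESOURCE_EXHAUSTED")) {
                return true;
            }
        }
        return false;
    }

    // Runs the call up to maxAttempts times. After each retryable failure it sleeps,
    // then multiplies the delay by the given factor (exponential backoff).
    public static <T> T callWithRetry(Callable<T> call,
                                      int maxAttempts,
                                      long initialDelayMs,
                                      double multiplier,
                                      Predicate<Throwable> retryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up when attempts are used up or the error is not transient.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = (long) (delayMs * multiplier);
            }
        }
    }
}
```

Under these assumptions, an invocation might be wrapped as `RetrySketch.callWithRetry(() -> chatModel.call(prompt), 5, 2000, 2.0, RetrySketch::isResourceExhausted)`, where `chatModel` and `prompt` stand for the application's own objects. Errors that do not match the predicate are rethrown immediately, so only the capacity errors are retried.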

rafal-dudek commented 3 weeks ago

I see a merged PR: https://github.com/spring-projects/spring-ai/pull/1437. So it looks like this should work now with v1.0.0-M3, but it is not described in the Gemini documentation: https://docs.spring.io/spring-ai/reference/api/chat/vertexai-gemini-chat.html

Add retry options for VertexAiGeminiChatModel #1409