ThomasVitale opened this issue 8 months ago
Hello, that's more or less the same strategy I thought to use for a generic approach to the timeout problem. I think it's a crucial aspect to take care of moving towards a 1.0 release, since we are talking about common requirements for non-streaming consumers. Also, when a read timeout occurs you lose the response forever, and for larger commercial models that means money.
I'd be happy to contribute too, but so far I've had little luck getting feedback from the project owners.
There is a lot to unpack here, so let's start small and work our way to more features.
At the lowest level, we are either using our own hand-written client to talk with a model (`OpenAiApi` is a perfect example) or a client library provided by the vendor. If a user is operating at the hand-written-client level, there are a few things that can be done, such as registering a `RestClientCustomizer`, as shown in this example (a minimal sketch follows below). For other models, for example Azure OpenAI or Google Vertex, we are using client libraries provided by Microsoft and Google, and we can't use the approach above.
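For reference, here is a minimal sketch of the `RestClientCustomizer` approach, assuming a Spring Boot application where `OpenAiApi` is built from the auto-configured `RestClient.Builder`; the logging interceptor is illustrative, not an existing Spring AI feature:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.web.client.RestClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AiHttpClientConfiguration {

    private static final Logger log = LoggerFactory.getLogger(AiHttpClientConfiguration.class);

    // Customizes every auto-configured RestClient.Builder in the context,
    // including the one used by OpenAiApi, to log outgoing requests.
    @Bean
    RestClientCustomizer loggingRestClientCustomizer() {
        return builder -> builder.requestInterceptor((request, body, execution) -> {
            log.debug("AI request: {} {}", request.getMethod(), request.getURI());
            return execution.execute(request, body);
        });
    }
}
```

The known downside, discussed further down in this thread, is that such a customizer applies to every `RestClient.Builder` in the application, not only the ones talking to AI models.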
We can, however, work at a higher level, the `ChatClient` level. I first thought we could introduce a logging advisor to the code base, but the advisor doesn't yet have access to the final prompt, only the parts that go into making it. So instead we should update the `ChatModel` implementations to do the logging at the appropriate places in those classes. This issue discusses that.
Potentially we can still have a logging advisor, but it would serve a different purpose, and it is likely still a useful addition.
On another topic, retry: this could potentially move out of the `*Api` classes and into an advisor. The issue there is that retry would only kick in when using `ChatClient` and not the `*Api` classes. I suspect the right strategy is to put retry in at the lowest level when we can, and also to provide a retry advisor for when we don't control the underlying library that communicates with the AI model.
Thoughts?
I like the idea of creating advisors for logging purposes :+1:
However, when thinking about retry logic...
Currently, we handle two ways of calling models:
- `*Api` - `RestClient` or `WebClient`
- SDK - `OpenAIClient` or Google `GenerativeModel`

I imagine the retry logic should be the same across all models. Tying it to the `*Api` classes doesn't allow us to reuse it in the SDK scenarios. Additionally, we should consider models that don't use a `ChatClient`, such as transcription or speech models.
Therefore, I suggest introducing a new retry layer, or even more broadly a resilience layer (starting with retry support but with the potential to add new features in the future).
There could also be several other layers for customizing the HTTP client and so on, as @ThomasVitale mentioned.
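To make that idea a bit more concrete, here is one possible shape for such a layer; the `ResilienceLayer` interface and its retry-only implementation are purely hypothetical:

```java
import java.util.function.Supplier;

import org.springframework.retry.support.RetryTemplate;

// Hypothetical resilience layer, independent of whether the call underneath
// goes through a Spring AI *Api class or a provider SDK client.
interface ResilienceLayer {

    <T> T execute(Supplier<T> modelCall);
}

// First increment: retry only. Other concerns (fallbacks, rate limiting,
// circuit breaking) could later be added behind the same interface.
class RetryResilienceLayer implements ResilienceLayer {

    private final RetryTemplate retryTemplate = RetryTemplate.builder()
            .maxAttempts(3)
            .fixedBackoff(1_000)
            .build();

    @Override
    public <T> T execute(Supplier<T> modelCall) {
        return retryTemplate.execute(ctx -> modelCall.get());
    }
}
```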
@markpollack @piotrooo thank you both for sharing your thoughts!
I see two types of logs that can be useful in an application using Spring AI. My original intent with this issue was to cover the first type.
1. **HTTP Requests/Responses.** Logging of the headers and/or body of the HTTP interactions with an LLM provider. For example, this is useful when troubleshooting the underlying format of a request/response and spotting JSON conversion errors or incompatibilities with updated APIs from the provider. For the `*Api` classes provided by Spring AI, I think there should be a way to customise the underlying `RestClient` or `WebClient` with a logging interceptor (and similarly also timeouts and `SslBundles`). The workaround shown here is good enough for experiments, but it cannot really be used in a real-world application because the `RestClientCustomizer`/`WebClientCustomizer` would be shared across the application.

2. **Prompt/Completion.** Logging of the content of a prompt or a completion. For example, this is very important when it comes to prompt design/evaluation or observability. I would not recommend implementing such functionality via explicit log messages in the `ChatClient` API (or the underlying `ChatModel`). Instead, I would recommend framing this feature in the broader context of introducing observability for Spring AI. Using the Micrometer `Observation` API, it's possible to instrument the `ChatModel` classes once and configure logs, metrics, and traces through the Micrometer machinery. It's critical to include prompt/completion content in the observability solution because it's necessary for any evaluation/prompt design integration. I have a draft solution I'll share soon; I need to polish a few things. I wouldn't introduce a `LoggingAdvisor` at the moment. I think we need first the observability foundation at the `ChatModel` level before addressing further observability needs at the `ChatClient` level (using `Advisors` to offer observability for these higher-level workflows/chains, which typically consist of multiple LLM requests and function calls).
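To illustrate the direction (this is not the draft mentioned above), wrapping a model call in a Micrometer `Observation` could look roughly like this; the observation name, tag keys, and the String-based prompt/completion types are assumptions:

```java
import java.util.function.Function;

import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

// Sketch: recording each model call as a Micrometer Observation so that logs,
// metrics, and traces can all be derived from the same instrumentation.
class ObservedModelCall {

    private final ObservationRegistry registry;

    ObservedModelCall(ObservationRegistry registry) {
        this.registry = registry;
    }

    String call(String prompt, Function<String, String> modelCall) {
        return Observation.createNotStarted("spring.ai.chat.model", registry)
                .lowCardinalityKeyValue("ai.provider", "openai")   // assumed tag name
                .highCardinalityKeyValue("ai.prompt", prompt)      // prompt content as high-cardinality value
                .observe(() -> modelCall.apply(prompt));
    }
}
```

An `ObservationHandler` could then turn the prompt/completion key-values into log entries, while metrics and traces come from the same instrumentation through the usual Micrometer machinery.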
What do you think?
That's not something Spring AI can solve (unless perhaps by surfacing some auto-configuration properties, should that capability exist in those libraries).
I thought about some customizers for SDK clients, but I'm not really convinced by this approach. However, I think this is probably how I want to customize, e.g., the Azure `OpenAIClient` (and others).
> I think we need first the observability foundation at the ChatModel level before addressing further observability needs at the ChatClient level (using Advisors to offer observability for these higher-level workflows/chains, which typically consist of multiple LLM requests and function calls).
Right now, `ChatClient` is going to be a Swiss army knife with observability, retries, and, ahh, of course, sending requests to the model :grimacing:.
But for now, I don't have a better idea.
Enhancement Description
Each model integration is composed of two aspects: an `*Api` class calling the model provider over HTTP, and a `*Client` class encapsulating the LLM-specific aspects.

Each `*Client` class is highly customizable based on nice interfaces, making it possible to override many different options. It would be nice to provide similar flexibility for each `*Api` class as well. In particular, it would be useful to be able to configure options related to the HTTP client. Examples of aspects that would need to be configured:

- an `SslBundle` to connect with on-prem model providers using custom CA certificates (a possible wiring is sketched below).

Furthermore, there might be additional needs for configuring resilience patterns.
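Regarding the `SslBundle` point above, here is a sketch of how it can be approximated today with the shared-customizer workaround (the bundle name and timeout values are made up); the fact that this customizer applies application-wide is exactly the limitation described in this issue:

```java
import java.time.Duration;

import org.springframework.boot.ssl.SslBundles;
import org.springframework.boot.web.client.ClientHttpRequestFactories;
import org.springframework.boot.web.client.ClientHttpRequestFactorySettings;
import org.springframework.boot.web.client.RestClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class OnPremModelHttpConfiguration {

    // Wires a custom CA bundle (declared under spring.ssl.bundle.*) plus timeouts
    // into every auto-configured RestClient.Builder, including the ones used by the *Api classes.
    @Bean
    RestClientCustomizer onPremRestClientCustomizer(SslBundles sslBundles) {
        var settings = ClientHttpRequestFactorySettings.DEFAULTS
                .withConnectTimeout(Duration.ofSeconds(10))
                .withReadTimeout(Duration.ofMinutes(2))
                .withSslBundle(sslBundles.getBundle("on-prem-llm"));
        return builder -> builder.requestFactory(ClientHttpRequestFactories.get(settings));
    }
}
```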
More settings that right now are part of the model connection configuration (and that still relate to the HTTP interaction) would also need to be customisable in enterprise use cases in production (e.g. multi-user applications or even multi-tenant applications). For example, when using OpenAI, some of these settings could need to change per request/session.
All the above is focused on the HTTP interactions with model providers, but the same would be useful for vector stores.
Possible Solutions
Drawing from the nice abstractions designed to customize the model integrations and ultimately implementing the `ModelOptions` interface, it could be an idea to define a dedicated abstraction for passing HTTP client customizations to an `*Api` class (something like `HttpClientConfig`), which might also be exposed via configuration properties (under `spring.ai.<model>.client.*`). A rough sketch of this idea is shown below.

For the more specific resilience configurations (like retries and fallbacks), an annotation-driven approach might be more suitable. Resilience4j might provide a way to achieve this, since I don't think Spring supports the MicroProfile Fault Tolerance spec.
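To make the `HttpClientConfig` idea more concrete, a rough sketch; the type and the constructor shown in the usage comment do not exist in Spring AI today:

```java
import java.time.Duration;

// Hypothetical per-integration HTTP client configuration, bindable from
// spring.ai.<model>.client.* properties and passed to an *Api class at construction time.
public record HttpClientConfig(
        Duration connectTimeout,
        Duration readTimeout,
        String sslBundle,        // name of a Spring Boot SSL bundle, or null
        boolean logRequests,
        boolean logResponses) {

    public static HttpClientConfig defaults() {
        return new HttpClientConfig(Duration.ofSeconds(10), Duration.ofSeconds(60), null, false, false);
    }
}

// Hypothetical usage (such a constructor does not exist today):
// OpenAiApi api = new OpenAiApi(baseUrl, apiKey, HttpClientConfig.defaults());
```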
A partial alternative solution would be for developers to define a custom `RestClient.Builder` or `WebClient.Builder` and pass it to each `*Api` class (see the sketch below), but it would result in a lot of extra configuration and reduce the convenience of the auto-configuration. Also, it would tie a generic configuration like "enable logs" or "use a custom CA" to the specific client used, resulting in duplication when both blocking and streaming interactions are used in the same application.

I'm available to contribute and help solve this issue.
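For completeness, a sketch of that alternative; it assumes `OpenAiApi` exposes a constructor accepting a `RestClient.Builder`, and the timeout/logging details are illustrative:

```java
import java.time.Duration;

import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.boot.web.client.ClientHttpRequestFactories;
import org.springframework.boot.web.client.ClientHttpRequestFactorySettings;
import org.springframework.web.client.RestClient;

// Sketch: a dedicated RestClient.Builder configured just for the OpenAI integration,
// instead of a shared application-wide customizer.
class OpenAiApiFactory {

    OpenAiApi create(String baseUrl, String apiKey) {
        RestClient.Builder builder = RestClient.builder()
                .requestFactory(ClientHttpRequestFactories.get(
                        ClientHttpRequestFactorySettings.DEFAULTS
                                .withReadTimeout(Duration.ofMinutes(2))))
                .requestInterceptor((request, body, execution) -> {
                    // illustrative logging hook
                    System.out.println("OpenAI request: " + request.getURI());
                    return execution.execute(request, body);
                });
        // Assumes OpenAiApi exposes a constructor taking a RestClient.Builder.
        return new OpenAiApi(baseUrl, apiKey, builder);
    }
}
```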
Related Issues