spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/index.html
Apache License 2.0

Feature Request: Add support for Groq #621

Closed thesurlydev closed 2 months ago

thesurlydev commented 4 months ago

Expected Behavior

Spring AI should support Groq with nothing but configuration changes.

Current Behavior

Although Groq's documentation states that its API is compatible with the OpenAI API, changing only the Spring AI configuration is not enough to call Groq successfully.

Context

I'd like to be able to call Groq's API using Spring AI. I tried with the following configuration:

spring.ai.openai.api-key=${GROQ_API_KEY}
spring.ai.openai.chat.options.model=llama3-70b
spring.ai.openai.chat.base-url=https://api.groq.com/openai
spring.ai.openai.chat.options.n=1

I get the following exception when I attempt to make a chat completion call:

org.springframework.web.client.RestClientException: Error while extracting response for type [org.springframework.ai.openai.api.OpenAiApi$ChatCompletion] and content type [application/json]
    at org.springframework.web.client.DefaultRestClient.readWithMessageConverters(DefaultRestClient.java:236) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.web.client.DefaultRestClient$DefaultResponseSpec.readBody(DefaultRestClient.java:667) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.web.client.DefaultRestClient$DefaultResponseSpec.toEntityInternal(DefaultRestClient.java:637) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.web.client.DefaultRestClient$DefaultResponseSpec.toEntity(DefaultRestClient.java:626) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.ai.openai.api.OpenAiApi.chatCompletionEntity(OpenAiApi.java:751) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.openai.OpenAiChatClient.doChatCompletion(OpenAiChatClient.java:368) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.openai.OpenAiChatClient.doChatCompletion(OpenAiChatClient.java:75) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.model.function.AbstractFunctionCallSupport.callWithFunctionSupport(AbstractFunctionCallSupport.java:124) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.ai.openai.OpenAiChatClient.lambda$call$1(OpenAiChatClient.java:143) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:335) ~[spring-retry-2.0.5.jar:na]
    at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:211) ~[spring-retry-2.0.5.jar:na]
    at org.springframework.ai.openai.OpenAiChatClient.call(OpenAiChatClient.java:141) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
...
Caused by: java.net.HttpRetryException: cannot retry due to server authentication, in streaming mode
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1796) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1599) ~[na:na]
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:531) ~[na:na]
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:307) ~[na:na]
    at org.springframework.http.client.SimpleClientHttpResponse.getStatusCode(SimpleClientHttpResponse.java:55) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.http.client.observation.DefaultClientRequestObservationConvention.outcome(DefaultClientRequestObservationConvention.java:155) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.http.client.observation.DefaultClientRequestObservationConvention.getLowCardinalityKeyValues(DefaultClientRequestObservationConvention.java:98) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.http.client.observation.DefaultClientRequestObservationConvention.getLowCardinalityKeyValues(DefaultClientRequestObservationConvention.java:41) ~[spring-web-6.1.6.jar:6.1.6]
    at io.micrometer.observation.SimpleObservation.stop(SimpleObservation.java:174) ~[micrometer-observation-1.12.5.jar:1.12.5]
    at org.springframework.web.client.DefaultRestClient$DefaultRequestBodyUriSpec.exchangeInternal(DefaultRestClient.java:499) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.web.client.DefaultRestClient$DefaultRequestBodyUriSpec.retrieve(DefaultRestClient.java:444) ~[spring-web-6.1.6.jar:6.1.6]
    at org.springframework.ai.openai.api.OpenAiApi.chatCompletionEntity(OpenAiApi.java:750) ~[spring-ai-openai-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
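
The root cause in the trace above, "cannot retry due to server authentication, in streaming mode", is the message the JDK's HttpURLConnection reports when a server answers 401 Unauthorized while the request body was sent in streaming mode. That would suggest the request was rejected before any ChatCompletion JSON was read, although this thread does not confirm the cause. The following is a minimal sketch that reproduces the same exception against a local stub server; the class name, the /chat path, and the stub itself are hypothetical and stand in for the real endpoint:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpRetryException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Spring AI.
final class StreamingAuthFailureDemo {

    /** Returns the status carried by the HttpRetryException, or -1 if none was thrown. */
    static int reproduce() throws IOException {
        // Local stub that rejects every request with 401 and no body.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the request body
            exchange.sendResponseHeaders(401, -1);    // -1 means "no response body"
            exchange.close();
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/chat");
            HttpURLConnection conn =
                    (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            byte[] body = "{}".getBytes(StandardCharsets.UTF_8);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Streaming mode: the body is not buffered, so the JDK cannot replay it
            // after an authentication challenge.
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            try {
                conn.getResponseCode();
                return -1;
            } catch (HttpRetryException e) {
                return e.responseCode();
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("status reported by HttpRetryException: " + reproduce());
    }
}
```

If the same thing happened here, the API key or the URL the request was sent to would be the first things to check, rather than the shape of the response body.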

thesurlydev commented 4 months ago

Taking a quick look at a response from Groq using curl, it appears that the response object differs slightly from the ChatCompletion response that Spring AI expects. Here's an example response from Groq:

{
  "id": "chatcmpl-ee2adcbd-3697-47e9-88b9-d4a5c4ba2032",
  "object": "chat.completion",
  "created": 1713808705,
  "model": "mixtral-8x7b-32768",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Fast language models are important for a variety of reasons, including:\n\n1. Real-time applications: Fast language models can process and generate text in real-time, making them well-suited for applications such as chatbots, virtual assistants, and real-time translation.\n2. Large-scale processing: Fast language models can handle large volumes of text data quickly and efficiently, making them useful for tasks such as indexing and searching large corpora of text.\n3. Low-resource environments: Fast language models can run on devices with limited computational resources, such as smartphones and embedded devices, making natural language processing (NLP) capabilities accessible to a wider range of users and applications.\n4. Interactive exploration: Fast language models can be used to interactively explore and manipulate text data, allowing users to quickly and easily experiment with different text generation prompts and settings.\n5. Cost-effective: Fast language models can be less computationally intensive, which can result in lower costs for training and deployment, as well as reduced energy consumption.\n\nOverall, fast language models are important for enabling a wide range of NLP applications, from real-time chatbots to large-scale text processing, that can run efficiently and effectively in a variety of environments."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "prompt_time": 0.006,
    "completion_tokens": 267,
    "completion_time": 0.472,
    "total_tokens": 285,
    "total_time": 0.478
  },
  "system_fingerprint": "fp_7b44c65f25",
  "x_groq": {
    "id": "req_01hw3fb11dft4t8ny3htzw8ga3"
  }
}
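
To make the difference concrete, here is a small sketch that lists which field names in the Groq response above are absent from the OpenAI chat completion object. The OpenAI field sets are an assumption based on OpenAI's public API reference, not on Spring AI's ChatCompletion type, and the class and method names are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of Spring AI.
final class GroqFieldDiff {

    // Assumed from OpenAI's public chat completion reference.
    static final Set<String> OPENAI_TOP_LEVEL = Set.of(
            "id", "object", "created", "model", "choices", "usage", "system_fingerprint");
    static final Set<String> OPENAI_USAGE = Set.of(
            "prompt_tokens", "completion_tokens", "total_tokens");

    // Copied from the Groq response shown above.
    static final Set<String> GROQ_TOP_LEVEL = Set.of(
            "id", "object", "created", "model", "choices", "usage",
            "system_fingerprint", "x_groq");
    static final Set<String> GROQ_USAGE = Set.of(
            "prompt_tokens", "prompt_time", "completion_tokens",
            "completion_time", "total_tokens", "total_time");

    /** Returns the observed field names that are not in the expected set, sorted. */
    static Set<String> extras(Set<String> observed, Set<String> expected) {
        Set<String> extra = new TreeSet<>(observed);
        extra.removeAll(expected);
        return extra;
    }

    public static void main(String[] args) {
        System.out.println(extras(GROQ_TOP_LEVEL, OPENAI_TOP_LEVEL)); // [x_groq]
        System.out.println(extras(GROQ_USAGE, OPENAI_USAGE)); // [completion_time, prompt_time, total_time]
    }
}
```

Whether extra fields like these break deserialization depends on how the target type is mapped. With Jackson, unknown properties are rejected or ignored according to settings such as FAIL_ON_UNKNOWN_PROPERTIES or @JsonIgnoreProperties(ignoreUnknown = true); this thread does not establish which applied to Spring AI's ChatCompletion at the time.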

Mikl38400 commented 3 months ago

Did you find a way to make it work?

thesurlydev commented 3 months ago

No, although I didn't check how easy it would be to override the response handling. Otherwise, it may be necessary to add explicit support for Groq.

tzolov commented 2 months ago

Resolved by commit a6bed95358ab2cd5a3b3ec9a1a614a1b7ae610aa.