spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0

Cohere embedding model options not truncating as expected #1753

Closed bruno-oliveira closed 1 day ago

bruno-oliveira commented 4 days ago

Bug description Currently, we are using the Bedrock Cohere embedding model via Spring AI.

The docs state that the default option for "truncation" is NONE.

What this implies in practice is that embedding a chunk longer than the 2048 characters allowed by the underlying Cohere API will result in an error.

One way to circumvent that is to configure the embedding model client so that a given truncation strategy becomes the default. In our case, we need to be "behind a VPC", so we used a "custom" client that exposes a URL to configure that, but left everything else as-is.

@Bean(name = "cohereEmbeddingModel")
@ConditionalOnProperty("spring.ai.bedrock.cohere.embedding.enabled")
public EmbeddingModel cohereEmbeddings() {
    log.info("Configured Cohere embedding model with VPC connection");
    return new CustomBedrockCohereEmbeddingModel(
            // Custom API wrapper that points at our VPC endpoint;
            // everything else mirrors the stock CohereEmbeddingBedrockApi.
            new CustomCohereEmbeddingBedrockApi(
                    embeddingModel,
                    DefaultCredentialsProvider.create(),
                    awsRegion,
                    new ObjectMapper(),
                    bedrockRuntimeClient(),
                    bedrockRuntimeAsyncClient()),
            // Default options: truncate over-long inputs from the END.
            CustomBedrockCohereEmbeddingOptions.builder()
                    .withInputType(SEARCH_DOCUMENT)
                    .withTruncate(END)
                    .build());
}

You can see that the options we pass to the model include a default truncation strategy that removes the end of a chunk if it's longer than the 2048-character limit.

Now, what I believe should happen is this: when we call the actual model with a chunk such as:

embeddingModel.embed("SomeString".repeat(2048))

the call should succeed, with the string being truncated to exactly 2048 characters by removing characters from the END, as specified in the client configuration above.

However, what happens is that this results in an exception:

EmbeddingGenerationService     : There was an error invoking embedding model: Malformed input request: #/texts/0: expected maxLength: 2048, actual: 2363, please reformat your input and try again. (Service: BedrockRuntime, Status Code: 400)

What we ended up doing was manually truncating our input chunks when they are too large, as well as tweaking the chunking parameters a bit. The expectation, however, was that inputs would follow the truncation strategy defined in the client above, without us needing to truncate them manually.
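For reference, the manual workaround can be sketched as a small helper that applies END-style truncation client-side before calling the embedding model. This is a minimal sketch, not Spring AI code; `ChunkTruncator` and `COHERE_MAX_CHARS` are names introduced here for illustration, and the limit is assumed to be 2048 characters as reported by the Bedrock error above.

```java
// Minimal client-side workaround: truncate each chunk before embedding.
// Assumes the Cohere limit is 2048 characters (per the 400 error above).
public final class ChunkTruncator {

    private static final int COHERE_MAX_CHARS = 2048;

    private ChunkTruncator() {
    }

    // Mimics truncate=END: keep the start of the chunk, drop the tail.
    public static String truncateEnd(String chunk) {
        return chunk.length() <= COHERE_MAX_CHARS
                ? chunk
                : chunk.substring(0, COHERE_MAX_CHARS);
    }
}
```

With this in place, calls become `embeddingModel.embed(ChunkTruncator.truncateEnd(longChunk))`, which sidesteps the 400 error but duplicates logic the `withTruncate(END)` option was expected to handle.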

Environment Java 21, Spring Boot 3.3.0, Spring AI 1.0.0-M3

Steps to reproduce Sending an embedding request with an input string longer than 2048 characters triggers the error, since truncation doesn't happen.

Expected behavior The chunk truncation should happen under the hood.

Minimal Complete Reproducible example Configure the client as above, then send a request to embed a string with length > 2048; that is enough to trigger the error.

markpollack commented 1 day ago

Just to note, here is a related issue that highlights the same problem with Bedrock Cohere itself.

https://github.com/deepset-ai/haystack-core-integrations/issues/912