spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0

Newly created ChromaDB collection is missing collectionId and failing on insert #1240

Open novakma2 opened 2 months ago

novakma2 commented 2 months ago

Bug description: When I add documents to a newly created ChromaDB collection in my multi-store setup, the client returns 404 Not Found: "{"detail":"Not Found"}".

Environment: Java 21; Spring Boot 3.3.2; Spring AI 1.0.0-M1; ChromaDB latest

Steps to reproduce: Create a new Spring Boot project with ChromaDB, create a collection using ChromaApi, and try to add a document to that collection.

Expected behavior: When a new ChromaDB collection is created, its id should be populated and all operations on it should work.

Minimal Complete Reproducible example: In my current multi-collection setup, I create a new collection per customer and add a store for it to a map, as that seems to be the only way to add data to different collections.

    public void synchronizeCollections() {
        customerService.getCustomers().stream()
                .filter(customer -> !storeByHost.containsKey(customer.host()))
                .forEach(customer -> {
                    final var collection = chromaApi.createCollection(new ChromaApi.CreateCollectionRequest(customer.host()));
                    logger.info("Inserting new ChromaDB collection {} with name {}", collection.id(), collection.name());
                    final var store = new ChromaVectorStore(embeddingModel, chromaApi, collection.name(), true);
                    storeByHost.put(collection.name(), store);
                });
    }

Then, when adding documents to the collection for a given host:

    public void insertToStore(String host, List<Document> documents) {
        if (!storeByHost.containsKey(host)){
            logger.error("Host is not present in ChromaDB collection, host:{}", host);
            return;
        }
        storeByHost.get(host).add(documents);
    }

I receive the following exception:

org.springframework.web.client.HttpClientErrorException$NotFound: 404 Not Found: "{"detail":"Not Found"}"
    at org.springframework.web.client.HttpClientErrorException.create(HttpClientErrorException.java:112) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:183) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:137) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.ResponseErrorHandler.handleError(ResponseErrorHandler.java:63) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:942) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:891) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:790) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:672) ~[spring-web-6.1.11.jar:6.1.11]
    at org.springframework.ai.chroma.ChromaApi.upsertEmbeddings(ChromaApi.java:321) ~[spring-ai-chroma-store-1.0.0-M1.jar:1.0.0-M1]
    at org.springframework.ai.vectorstore.ChromaVectorStore.add(ChromaVectorStore.java:104) ~[spring-ai-chroma-store-1.0.0-M1.jar:1.0.0-M1]
    at my.site.chromadb.ChromaDbService.insertToStore(ChromaDbService.java:64) ~[classes/:na]
    at my.site.sync.SynchronizationService.sync(SynchronizationService.java:70) ~[classes/:na]
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]

After some digging, it seems that the id of the created collection is never populated, and the exception is thrown when upserting embeddings, which requires the id.

[Screenshot: 2024-08-18 at 23:03:04]

The problem then surfaces in ChromaApi.java at line 321, where the endpoint is called with a null collection id:

    //
    // Chroma Collection API (https://docs.trychroma.com/js_reference/Collection)
    //

    public void upsertEmbeddings(String collectionId, AddEmbeddingsRequest embedding) {

        this.restTemplate
            .exchange(this.baseUrl + "/api/v1/collections/{collection_id}/upsert", HttpMethod.POST,
                    this.getHttpEntityFor(embedding), Boolean.class, collectionId)
            .getBody();
    }

Edit: manually calling afterPropertiesSet() on each store appears to solve the problem.
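One plausible reading of why this workaround helps: Spring invokes afterPropertiesSet() only on beans the container manages, so a store built with `new` never gets that callback. The sketch below models this pattern with stand-in classes and is not the Spring AI implementation. `FakeChromaServer`, `FakeStore`, and `addSucceeds` are hypothetical names, and the sketch assumes the store resolves its collection id only inside afterPropertiesSet().

```java
import java.util.HashMap;
import java.util.Map;

public class LazyCollectionIdSketch {

    /** Hypothetical stand-in for the Chroma server: maps collection names to ids. */
    static class FakeChromaServer {
        private final Map<String, String> idsByName = new HashMap<>();

        String createCollection(String name) {
            return idsByName.computeIfAbsent(name, n -> "id-" + n);
        }

        String lookupId(String name) {
            return idsByName.get(name);
        }

        void upsert(String collectionId, String document) {
            // Mirrors the failure mode in the report: a null or unknown id is rejected.
            if (collectionId == null || !idsByName.containsValue(collectionId)) {
                throw new IllegalStateException("404 Not Found");
            }
        }
    }

    /** Hypothetical stand-in for a store whose id is resolved only in afterPropertiesSet(). */
    static class FakeStore {
        private final FakeChromaServer server;
        private final String collectionName;
        private String collectionId; // assumption: stays null until afterPropertiesSet() runs

        FakeStore(FakeChromaServer server, String collectionName) {
            this.server = server;
            this.collectionName = collectionName;
        }

        void afterPropertiesSet() {
            this.collectionId = server.lookupId(collectionName);
        }

        void add(String document) {
            server.upsert(collectionId, document);
        }
    }

    /** Builds a store by hand and reports whether add() succeeds. */
    static boolean addSucceeds(boolean callAfterPropertiesSet) {
        FakeChromaServer server = new FakeChromaServer();
        server.createCollection("customer.example");
        FakeStore store = new FakeStore(server, "customer.example");
        if (callAfterPropertiesSet) {
            store.afterPropertiesSet(); // the call the container would make for a managed bean
        }
        try {
            store.add("hello");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without init: " + addSucceeds(false)); // prints "without init: false"
        System.out.println("with init: " + addSucceeds(true));     // prints "with init: true"
    }
}
```

If the same assumption holds for ChromaVectorStore, calling afterPropertiesSet() on the store right after constructing it in synchronizeCollections(), before putting it into storeByHost, would keep the initialization in one place.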

mohamedYoussfi commented 2 weeks ago

Right. Is there any update on this issue?