spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0

The vector store's similaritySearch does not return the media collection #1257

Open yyqqing opened 2 months ago

yyqqing commented 2 months ago

Bug description: A Document is created with embedded Media objects, but after querying through vectorStore.similaritySearch, the media collection of the returned document is empty.

Environment: Spring Boot 3.3.2, Spring AI 1.0.0.SNAPSHOT, Java 17, Chroma

Steps to reproduce

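  1. Create a document with Document doc = new Document(docId, docContent, mediaList, metadata); and call vectorStore.add(doc); to insert it into the vector store
  2. Query with List<Document> docs = vectorStore.similaritySearch(SearchRequest.defaults().withTopK(1).withFilterExpression(filterExpr).withQuery(query));
  3. Confirm that docs contains an element with the same ID as doc, but that the element's media is empty
  4. Step into the similaritySearch method of org.springframework.ai.vectorstore.ChromaVectorStore: when it processes the query results, it does not use the Document constructor that takes media
  5. The VectorStore interface offers no other method for retrieving the Media collection associated with a Document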
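
The failure mode in step 4 can be illustrated with a small self-contained sketch. It does not use the Spring AI classes: `Media`, `Document`, and `LossyStore` below are hypothetical stand-ins, written only to show how a store that persists id, content, and metadata, and then rebuilds results through a constructor without media, hands back documents whose media list is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MediaRoundTrip {

    // Hypothetical stand-ins for illustration; these are NOT the Spring AI types.
    record Media(String mimeType, String url) {}

    record Document(String id, String content, List<Media> media, Map<String, Object> metadata) {
        // Constructor without media: the media list defaults to empty.
        Document(String id, String content, Map<String, Object> metadata) {
            this(id, content, List.of(), metadata);
        }
    }

    // Toy store that keeps only id, content and metadata, like the behavior reported in the issue.
    static class LossyStore {
        private record Row(String id, String content, Map<String, Object> metadata) {}

        private final List<Row> rows = new ArrayList<>();

        void add(Document doc) {
            // The media collection is never written anywhere.
            rows.add(new Row(doc.id(), doc.content(), doc.metadata()));
        }

        List<Document> findAll() {
            List<Document> out = new ArrayList<>();
            for (Row r : rows) {
                // Results are rebuilt with the constructor that has no media argument.
                out.add(new Document(r.id(), r.content(), r.metadata()));
            }
            return out;
        }
    }

    // Adds one document that carries media, then reads it back.
    static Document roundTrip() {
        Document doc = new Document(
                "doc-1",
                "a cat photo",
                List.of(new Media("image/png", "https://example.com/cat.png")),
                Map.of("source", "demo"));
        LossyStore store = new LossyStore();
        store.add(doc);
        return store.findAll().get(0);
    }

    public static void main(String[] args) {
        Document found = roundTrip();
        // prints: doc-1 media=[]
        System.out.println(found.id() + " media=" + found.media());
    }
}
```

The ID and metadata survive the round trip while the media does not, which matches what steps 3 and 4 describe.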

Expected behavior: A vectorStore.similaritySearch query should also return the media collection associated with each document; otherwise the media collection serves no purpose.
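
Until the media collection is returned, one possible stopgap is to carry a reference to the media in the document metadata and rebuild the media after the search. The sketch below is only an illustration under assumptions: it assumes the store returns string metadata values unchanged, and the `Media` record, the `media_url` and `media_mime` keys, and both helper methods are hypothetical names, not part of Spring AI.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MediaInMetadata {

    // Hypothetical stand-in for a media reference; NOT the Spring AI Media class.
    record Media(String mimeType, String url) {}

    // Hypothetical metadata keys chosen for this sketch.
    static final String MEDIA_URL = "media_url";
    static final String MEDIA_MIME = "media_mime";

    // Returns a copy of the metadata with the media reference stored as plain strings.
    static Map<String, Object> withMediaRef(Map<String, Object> metadata, Media media) {
        Map<String, Object> copy = new HashMap<>(metadata);
        copy.put(MEDIA_MIME, media.mimeType());
        copy.put(MEDIA_URL, media.url());
        return copy;
    }

    // Rebuilds the media list from metadata returned by a search; empty if no reference is present.
    static List<Media> mediaFrom(Map<String, Object> metadata) {
        Object url = metadata.get(MEDIA_URL);
        Object mime = metadata.get(MEDIA_MIME);
        if (url instanceof String u && mime instanceof String m) {
            return List.of(new Media(m, u));
        }
        return List.of();
    }

    public static void main(String[] args) {
        Media image = new Media("image/png", "https://example.com/cat.png");
        Map<String, Object> metadata = withMediaRef(Map.of("source", "demo"), image);
        // The metadata map is what would be passed to the document before adding it to the store.
        System.out.println(mediaFrom(metadata));
    }
}
```

This covers a single media reference per document; several references would need indexed keys or a serialized string, because metadata values in vector stores are often limited to primitive types.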

Minimal Complete Reproducible example: See the steps to reproduce above.

csterwa commented 1 month ago

English translation of title: SimilaritySearch of the vector library cannot bring back the media collection at the same time

bug description: When creating a Document, a Media object is embedded, but after querying through vectorStore.similaritySearch, the Media Collection in the returned document is empty

steps to reproduce:

  1. Use Document doc = new Document(docId, docContent, mediaList, metadata); to create a new doc and call vectorStore.add(doc); to insert it into the vector store
  2. Use List<Document> docs = vectorStore.similaritySearch(SearchRequest.defaults().withTopK(1).withFilterExpression(filterExpr).withQuery(query)); to query the results
  3. Confirm that docs contains an element with the same ID as doc, but the media in the element is empty
  4. Stepping into the similaritySearch method of the org.springframework.ai.vectorstore.ChromaVectorStore class shows that the constructor with media is not used when the query results are processed
  5. The VectorStore interface does not provide any other method to obtain the Media collection associated with a Document

expected behavior: A query through vectorStore.similaritySearch should also return the associated media collection; otherwise the media collection is meaningless

minimal complete reproducible example: See the steps to reproduce above