redis / redis-om-spring

Spring Data Redis extensions for better search, document models, and more
MIT License

Add a Hash to Vector embedding sources to detect if source and embedding are out of sync #464

Open bsbodden opened 5 months ago

bsbodden commented 5 months ago

See https://github.com/redis/redis-om-spring/blob/main/redis-om-spring/src/main/java/com/redis/om/spring/annotations/Vectorize.java

Ideally, a new annotation on the hash field would connect it to the 'source' of the Vectorize annotation, something like:

@Document
public class DocWithCustomModelOpenAIEmbedding {
  @Id
  private String id;

  @Indexed
  @NonNull
  private String name;

  @Indexed( //
            schemaFieldType = SchemaFieldType.VECTOR, //
            algorithm = VectorAlgorithm.HNSW, //
            type = VectorType.FLOAT32, //
            dimension = 3072, //
            distanceMetric = DistanceMetric.COSINE, //
            initialCapacity = 10
  )
  private float[] textEmbedding;

  @Vectorize( //
              destination = "textEmbedding", //
              embeddingType = EmbeddingType.SENTENCE, //
              provider = EmbeddingProvider.OPENAI, //
              openAiEmbeddingModel = EmbeddingModel.TEXT_EMBEDDING_3_LARGE, //
              trackChanges = true // <==== This will tell OM to hash the field into a private JSON or Hash field like `_text_hash` matching the name of the field below
  )
  @NonNull
  private String text;
}
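
A minimal sketch of how the sync check behind `trackChanges` might work, assuming the stored hash is a hex digest of the source text. The issue does not specify a hash algorithm or an API, so SHA-256 and the names `SourceHashTracker`, `sha256Hex`, and `needsReembedding` are hypothetical and not part of Redis OM Spring:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not an existing Redis OM Spring class.
final class SourceHashTracker {

  private SourceHashTracker() {
  }

  // Returns the SHA-256 digest of the source text as a lowercase hex string.
  // SHA-256 is an assumption; any stable hash would serve the same purpose.
  static String sha256Hex(String source) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] bytes = digest.digest(source.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(bytes.length * 2);
      for (byte b : bytes) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }

  // True when no hash has been stored yet, or when the stored hash no longer
  // matches the current source, i.e. the embedding is out of sync.
  static boolean needsReembedding(String source, String storedHash) {
    return storedHash == null || !storedHash.equals(sha256Hex(source));
  }
}
```

Under these assumptions, OM could store `sha256Hex(text)` in the private `_text_hash` field whenever it computes `textEmbedding`, and on later saves call the embedding provider only when `needsReembedding(text, storedHash)` returns true.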