Closed by Btibert3 7 months ago
Thanks for raising this. I agree that we should support some of the common functions. We can follow DuckDB's array functions: https://duckdb.org/docs/sql/data_types/array.html#functions. Arrays in DuckDB are equivalent to our fixed-length list type, so I don't think there is a hurdle in supporting these.
I'll put this into our pipeline.
Hi @Btibert3 that makes a lot of sense. Could you elaborate a bit on what the intended use case is in the context of RAG? To you, how would the ideal implementation look from a graph query perspective?
@semihsalihoglu-uw DuckDB is exactly what I had in my head.
@prrao87 Naive RAG lets you find entries (i.e. nodes) based on the similarity of the vector to the input query. We can go beyond this by further restricting the results by leveraging graph patterns. One example might be to show a list of products based on the user's input query (vector similarity) but further restrict/re-rank the results based on products the user hasn't purchased and behavior of other "similar" users, where similarity in this context is leveraging graph relationships. In this example, the results come from vector-based similarity and graph relationships.
Another example would be to retrieve the most similar document chunk via vector search, then improve the context window based on linked nodes and variable pattern matching, again using vector similarity but also the structure of the relationships in the graph.
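To make the two examples above a bit more concrete, here is a minimal sketch in plain Python rather than in any graph query language. Everything in it is an assumption for illustration: the names `recommend` and `expand_context`, the in-memory dictionaries standing in for nodes and relationships, and the toy 2-dimensional embeddings. It is not how this project implements anything, only the shape of the idea: rank by vector similarity, then filter or expand using graph structure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(query_vec, product_vecs, purchased, k=3):
    """Rank products by similarity to the query, skipping purchased ones.

    `purchased` stands in for a graph pattern such as
    (user)-[:PURCHASED]->(product); here it is just a set of ids.
    """
    candidates = [
        (cosine_similarity(query_vec, vec), product_id)
        for product_id, vec in product_vecs.items()
        if product_id not in purchased
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [product_id for _, product_id in candidates[:k]]


def expand_context(best_chunk, links):
    """Return the best chunk followed by the chunks linked to it.

    `links` is a hypothetical adjacency mapping standing in for
    relationships between document chunks in the graph.
    """
    return [best_chunk] + links.get(best_chunk, [])


# Toy data: hypothetical product embeddings and one purchased product.
product_vecs = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
purchased = {"p1"}
top = recommend([1.0, 0.0], product_vecs, purchased, k=2)
context = expand_context("c1", {"c1": ["c0", "c2"]})
```

In a graph database the filter and the expansion would be pattern matches in the query itself, so the similarity ranking and the relationship traversal happen in one place.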
@Btibert3 May I know which vector operations you are most interested in, so that we can implement those first?
Sure thing.
In short, I believe that you can go pretty far with those three.
@acquamarin I'd start with cosine and then extend to Euclidean (L2) and then finally dot product, in that order. Cosine seems to be the most common metric used for similarity search in general.
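For reference, the three metrics mentioned here can be sketched in plain Python as follows. The function names are chosen for this illustration only and are not the names of any existing or planned functions in this project; the vectors are assumed to be equal-length lists of numbers, matching the fixed-length list idea.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """L2 distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector magnitude and only compares direction, which is one reason it is a common default for embedding search. For vectors that are already normalized to unit length, it equals the dot product.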
If those are the two being considered out of the gate, I completely agree with cosine.
Wow! Very impressed.
One week from request to implementation? Just unbelievable :-D Many thanks for your work!
I just came across this project, and wow, I am impressed. Currently, Neo4j supports vector operations, more specifically, similarity calculations. It would be great if we could extend the concept of fixed-length lists and perform similarity operations on them. Maybe that support already exists and I am overlooking how to achieve this with your stack, but this would be a great feature to help support RAG operations.