loretoparisi closed this issue 9 months ago.
The following SQL syntax is proposed for the new field:

    <field name> float_vector
        [knn_type='hnsw'
         knn_dims='int'
         knn_similarity={l2|ip|cosine}
         [hnsw_m='int']
         [hnsw_ef_construction='int']]

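To make the grammar concrete, here is a minimal Python sketch of how the option list after float_vector could be split into key/value pairs. The function name parse_column and its regular expression are hypothetical and not part of any existing parser. The sketch assumes every option is written as key='value' with single quotes, as in the examples, and it only tokenizes; it does not apply the knn_*/hnsw_* rules.

```python
import re
from typing import Dict, Tuple

# Hypothetical: matches one option written as key='value' (single-quoted value).
OPTION_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def parse_column(definition: str) -> Tuple[str, str, Dict[str, str]]:
    """Split "<field name> <type> [key='value' ...]" into name, type and options."""
    parts = definition.split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError(f"expected '<field name> <type> [options]': {definition!r}")
    name, col_type = parts[0], parts[1]
    rest = parts[2] if len(parts) == 3 else ""

    options: Dict[str, str] = {}
    pos = 0
    for match in OPTION_RE.finditer(rest):
        # Only whitespace may separate consecutive options.
        gap = rest[pos:match.start()]
        if gap.strip():
            raise ValueError(f"unexpected text: {gap.strip()!r}")
        key, value = match.group(1), match.group(2)
        if key in options:
            raise ValueError(f"duplicate option: {key}")
        options[key] = value
        pos = match.end()

    # Reject trailing text that is not a well-formed option (e.g. an unquoted value).
    if rest[pos:].strip():
        raise ValueError(f"unexpected text: {rest[pos:].strip()!r}")
    return name, col_type, options


print(parse_column("a float_vector knn_dims='128' knn_similarity='l2'"))
# prints ('a', 'float_vector', {'knn_dims': '128', 'knn_similarity': 'l2'})
```

Keeping tokenizing separate from validation means the same option dictionary can later be checked against whichever knn_type is selected.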
knn_type is not mandatory. If no knn* option is specified, the field remains just an array of floats.
knn_type is turned on automatically if knn_similarity or knn_dims is specified. The default is hnsw.
knn_dims and knn_similarity are required if knn_type='hnsw'.
hnsw_m and hnsw_ef_construction are optional.

Examples:
create table t(a float_vector) - just an array of floats
create table t(a float_vector knn_dims='128' knn_similarity='l2') - simplest syntax to enable KNN
create table t(a float_vector knn_type='hnsw' knn_dims='128' knn_similarity='l2') - alternative syntax, mostly for the future when knn_type can be, e.g., annoy
create table t(a float_vector knn_type='hnsw' knn_dims='16' knn_similarity='ip' hnsw_m='16') - fine-tuning
create table t(a float_vector knn_type='hnsw' knn_dims='16' knn_similarity='ip' hnsw_m='20' hnsw_ef_construction='90') - more fine-tuning

@glookka please review and let me know whether this looks good, or whether I'm missing something and there are better options.
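The defaulting and requirement rules of the proposal can be expressed as one small validation function. This is a hedged Python sketch: resolve_knn_options is a hypothetical name, and its input is assumed to be the already-parsed key/value options of a single float_vector column. Two behaviours the proposal leaves open are assumptions here: hnsw_m or hnsw_ef_construction given without any knn_* option is rejected, and omitted HNSW parameters are left unset so the engine can apply its own defaults.

```python
from typing import Dict, Optional

SIMILARITIES = {"l2", "ip", "cosine"}
KNN_TRIGGERS = ("knn_type", "knn_dims", "knn_similarity")
HNSW_TUNING = ("hnsw_m", "hnsw_ef_construction")


def resolve_knn_options(options: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Apply the proposed float_vector rules; None means a plain array of floats."""
    unknown = set(options) - set(KNN_TRIGGERS) - set(HNSW_TUNING)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    if not any(key in options for key in KNN_TRIGGERS):
        # No knn* option: the field stays a plain float array.
        # Assumption: HNSW tuning options on their own are an error.
        if options:
            raise ValueError("hnsw_* options require knn_dims and knn_similarity")
        return None

    # KNN is enabled implicitly; knn_type defaults to 'hnsw'.
    knn_type = options.get("knn_type", "hnsw")
    if knn_type != "hnsw":
        raise ValueError(f"unsupported knn_type: {knn_type!r}")

    # knn_dims and knn_similarity are required when knn_type='hnsw'.
    for key in ("knn_dims", "knn_similarity"):
        if key not in options:
            raise ValueError(f"{key} is required when knn_type='hnsw'")
    if options["knn_similarity"] not in SIMILARITIES:
        raise ValueError(f"knn_similarity must be one of {sorted(SIMILARITIES)}")
    dims = int(options["knn_dims"])
    if dims <= 0:
        raise ValueError("knn_dims must be a positive integer")

    resolved: Dict[str, object] = {
        "knn_type": knn_type,
        "knn_dims": dims,
        "knn_similarity": options["knn_similarity"],
    }
    # hnsw_m and hnsw_ef_construction are optional; omitted ones stay unset.
    for key in HNSW_TUNING:
        if key in options:
            resolved[key] = int(options[key])
    return resolved
```

Fed the option sets of the five examples, the first returns None and the other four resolve to knn_type='hnsw'. The second and third resolve to the same settings, which is what makes the shorter syntax a pure convenience.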
The knn_similarity={l2|ip|cosine} option is specific to HNSW. For example, annoy has "angular", "euclidean", "manhattan", "hamming", or "dot". So it probably makes sense to name the option hnsw_similarity.
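The three values l2, ip and cosine are standard vector distances. As a sketch, here they are in Python, using the distance forms listed in the hnswlib README (squared L2, 1 minus the inner product, 1 minus the cosine similarity); whether the engine in this proposal would define them exactly this way is an assumption, and the DISTANCES table is a hypothetical name. Seeing the exact formulas also supports the naming point: the available metrics and their definitions are tied to the index library.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    _check_dims(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip(a: Vector, b: Vector) -> float:
    """1 - inner product: larger dot products give smaller distances."""
    _check_dims(a, b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: only the direction of the vectors matters."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norms


# Hypothetical lookup from the proposed option value to a distance function.
DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "l2": l2,
    "ip": ip,
    "cosine": cosine,
}
```

For unit-length vectors the inner product equals the cosine similarity, so ip and cosine rank neighbours identically; for unnormalized embeddings they can differ.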
Is your feature request related to a problem? Please describe.
Indexing embedding vectors along with the document. Optionally, support multiple vectors per document and their retrieval.
Describe the solution you'd like
Add HNSW-based vector similarity search.
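HNSW is an approximate index: what it approximates is exact k-nearest-neighbour search, which for a small in-memory collection is simply a brute-force scan. The Python sketch below is that exact baseline, not HNSW itself. It also covers the optional multi-vector case by scoring each document by its closest vector; that policy is an assumption, since the request does not say how several vectors per document should be combined. knn_exact and the Docs layout are hypothetical names.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory collection: document id -> one or more embedding vectors.
Docs = Dict[int, List[Sequence[float]]]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_exact(docs: Docs, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Exact top-k documents by brute force; an HNSW index would approximate this."""
    # Assumed multi-vector policy: a document's distance is that of its closest vector.
    scored = (
        (min(l2(v, query) for v in vectors), doc_id)
        for doc_id, vectors in docs.items()
        if vectors  # documents without vectors are skipped
    )
    # nsmallest orders by distance, breaking ties by document id.
    best = heapq.nsmallest(k, scored)
    return [(doc_id, dist) for dist, doc_id in best]
```

Each query here costs one distance computation per stored vector; avoiding that linear scan is what an HNSW graph, built according to parameters such as hnsw_m and hnsw_ef_construction, is for.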
Describe alternatives you've considered
Additional context
Integration of semantic and similarity search with keyword-based search.