Open farshidz opened 7 months ago
Vespa supports this today using tensor compute expressions, but not in the context of the nearestNeighbor query operator or HNSW indexing. Maybe you could elaborate on why you need it for retrieval? An illustrative use case would help.
We index vectors that are labeled. At search time, we need to retrieve only vectors with one or more specific labels. While this could be handled by creating a tensor field for each label, the full set of labels isn't known in advance, so we have to rely on a generic tensor field and use a mapped dimension for the label. Even though vectors for different labels do not follow the exact same distribution, in practice we have seen good recall with this approach with HNSW.
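To make the requirement concrete, here is a minimal brute-force sketch in plain Python of what "nearest neighbors restricted to some labels" means for this data model. It is not Vespa code and says nothing about how HNSW would do this. The dict layout, the function names, and the 3-element vectors (instead of x[384]) are all assumptions made for illustration.

```python
import math

# Hypothetical in-memory stand-in for tensor<float>(p{}, q{}, x[3]):
# each document maps (p, q) label pairs to a dense vector.
docs = {
    "doc1": {("value1", "a"): [0.0, 0.0, 1.0], ("other", "a"): [1.0, 0.0, 0.0]},
    "doc2": {("value2", "b"): [0.9, 0.1, 0.0]},
    "doc3": {("other", "c"): [1.0, 0.0, 0.0]},
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_distance(doc, query, allowed_p):
    """Smallest distance to any vector whose p label is allowed, else None."""
    distances = [
        euclidean(vec, query)
        for (p, _q), vec in doc.items()
        if p in allowed_p
    ]
    return min(distances) if distances else None


def search(docs, query, allowed_p, k):
    """Return up to k document ids, closest first, ignoring disallowed labels."""
    scored = []
    for doc_id, doc in docs.items():
        distance = best_distance(doc, query, allowed_p)
        if distance is not None:  # documents with no allowed label are skipped
            scored.append((distance, doc_id))
    scored.sort()
    return [doc_id for _distance, doc_id in scored[:k]]


result = search(docs, [1.0, 0.0, 0.0], {"value1", "value2"}, k=2)
```

In this toy data, doc1 and doc3 both hold a vector identical to the query, but under the label "other", so those vectors must not count. That is the behavior we need at retrieval time.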
Hi @jobergum! We haven't yet found a solution for the problem @farshidz is describing. We may have some capacity at some point to work on contributing this feature to Vespa. If we go down this route, are there any tips for getting started, or resources you could point us towards?
This is fully supported with tensor compute expressions but not HNSW indexing for efficient retrieval. So, if you can limit it to ranking phases, the functionality is there.
I would say that this is a very complex task for someone without a deep knowledge of the code base.
Thanks @jobergum. Having this functionality at retrieval time is key for our use case. Is there any estimate of when this could become available? Or is there anything we can do from our end to help get this done?
The Vespa core work needed is expert level so probably not suitable for external contributions (although you're welcome to assess this yourself - code and build instructions are on GitHub).
We do plan to get to this at some point but no ETA currently. If you are a paying customer you can create a support ticket instead and we'll set an ETA.
Workaround: Add the labels to a string array in the document in addition to the tensor, and filter on that.
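A rough sketch of what a query using this workaround might look like, written as a small Python helper that builds the YQL string. The field names `embedding` and `labels`, the query tensor name `q`, and the helper itself are assumptions for illustration, and labels are assumed to contain no double quotes.

```python
def build_yql(labels, target_hits=10):
    """Build a YQL string combining nearestNeighbor with a label filter.

    Assumes a tensor field named `embedding`, a string array field named
    `labels`, and a query tensor named `q` (all hypothetical names).
    """
    if not labels:
        raise ValueError("at least one label is required")
    # One `contains` term per label, OR-ed together.
    label_filter = " or ".join(f'labels contains "{label}"' for label in labels)
    return (
        "select * from sources * where "
        f"{{targetHits:{target_hits}}}nearestNeighbor(embedding, q) "
        f"and ({label_filter})"
    )


yql = build_yql(["value1", "value2"])
```

Note that this filter selects documents. On its own it presumably does not restrict which vectors inside a matching document's tensor are compared against the query.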
Is your feature request related to a problem? Please describe.
We have tensor fields with two mapped dimensions and one indexed dimension, tensor<float>(p{}, q{}, x[384]). We would like to be able to search within one or more specific p dimension values.
Describe the solution you'd like
The nearest neighbor search operator should support one or more values for a mapped dimension, e.g., so that the search only considers vectors where the mapped dimension p has one of the values value1, value2 or value3.
Describe alternatives you've considered
Alternatively, the nearest neighbor operator could accept only a single value for the mapped dimension, and for multiple values the query would have to consist of a disjunction of multiple nearest neighbor operators.
Additional context
N/A