Closed var77 closed 1 year ago
Wouldn't this mean that if there is an ORDER BY <-> but the index is not triggered, e.g., because the cost estimator determined that it would be more cost effective to do a sequential search, then an error would be thrown?
It's a bit strange to me that I can't do <-> in SELECT. I did this a few times when testing pgvector to sanity check that I was getting the nearest vectors.
Yes that is true, it will throw an error in case of sequential scans, but here is some backstory why we decided to implement this feature.
We wanted to make the UX better by maintaining only one operator which will automatically determine which kind of distance function should be called. Currently in alternative solutions there is an separate operator for each distance function (e.g <->, <=>, <~>) so you should remember which operator to use for particular index.
Currently on index scans we can determine the right distance function, as we can look up the support function from the index Relation, but in case of sequential scan or SELECT
statement we couldn't find an optimal way to determine which distance function to call. In case of SELECT
I think it is not possible, but in case of seq scan there might be a way to look up the indexes defined on that field and try to determine the right distance function based on that information, but we might loose some performance there.
So at this moment we decided to forbid the usage of the operator, so it will not be confusing that let's say sometimes the operator returns l2
distance if used out of index and when used with index it returns cosine
distance.
We can discuss this topic further with @Ngalstyan4 , maybe we can come up with a better solution.
Currently we have only one operator
<->
In
src/hnsw/options.c
file there's a functionHnswGetMetricKind
which will determine current operator class being used with and index and detect right metric kind forusearch
index using the support function pointers.This is great as we can have only one operator which will support various distance functions, but when used out of index scope for example in
SELECT
statement, the operator can not automatically detect which distance function should be used.We are currently throwing an error when
<->
is used out of index lookup. We are doing this usingExecutorStart_hook
the hook implementation is defined insrc/hnsw/options.c
void executor_hook
. This function receives QueryDesc struct, and we are currently doing regexp matching onsourceText
. This approach is not covering cases when the operator will be used withORDER BY
, but there won't be an index scan.To fix all the cases we might use plannedstmt which contains the AST of planned statement, where we can find information about the indexes and much more.
After doing this changes theres
hnsw_operators_todo.sql
test file. The file should be renamed tohnsw_operators.sql
and included inschedule.txt
file