Open starinacool opened 1 week ago
Doesn't joining another table work in your case? https://manual.manticoresearch.com/Searching/Joining
If joining can lookup a vector from index A and find similar docs in table B then it will work. But I do not see how it could be done with joins.
and find similar docs in table B
similar to what?
Similar to the vector from index A.
The idea is to use vector from one index to search in another index.
Here's an example of knn + joined tables + ordering by knn + filtering by string in the joined table:
mysql> drop table if exists test; create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' ); insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) ); drop table if exists test2; create table test2 (color string); insert into test2 values(1, 'red'), (2, 'green'); select
id, title, knn_dist(), test2.color from test left join test2 on test2.id = test.id where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), 2000 ) and test2.color='red';
--------------
drop table if exists test
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
create table test ( title text, image_vector float_vector knn_type='hnsw' knn_dims='4' hnsw_similarity='l2' )
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
insert into test values ( 1, 'yellow bag', (0.653448,0.192478,0.017971,0.339821) ), ( 2, 'white bag', (-0.148894,0.748278,0.091892,-0.095406) )
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
drop table if exists test2
--------------
Query OK, 0 rows affected (0.03 sec)
--------------
create table test2 (color string)
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
insert into test2 values(1, 'red'), (2, 'green')
--------------
Query OK, 2 rows affected (0.00 sec)
--------------
select id, title, knn_dist(), test2.color from test left join test2 on test2.id = test.id where knn ( image_vector, 5, (0.286569,-0.031816,0.066684,0.032926), 2000 ) and test2.color='red'
--------------
+------+------------+------------+-------------+
| id | title | knn_dist() | test2.color |
+------+------------+------------+-------------+
| 1 | yellow bag | 0.28146550 | red |
+------+------------+------------+-------------+
1 row in set (0.00 sec)
--- 1 out of 1 results in 0ms ---
I will clarify.
The idea is to use vector from one index to search in another index by that vector.
So, both indexes has same dimension vectors.
create table test1 (id int, title text, vector float_vector knn_type='hnsw' knn_dims='4' )
create table test2 (id int, title text, vector float_vector knn_type='hnsw' knn_dims='4' )
The idea is to have a shortcut for these requests:
select vector as "VECTOR_FROM_1" from test1 where id=1234;
select id, title from test2 where knn ( vector, 5, (VECTOR_FROM_1), 2000 );
So instead of two requests combine it to one
select id, title from test2 where knn ( vector, 5, 1234 , test1.vector )
Thank you! The idea is more clear now. In your original spec you said:
<other table name>.<field> - is the notation of other index and field where to look up a vector for searching in <table name>
How are the <other table name>
and <table name>
related in your case? Can they be joined by id
or smth else?
No. Two tables consist of different data. The only thing in common is a vector fields with the same dimension.
In your spec
select ... from <table name> where knn ( <field>, <k>, <document id> [, <other table name>.<field>] )
is <document id>
meant to be a document id from <other table name>
or <table name>
?
id from <other table name>
Proposal:
In a query like this, it would be nice to specify the source index of vector:
In our case documents are spread in different indexes and I want to be able to find similar documents in other index then the source doc index.
Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.