opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
117 stars 135 forks source link

[FEATURE] Add K-NN vector and query support to SQL Plugin #1615

Open acarbonetto opened 1 year ago

acarbonetto commented 1 year ago

Is your feature request related to a problem?

The k-nn, or k-nearest neighbours plugin enables users to define knn_vector objects and perform nearest-neighbour queries on that objects in OpenSearch.

What solution would you like?

We want to add the following features:

Questions:

Proposed PPL syntax

where <pre-filter expression> | where knn_score(field, vector, k) | where <post-filter expression> | top size

Proposed SQL Syntax:

Option 1) use OpenSearch functions to call knn in the ORDER BY clause.

WHERE <pre-filter expression> ORDER BY knn(field, vector, k, <post-filter expression>) LIMIT size
WHERE <pre-filter expression> AND knn_score(field, vector, k) AND <post-filter expression> LIMIT size

Option 2) USe pgvector syntax to map to OpenSearch DSL functions

see: pgvector

Do you have any additional context?

reference:

Proposed Schedule

derek-ho commented 1 year ago

@penghuo @dai-chen is this on the roadmap/priority?

brijos commented 1 year ago

Instead of making PGVector a follow up, let's make sure it gets into the initial release even if that means pushing the release out.

acarbonetto commented 1 year ago

@brijos Thanks for your comment. Does this also block a PPL release?

brijos commented 1 year ago

Thanks @acarbonetto. Let's not block on the PPL work if that is ready sooner. From a release announcement perspective, we'll announce them at the same time.

acarbonetto commented 1 year ago

Perfect. I'll update the proposal to do a PPL release, SQL release with pgvector, then it will be marked completed.

bugmakerrrrrr commented 3 months ago

Hi @acarbonetto , may I ask is there any progress on this issue? I am currently working on developing this feature and have already created a prototype that supports basic k-NN query. I would be happy to contribute to this feature if needed. Additionally, I have a few questions regarding this issue.

It seems simpler to use the where clause in PPL (with multiple `where clauses) to denote pre- and post- filter processing of the knn function. This makes less sense in SQL - and we may consider using the window functions or ordering the WHERE clause on purpose.

Could I kindly ask why you believe placing the knn function in the WHERE clause is illogical? IMHO, knn query and full-text quries are similar in OpenSearch, both return only highly rated documents for the user, we don't need to put it in order by clause.

WHERE ORDER BY knn(field, vector, k, ) LIMIT size

AFAIW, the filter used in KNN queries is applied during the k-NN search. It is a pre-filter, not a post-filter. The post-filter can be achieved by boolean query.

aparajita31pandey commented 2 months ago

Hey Folks We have create our in-house SQL which serves many search use cases . We have introduced SQL for semantic search too. Syntax: KNN (fieldName, EmbeddingList= [], K, Prefilters?) wherein PreFilters are optional and can be any type of lexical expression combined that can by logical operators Like "AND"/ "OR" Example- SELECT uuid WHERE KNN (field1, [1.0, 4.0, 6.3, 8.1], 5, country = "INDIA" AND rating > 3.4)

Since it is already been used at Uber Scale.