opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[FEATURE] Allow hybrid search querying within the DSL #630

Open youcandanch opened 11 months ago

youcandanch commented 11 months ago

Is your feature request related to a problem?

OpenSearch 2.10 introduced hybrid querying, where you can combine multiple queries and normalize the resulting scores to implement a hybrid search strategy. The low-level client fully supports this, and the high-level client should be able to support it with something like this:

from opensearchpy import Q, Search

plaintext_query = "hello this is dog"
vector_embeddings = [-0.018416348844766617, -0.02228226698935032, 0.016061007976531982, ...etc]
lexical_search_params = {
    "multi_match": {
        "query": plaintext_query,
        "fields": ["title", "text"],
        "type": "phrase",
    }
}
semantic_search_params = {
    "script_score": {
        "query": {"bool": {}},
        "script": {
            "source": "knn_score",
            "lang": "knn",
            "params": {
                "field": "vectors",
                "query_value": vector_embeddings,
                "space_type": "cosinesimil",
            },
        },
    }
}
search = Search().query(Q({
    'hybrid': {
        'queries': [
            lexical_search_params,
            semantic_search_params
        ]
    }
}))

Not the prettiest code, but it should work! The problem is that constructing that query raises the following error:

  File "hybrid_search_test.py", line 52, in get
    search = Search().query(Q({
  File "lib/python3.8/site-packages/opensearchpy/helpers/query.py", line 49, in Q
    return Query.get_dsl_class(name)(_expand__to_dot=False, **params)
  File "lib/python3.8/site-packages/opensearchpy/helpers/utils.py", line 283, in get_dsl_class
    raise UnknownDslObject(

opensearchpy.exceptions.UnknownDslObject: DSL class `hybrid` does not exist in query.

This prevents hybrid searching via the DSL, and forces dropping down to the low-level client to execute.
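For reference, the low-level workaround mentioned above might look roughly like the sketch below. It only builds the request body as a plain dict. `build_hybrid_body` is a hypothetical helper name, the three-element vector is a placeholder rather than a real embedding, and the commented-out client call assumes a local cluster and a made-up index name.

```python
from typing import Any, Dict, List


def build_hybrid_body(queries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap already-built query clauses in a top-level `hybrid` query.

    Hypothetical helper for illustration; not part of opensearch-py.
    """
    if not queries:
        raise ValueError("a hybrid query needs at least one sub-query")
    return {"query": {"hybrid": {"queries": list(queries)}}}


# Same shape as the lexical clause in the example above.
lexical = {
    "multi_match": {
        "query": "hello this is dog",
        "fields": ["title", "text"],
        "type": "phrase",
    }
}

# Same shape as the semantic clause above; the vector is a placeholder.
semantic = {
    "script_score": {
        "query": {"bool": {}},
        "script": {
            "source": "knn_score",
            "lang": "knn",
            "params": {
                "field": "vectors",
                "query_value": [0.1, 0.2, 0.3],
                "space_type": "cosinesimil",
            },
        },
    }
}

body = build_hybrid_body([lexical, semantic])

# Assumed usage with the low-level client (needs a running cluster;
# host and index name are made up for this sketch):
# from opensearchpy import OpenSearch
# client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
# response = client.search(index="my-index", body=body)
```

This works, but it gives up the chaining and composition that `Search()` provides, which is the motivation for the request.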

What solution would you like?

Within helpers/query.py, adding:

class Hybrid(Query):
    name = "hybrid"

...works with my example above, but I'm not 100% sure it's the right way to go about adding this to the DSL. Something like a Search.hybrid_query function might make sense, but I haven't gone down that rabbit hole yet.
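To illustrate why a two-line subclass is enough, here is a toy, self-contained model of a name-based registry like the one the traceback suggests `Q` and `get_dsl_class` rely on. Everything in it is a simplified stand-in written for this sketch: the real library's registration mechanism may differ in detail, and none of these classes are the actual opensearch-py implementations.

```python
from typing import Any, ClassVar, Dict, Type


class UnknownDslObject(Exception):
    """Stand-in for the exception named in the traceback above."""


class Query:
    """Toy base class: every subclass that sets `name` registers itself."""

    name: ClassVar[str] = ""
    _classes: ClassVar[Dict[str, Type["Query"]]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if cls.name:
            Query._classes[cls.name] = cls

    def __init__(self, **params: Any) -> None:
        self.params = params

    @classmethod
    def get_dsl_class(cls, name: str) -> Type["Query"]:
        try:
            return cls._classes[name]
        except KeyError:
            raise UnknownDslObject(
                f"DSL class `{name}` does not exist in query."
            ) from None

    def to_dict(self) -> Dict[str, Any]:
        return {self.name: self.params}


def Q(spec: Dict[str, Any]) -> Query:
    """Build a query object from a one-key dict such as {"hybrid": {...}}."""
    if len(spec) != 1:
        raise ValueError("expected exactly one top-level key")
    name, params = next(iter(spec.items()))
    return Query.get_dsl_class(name)(**params)


spec = {"hybrid": {"queries": [{"match_all": {}}]}}

# Before a class named "hybrid" is registered, the lookup fails.
try:
    Q(spec)
    failed_before = False
except UnknownDslObject:
    failed_before = True


class Hybrid(Query):
    name = "hybrid"


# After registration, the same spec resolves to the new class.
hybrid = Q(spec)
```

In this toy model the sub-queries stay as raw dicts inside `params`. Whether the real DSL should also coerce them into query objects is the kind of design question the proposed `Search.hybrid_query` helper would have to settle.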

lambda-science commented 9 months ago

Interested in this

saimedhi commented 9 months ago

Please feel free to contribute. Thank you!

ssharm8-etr commented 8 months ago

Interested in this

saimedhi commented 8 months ago

@youcandanch, @lambda-science, @ssharm8-etr Your contributions are highly valued and greatly appreciated. Whenever you have a moment, we welcome your input and encourage you to submit a pull request. Thank you!