opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
56 stars 58 forks

Add inner hits support to hybrid query #776

Open martin-gaievski opened 3 weeks ago

martin-gaievski commented 3 weeks ago

Description

Adding support for inner hits to hybrid query. This is a feature of OpenSearch that is available for other queries but was not supported by hybrid query.

Inner hits will be tracked similarly to how they are tracked for all other queries. They will contain details of inner hits for cases of nested fields and parent/child relationships between documents. The only catch is the score of the inner hit—such scores will be before normalization. Having a normalized score is technically difficult because inner hits processing is done in the Fetch phase, which occurs after the normalization processor has finished its work.
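For illustration, a request that asks for inner hits through a hybrid query might be built as in the sketch below. The `hybrid`, `nested`, and `inner_hits` clauses are standard OpenSearch query DSL, but the field names (`user.firstname`, `title`), the query text, and the helper name `build_hybrid_inner_hits_request` are assumptions made for this example, not taken from the PR.

```python
import json


def build_hybrid_inner_hits_request(text: str, first_name: str) -> dict:
    """Build a search body whose hybrid query contains a nested sub-query
    that requests inner hits. Field names here are illustrative only."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {
                        # Nested sub-query: inner_hits asks OpenSearch to
                        # return the matching nested objects per top-level hit.
                        "nested": {
                            "path": "user",
                            "query": {"match": {"user.firstname": first_name}},
                            "inner_hits": {},
                        }
                    },
                    # Second sub-query on a regular (non-nested) field.
                    {"match": {"title": text}},
                ]
            }
        }
    }


body = build_hybrid_inner_hits_request("random sentences", "john")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint. Any `_score` values inside the returned `inner_hits` section would be the raw sub-query scores, as described above.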

The following are example responses that contain an inner hits section, first for a nested query and then for a parent/child query:

{
    "took": 79,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.540445,
        "hits": [
            {
                "_index": "index-test",
                "_id": "Sogqp48BjNYyAI8a4z9u",
                "_score": 1.540445,
                "_source": {
                    "doc_price": 100,
                    "doc_index": 4976,
                    "doc_location": {
                        "coordinates": [
                            [
                                -111.15,
                                45.12
                            ],
                            [
                                -109.83,
                                44.12
                            ]
                        ],
                        "type": "envelope"
                    },
                    "doc_location_2": "81.15, 44.12",
                    "doc_date": "02/03/2014",
                    "doc_point": {
                        "lon": 74.0,
                        "lat": 40.71
                    },
                    "id": "7ebe00c8-9858-11ee-b9d1-0242ac120002",
                    "doc_keyword": "workable",
                    "category": "permission",
                    "title": "Writing a list of random sentences is harder than I initially thought it would be.",
                    "user": {
                        "firstname": "john",
                        "age": 1,
                        "lastname": "black"
                    }
                },
                "inner_hits": {
                    "user": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.540445,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "Sogqp48BjNYyAI8a4z9u",
                                    "_nested": {
                                        "field": "user",
                                        "offset": 0
                                    },
                                    "_score": 1.540445,
                                    "_source": {
                                        "firstname": "john",
                                        "age": 1,
                                        "lastname": "black"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

{
    "took": 134,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "index-test",
                "_id": "10",
                "_score": 1.0,
                "_routing": "1",
                "_source": {
                    "my_id": "10",
                    "text": "This is an answer",
                    "my_join_field": {
                        "name": "answer",
                        "parent": "5"
                    }
                },
                "inner_hits": {
                    "question": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.2039728,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "5",
                                    "_score": 1.2039728,
                                    "_source": {
                                        "my_id": "5",
                                        "text": "This is a question",
                                        "my_join_field": "question"
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index": "index-test",
                "_id": "11",
                "_score": 1.0,
                "_routing": "1",
                "_source": {
                    "my_id": "11",
                    "text": "This is second answer",
                    "my_join_field": {
                        "name": "answer",
                        "parent": "5"
                    }
                },
                "inner_hits": {
                    "question": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.2039728,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "5",
                                    "_score": 1.2039728,
                                    "_source": {
                                        "my_id": "5",
                                        "text": "This is a question",
                                        "my_join_field": "question"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
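To compare top-level and inner hit scores in responses like the ones above, a small helper can walk the `inner_hits` sections. This is a minimal sketch: the function name `inner_hit_scores` is hypothetical, and `sample` is a trimmed copy of the parent/child response shown above, keeping only the fields the helper reads.

```python
def inner_hit_scores(response: dict) -> list:
    """Return (hit id, hit score, inner hits name, inner id, inner score)
    tuples for every inner hit in a search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        # "inner_hits" is absent when the query did not request inner hits.
        for name, section in hit.get("inner_hits", {}).items():
            for inner in section["hits"]["hits"]:
                rows.append(
                    (hit["_id"], hit["_score"], name, inner["_id"], inner["_score"])
                )
    return rows


# Trimmed copy of the parent/child response above.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "10",
                "_score": 1.0,
                "inner_hits": {
                    "question": {
                        "hits": {"hits": [{"_id": "5", "_score": 1.2039728}]}
                    }
                },
            },
            {
                "_id": "11",
                "_score": 1.0,
                "inner_hits": {
                    "question": {
                        "hits": {"hits": [{"_id": "5", "_score": 1.2039728}]}
                    }
                },
            },
        ]
    }
}

for row in inner_hit_scores(sample):
    print(row)
```

In this sample each top-level hit has `_score` 1.0 while its inner hit reports 1.2039728, so the two values should not be compared directly when a normalization processor is in use.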

Issues Resolved

https://github.com/opensearch-project/neural-search/issues/718

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 3 weeks ago

The only catch is the score of the inner hit—such scores will be before normalization. Having a normalized score is technically difficult because inner hits processing is done in the Fetch phase, which occurs after the normalization processor has finished its work.

@martin-gaievski do we have a path forward for resolving this technical challenge? Have we started a discussion about this with the Core team?

Also, I would like to take a step back here and ask: what does a normalized score mean for inner hits?

navneet1v commented 3 weeks ago

@martin-gaievski is this feature scoped for 2.15?

martin-gaievski commented 3 weeks ago

@martin-gaievski is this feature scoped for 2.15?

There is no hard requirement for the version.

martin-gaievski commented 3 weeks ago

The only catch is the score of the inner hit—such scores will be before normalization. Having a normalized score is technically difficult because inner hits processing is done in the Fetch phase, which occurs after the normalization processor has finished its work.

@martin-gaievski do we have a path forward for resolving this technical challenge? Have we started a discussion about this with the Core team?

Also, I would like to take a step back here and ask: what does a normalized score mean for inner hits?

For now there is no clear path forward. I'll be working on summarizing the technical hurdles we have. The short list is: