opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
57 stars 58 forks

In hybrid query optimize the way we iterate over results and collect scores of sub queries #745

Closed martin-gaievski closed 1 month ago

martin-gaievski commented 1 month ago

As part of performance optimization for the hybrid query, we need to minimize the time spent getting the next matching doc for each sub-query and collecting sub-query scores. According to the profiling data below, these calls take ~85% of CPU time.

[Image: profiling screenshot]

As a baseline we're taking the results from the previous PR related to hybrid query optimization. They are based on version 2.13 and the noaa OSB workload; all times are in ms:

One sub-query that selects 11M documents

Bool: p50 77.8893 | p90 78.1916
Hybrid: p50 186.709 | p90 197.739

One sub-query that selects 1.6K documents

Bool: p50 71.0947 | p90 71.691
Hybrid: p50 71.5156 | p90 72.8801

Three sub-queries that select 15M documents

Bool: p50 87.0556 | p90 90.9105
Hybrid: p50 287.255 | p90 313.868
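
To put the baseline in relative terms, here is a small Python sketch that computes the Hybrid-to-Bool latency ratio from the numbers above. It is not part of the plugin, and the names `baseline`, `slowdown` and `ratios` are illustrative only.

```python
# Latencies in ms as (p50, p90), copied from the baseline numbers above.
# The dictionary layout and all names here are illustrative assumptions.
baseline = {
    "one sub-query, 11M docs": {
        "bool": (77.8893, 78.1916),
        "hybrid": (186.709, 197.739),
    },
    "one sub-query, 1.6K docs": {
        "bool": (71.0947, 71.691),
        "hybrid": (71.5156, 72.8801),
    },
    "three sub-queries, 15M docs": {
        "bool": (87.0556, 90.9105),
        "hybrid": (287.255, 313.868),
    },
}


def slowdown(bool_ms: float, hybrid_ms: float) -> float:
    """Return how many times slower the hybrid query is than the bool query."""
    return hybrid_ms / bool_ms


ratios = {}
for case, timings in baseline.items():
    bool_p50, bool_p90 = timings["bool"]
    hybrid_p50, hybrid_p90 = timings["hybrid"]
    ratios[case] = {
        "p50": slowdown(bool_p50, hybrid_p50),
        "p90": slowdown(bool_p90, hybrid_p90),
    }
    print(f"{case}: p50 x{ratios[case]['p50']:.2f} | p90 x{ratios[case]['p90']:.2f}")
```

With these numbers the hybrid query is roughly 2.4x slower than bool at p50 for the 11M case and about 3.3x slower for the three sub-query case, while the 1.6K case is nearly identical. This suggests the overhead grows with the number of matched documents.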

The current logic for iterating over docs and collecting scores is as follows: