[PROPOSAL] Optimize max score calculation in Hybrid Search

vibrantvarun commented 4 months ago

What/Why

What are you proposing?

The max score calculation per shard in the Query Phase of hybrid search is done by iterating over the score-docs (results), instead we can just keep a track of max score at the collector level and update its value when the score is calculated.

Current logic is

private float getMaxScore(final List<TopDocs> topDocs) {
        if (topDocs.isEmpty()) {
            return 0.0f;
        } else {
            return topDocs.stream()
                .map(docs -> docs.scoreDocs.length == 0 ? new ScoreDoc(-1, 0.0f) : docs.scoreDocs[0])
                .map(scoreDoc -> scoreDoc.score)
                .max(Float::compare)
                .get();
        }
    }

where topDocs represent the result found from different subqueries in that particular shard.

Instead we can add a variable

private int maxScore=0.0f;

and later in the collect method where hybrid scores are calculated.

Add a condition in here

if(score>maxScore){
   maxScore=score;
}

By this way we will reduce the latency by

(Number of Shards) x (Number of topDocs found on each shard)

What problems are you trying to solve?

Reducing the latency in Hybrid Search

What is the developer experience going to be?

No changes in the developer experience.

Are there any security considerations?

Nope

Are there any breaking changes to the API

No breaking changes

martin-gaievski commented 4 months ago

It's good idea @vibrantvarun, we're calculating maxScore at the coordinator level today, we should be able to skip same computation at the shard level. In your PR https://github.com/opensearch-project/neural-search/pull/765 you changed the logic at the shard level, not skipped it. Can you clarify why we kept the logic at shard level?

vibrantvarun commented 4 months ago

It's good idea @vibrantvarun, we're calculating maxScore at the coordinator level today, we should be able to skip same computation at the shard level. In your PR #765 you changed the logic at the shard level, not skipped it. Can you clarify why we kept the logic at shard level?

Currently, with my change We are now not calculating the max score. We are just tracking it by finding the max score amongst the what has already been computed.

I don't want to keep it 0.0 because if someone just hits hybrid query without normalization then atleast the results should have some sanity. On top of that, I do not have to do any extra computation to calculate the max score. (I consider comparison of scores between each other by current logic is done in constant time computation so we can ignore that)

martin-gaievski commented 4 months ago

It's good idea @vibrantvarun, we're calculating maxScore at the coordinator level today, we should be able to skip same computation at the shard level. In your PR #765 you changed the logic at the shard level, not skipped it. Can you clarify why we kept the logic at shard level?

Currently, with my change We are now not calculating the max score. We are just tracking it by finding the max score amongst the what has already been computed.

I don't want to keep it 0.0 because if someone just hits hybrid query without normalization then atleast the results should have some sanity. On top of that, I do not have to do any extra computation to calculate the max score. (I consider comparison of scores between each other by current logic is done in constant time computation so we can ignore that)

that constant time for score comparison is for each hit, for each search request it's num_of_sub_query_hits. If we can save some CPU cycles we need to add it, that falls under umbrella of performance optimization. I would not care about results of hybrid query without normalization processor, that is not the right setup and we should not support it. Unless we're not failing any response is fine.

opensearch-project / neural-search