opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
56 stars 58 forks

[Feature] Enable Sorting in Hybrid Search #768

Closed vibrantvarun closed 2 weeks ago

vibrantvarun commented 1 month ago

Description

This PR enables sorting in Hybrid Search. It covers the following key areas:

  1. Enable sorting on a single field and on multiple fields.
  2. Enable search_after pagination support with sorting. search_after is included in this PR because the sorting feature would be incomplete without it, and search_after works only in combination with sorting.
    • search_after reuses the methods created for sorting, so adding both features in the same PR lets the code be better optimized.
  3. Block track_scores when sorting is applied.
  4. Block requests that try to sort by both a field and _score.
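
To make the four points above concrete, here is a minimal Python sketch of what a hybrid search request with sorting and search_after might look like, together with a small validator that mirrors points 3 and 4. The field names (`text`, `price`, `doc_id`), the sample values, and the `validate_hybrid_sort` function are all hypothetical and exist only for illustration. The real checks are implemented in the plugin's Java code and may behave differently.

```python
# Hypothetical hybrid search request body with a multi-field sort and search_after.
request_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": "wireless headphones"}},
                {"term": {"category": "audio"}},
            ]
        }
    },
    # Sort by price (descending), then by doc_id (ascending) as a tie-breaker.
    "sort": [{"price": {"order": "desc"}}, {"doc_id": {"order": "asc"}}],
    # Sort values of the last hit on the previous page.
    "search_after": [100, "doc-42"],
}


def validate_hybrid_sort(body):
    """Return a list of error messages for a request body (illustrative only)."""
    errors = []
    sort = body.get("sort", [])
    if not isinstance(sort, list):
        sort = [sort]

    # Collect the field name of every sort clause ("field" or {"field": ...}).
    fields = []
    for clause in sort:
        if isinstance(clause, str):
            fields.append(clause)
        else:
            fields.extend(clause.keys())

    # Point 3: track_scores is blocked when sorting is applied.
    if fields and body.get("track_scores"):
        errors.append("track_scores is not supported when sorting is applied")
    # Point 4: sorting by a field and _score together is blocked.
    if "_score" in fields and len(set(fields)) > 1:
        errors.append("sorting by both a field and _score is not supported")
    # Point 2: search_after works only in combination with sorting.
    if "search_after" in body and not fields:
        errors.append("search_after requires a sort clause")
    return errors
```

Calling `validate_hybrid_sort(request_body)` on the body above returns an empty list, while adding `"track_scores": True` to the same body yields one error.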

Order in which to review the PR

  1. HybridCollectorManager.createHybridCollectorManager method
  2. HybridCollectorManager.newCollector method
  3. HybridTopFieldDocSortCollector.java
  4. MultiLeafFieldComparator.java, which is copied directly from Lucene. We copied this class into neural-search because it is final in Lucene.
  5. HybridCollectorManager.reduce method
  6. NormalizationProcessorWorkflow
  7. CompoundTopDocs
  8. ScoreCombiner
  9. HybridQueryResultUtil
  10. Test Classes
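
For reviewers less familiar with multi-field sorting, the general idea behind a multi-field comparator and search_after can be sketched in a few lines of Python. This is only an illustration of the technique, not the Lucene MultiLeafFieldComparator or the collector code in this PR. The document dictionaries and the function names `make_comparator` and `search_after` are hypothetical.

```python
from functools import cmp_to_key


def make_comparator(sort_spec):
    """Build a comparator from (field, reverse) pairs, highest priority first."""
    def compare(a, b):
        for field, reverse in sort_spec:
            if a[field] != b[field]:
                result = -1 if a[field] < b[field] else 1
                # A descending field flips the outcome of the comparison.
                return -result if reverse else result
        # All sort fields are equal.
        return 0
    return compare


def search_after(docs, sort_spec, after_values, size):
    """Return up to `size` docs that sort strictly after `after_values`."""
    compare = make_comparator(sort_spec)
    ordered = sorted(docs, key=cmp_to_key(compare))
    # Pseudo-document that carries the sort values of the previous page's last hit.
    cursor = {field: value for (field, _), value in zip(sort_spec, after_values)}
    return [doc for doc in ordered if compare(doc, cursor) > 0][:size]


docs = [
    {"price": 10, "id": "a"},
    {"price": 20, "id": "b"},
    {"price": 20, "id": "c"},
    {"price": 5, "id": "d"},
]
# Sort by price descending, then id ascending.
sort_spec = [("price", True), ("id", False)]
next_page = search_after(docs, sort_spec, after_values=[20, "b"], size=2)
```

Here the full order is b, c, a, d, so the page after the cursor `[20, "b"]` contains the documents with ids `c` and `a`. The tie-breaker field is what keeps pagination stable when several documents share the same primary sort value.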

Issues Resolved

#507

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

yuye-aws commented 1 month ago

Why are we blocking track_scores when sorting is applied? According to the track_scores documentation, users can still set track_scores to true when sorting is applied.