opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[FEATURE] Implement pagination for Hybrid Search #280

Open martin-gaievski opened 1 year ago

martin-gaievski commented 1 year ago

Is your feature request related to a problem?

The current implementation of Hybrid Search doesn't support pagination, meaning all results are returned "at once". Pagination is standard for many queries in OpenSearch, and Hybrid Search is expected to support it as well; see https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/

What solution would you like?

It should be possible to define "from" and "size" as part of the Hybrid query, and the results should contain "size" records, starting from the "from" position. Standard syntax should be fine in this case:

{
   "from": 20,
   "size": 10,
   "query": {
       "hybrid": [
           {}, // First Query
           {}  // Second Query
           // ... Other Queries
       ]
   }
}

What alternatives have you considered?

It's possible to set a higher size and then discard the first X elements, but that requires extra processing logic on the client side and is not optimal.
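
For illustration, a minimal Python sketch of that client-side workaround. The helper names `build_oversized_request` and `slice_page` are hypothetical, the query is passed through untouched, and the response is assumed to follow the usual `hits.hits` layout of a search response. This is a sketch under those assumptions, not how the plugin handles pagination.

```python
from typing import Any, Dict, List


def build_oversized_request(query: Dict[str, Any], from_: int, size: int) -> Dict[str, Any]:
    """Build a request body that asks for the first (from_ + size) hits.

    No "from" is sent to the server: skipping happens on the client.
    `query` is whatever query clause you already use (hypothetical here).
    """
    if from_ < 0 or size < 0:
        raise ValueError("from_ and size must be non-negative")
    return {"size": from_ + size, "query": query}


def slice_page(response: Dict[str, Any], from_: int, size: int) -> List[Dict[str, Any]]:
    """Drop the first `from_` hits and keep at most `size` of the rest."""
    hits = response.get("hits", {}).get("hits", [])
    return hits[from_:from_ + size]
```

Every page request transfers all the hits before the requested page as well, which is the extra cost described above.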

ankitas3 commented 5 months ago

When can we expect this in OpenSearch?

jackh-ncl commented 4 months ago

@martin-gaievski @ankitas3 Just double checking, but "from" and "size" already seem to be working for me with Hybrid search while using OpenSearch 2.11. Has it been implemented since this issue was raised?

martin-gaievski commented 4 months ago

@jackh-ncl it has not been implemented yet. With the existing code it may work, but not for every scenario, and when it does work, it does not do so in an optimal way.

jackh-ncl commented 4 months ago

Aha thank you for confirming, yeah I have now noticed that I seem to consistently end up with a total hits value of 5 * the size parameter with the query I'm running.

benmcginnis commented 4 months ago

any chance we can get this in 2.14 or the version after that?

brandon-carag commented 4 months ago

Is anyone from OpenSearch able to provide any broad details on when pagination will be made available?

qmauret commented 3 months ago

+1 for this feature request

mkerimyilmaz commented 3 months ago

@martin-gaievski Can you please clarify why it doesn't work optimally?

"@jackh-ncl it has not been implemented yet. With the existing code it may work, but not for every scenario, and when it does work, it does not do so in an optimal way."

JPSoteloSilva commented 3 months ago

@martin-gaievski Any idea on when this feature will be available? Thanks

brandon-carag commented 1 month ago

Hi @vamshin and @martin-gaievski,

Just to briefly restate the case here, pagination is a critical feature for many users adopting hybrid search. There has been significant community interest in this thread (at least 20 distinct users), which also points to strong user demand.

I'm sure there are several competing dev priorities here, but this seems like a core item that shouldn't fall by the wayside. At the very least, can we get a very rough approximate timeline here? I suspect there are many people on this thread whose downstream roadmaps are reliant on what happens here.

Thanks in advance

vamshin commented 1 month ago

@brandon-carag Thank you for your interest in pagination functionality. We understand the importance of this feature, and it's on our roadmap. However, at the moment, we're focusing our efforts on enhancing the sorting and explain (raw scores for debugging) capabilities.

While we don't have a definite timeline for implementing pagination, we estimate it could be available towards the end of the year. Please note that this is a rough estimate, and the actual timeline may vary depending on our priorities and resource availability.

We're always open to collaboration and would be delighted if someone from the community is willing to contribute to this feature. If you or anyone else is interested in contributing, please feel free to reach out to us. We can provide guidance and support to ensure a smooth integration of the pagination functionality.

As a workaround, is it possible to use a "size" parameter with a large value? I think we can get up to 10K results.
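
To make the trade-off concrete, here is a rough Python sketch of the arithmetic behind that workaround. The constant `MAX_RESULT_WINDOW` simply encodes the 10K figure mentioned above as an assumption, and both function names are hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # assumed cap, taken from the "10K" figure above


def hits_to_fetch(page: int, page_size: int) -> int:
    """Hits the client must request to display the 1-based `page`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return page * page_size


def deepest_page(page_size: int, window: int = MAX_RESULT_WINDOW) -> int:
    """Last full page that still fits inside the result window."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return window // page_size
```

Under these assumptions, a page size of 10 allows at most 1000 pages, and showing the last one means requesting all 10,000 hits for a single screen of results.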

brandon-carag commented 1 month ago

@vamshin Thanks for the prompt response. The rough timeline you mentioned is useful to know. I believe Martin mentioned above: "it has not been implemented yet. With the existing code it may work, but not for every scenario, and when it does work, it does not do so in an optimal way." I'm not clear on where exactly pagination breaks down in the existing implementation.

It seems like specifying a large size and processing client-side won't scale even for a relatively small index, particularly if OpenSearch is being used to service API requests and apply pagination. As such, I'm hoping this feature can emerge sooner rather than later.

sonic182 commented 2 weeks ago

Has anyone tried the scroll API as an alternative? Does it work?

brandon-carag commented 1 week ago

My understanding is that the scroll API won't solve this issue. Documentation states, "Because search contexts consume a lot of memory, we suggest you don’t use the scroll operation for frequent user queries. Instead, use the sort parameter with the search_after parameter to scroll responses for user queries." https://opensearch.org/docs/latest/api-reference/scroll/
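
For reference, a minimal Python sketch of the generic sort plus search_after pattern that the documentation describes. Whether it works with hybrid queries is not confirmed anywhere in this thread (sorting for hybrid search is mentioned above as still in progress), so treat it as the general pattern only. The function name and the sort fields are hypothetical; the sketch assumes each hit in a sorted response carries a "sort" array.

```python
from typing import Any, Dict, List, Optional


def build_search_after_request(
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    size: int,
    last_hit: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the body for the next page of a sorted search.

    `last_hit` is the final hit of the previous page; its "sort" values
    tell the server where the next page starts. Omit it for the first page.
    """
    body: Dict[str, Any] = {"size": size, "query": query, "sort": sort}
    if last_hit is not None:
        body["search_after"] = last_hit["sort"]
    return body


# Hypothetical usage: sort by a "price" field, with "_id" as a tiebreaker.
first_page = build_search_after_request(
    {"match_all": {}}, [{"price": "asc"}, {"_id": "asc"}], size=10
)
```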