opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.87k stars 1.84k forks

[Feature Request] Tie breaker for search with search_after pagination #11831

Open Arpit-Bandejiya opened 10 months ago

Arpit-Bandejiya commented 10 months ago

Is your feature request related to a problem? Please describe

When we sort by a datetime field that contains repeated values, for example:

#1 "2024-01-03 19:57:38"
#2 "2024-01-03 19:57:38"
...
#3 "2024-01-04 19:57:39"
#4 "2024-01-04 19:57:39"

This causes documents to be skipped while paginating with the search_after parameter. With the dataset above, suppose the first page of 10K docs ends with #1. The next page of 10K then starts with #3, and #2 is missed.
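The skip described above can be reproduced with a small in-memory toy model. This is only a sketch, not OpenSearch code: it assumes the pagination cursor keeps only documents whose sort values are strictly greater than the last value seen, and the names `DOCS`, `paginate_by_timestamp` and `paginate_with_tie_breaker` are made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data: (doc_id, timestamp) pairs with repeated timestamps.
Doc = Tuple[str, str]
DOCS: List[Doc] = [
    ("1", "2024-01-03 19:57:38"),
    ("2", "2024-01-03 19:57:38"),
    ("3", "2024-01-04 19:57:39"),
    ("4", "2024-01-04 19:57:39"),
]


def paginate_by_timestamp(docs: List[Doc], page_size: int) -> List[str]:
    """Page through docs using only the timestamp as the cursor."""
    ordered = sorted(docs, key=lambda d: d[1])
    seen: List[str] = []
    after: Optional[str] = None
    while True:
        # Assumption: the cursor keeps only docs strictly after the last value.
        page = [d for d in ordered if after is None or d[1] > after][:page_size]
        if not page:
            return seen
        seen.extend(d[0] for d in page)
        after = page[-1][1]


def paginate_with_tie_breaker(docs: List[Doc], page_size: int) -> List[str]:
    """Page through docs using (timestamp, doc_id) as a unique cursor."""
    ordered = sorted(docs, key=lambda d: (d[1], d[0]))
    seen: List[str] = []
    after: Optional[Tuple[str, str]] = None
    while True:
        page = [d for d in ordered
                if after is None or (d[1], d[0]) > after][:page_size]
        if not page:
            return seen
        seen.extend(d[0] for d in page)
        after = (page[-1][1], page[-1][0])


if __name__ == "__main__":
    print(paginate_by_timestamp(DOCS, page_size=1))      # docs sharing a timestamp are skipped
    print(paginate_with_tie_breaker(DOCS, page_size=1))  # every doc is visited once
```

With a page size of 1, the timestamp-only cursor never returns #2 or #4, while the composite cursor visits all four documents.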

This feature is requested by other users as well: https://stackoverflow.com/questions/76042569/can-i-imitate-a-tie-breaker-field-in-opensearch-with-search-after-pagination

Describe the solution you'd like

We need to introduce a default tie_breaker_fields for PIT with search_after.

Related component

Search:Query Capabilities

Describe alternatives you've considered

No response

Additional context

No response

msfroh commented 10 months ago

@Arpit-Bandejiya -- Does this work if you sort by timestamp and _id, then search_after with both? That should provide a unique sort, right?

In theory, I suppose we could add _id as an implicit tie breaker.
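A sketch of what the explicit workaround could look like as request bodies, assuming the datetime field is named `timestamp` (a hypothetical name) and that the cluster allows sorting on `_id`. The helper names `build_search_body` and `next_page_body` are illustrative and not part of any client library; the resulting dicts would be sent as the JSON body of a `_search` request.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_body(
    size: int,
    search_after: Optional[List[Any]] = None,
    timestamp_field: str = "timestamp",  # hypothetical field name
) -> Dict[str, Any]:
    """Build a search body sorted by the timestamp field, then by _id."""
    body: Dict[str, Any] = {
        "size": size,
        # Two sort keys: the timestamp first, _id as the tie breaker.
        "sort": [{timestamp_field: "asc"}, {"_id": "asc"}],
    }
    if search_after is not None:
        if len(search_after) != 2:
            raise ValueError("search_after needs one value per sort key")
        body["search_after"] = list(search_after)
    return body


def next_page_body(last_hit: Dict[str, Any], size: int) -> Dict[str, Any]:
    """Build the next page's body from the `sort` array of the last hit."""
    return build_search_body(size, search_after=last_hit["sort"])


if __name__ == "__main__":
    first_page = build_search_body(size=10000)
    # Illustrative last hit of the first page; values are made up.
    last_hit = {"_id": "1", "sort": [1704311858000, "1"]}
    print(json.dumps(first_page))
    print(json.dumps(next_page_body(last_hit, size=10000)))
```

Because both sort values are passed to search_after, documents that share a timestamp with the last hit but have a greater `_id` are still returned on the next page.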

bharath-techie commented 9 months ago

Hi @msfroh, by default, many users don't seem to index '_id' as a separate doc values field. So the '_id' values get loaded as field data, which has an impact on heap usage.