opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[FEATURE] add support for search(search_after) #337

Open r-mendes opened 1 year ago

r-mendes commented 1 year ago

What is the bug?

The search method doesn't support the search_after parameter.

How can one reproduce the bug?

  1. Instantiate an OpenSearch object;
  2. Run a search using the sort parameter;
  3. Get the result and look up the last hit entry;
  4. From the last hit entry, fetch the sort field value;
  5. Run a search passing the value fetched in step 4 in the search_after parameter;
  6. A message is shown stating that this parameter is not supported.
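
Steps 3 to 5 can be sketched without a live cluster. This is only a minimal sketch, not the library's API: `next_page_body` is a hypothetical helper, the response is made up, and it assumes that sorted hits expose their sort values under `hits.hits[].sort` and that `search_after` is sent inside the request body rather than as a keyword argument.

```python
import copy


def next_page_body(body, response):
    """Return a copy of `body` that resumes after the last hit of `response`.

    Hypothetical helper (not part of opensearch-py). Returns None when the
    response has no hits, i.e. there is nothing left to page through.
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None
    if "sort" not in body:
        raise ValueError("search_after requires a sort clause in the request body")
    new_body = copy.deepcopy(body)
    # Each hit of a sorted search carries its sort values under "sort".
    new_body["search_after"] = hits[-1]["sort"]
    # search_after takes over from offset paging, so drop any "from" offset.
    new_body.pop("from", None)
    return new_body


first_body = {
    "query": {"match_all": {}},
    "size": 2,
    "sort": ["field1.keyword", "field2.keyword"],
}
# Made-up response, trimmed to the fields the helper reads.
fake_response = {
    "hits": {
        "hits": [
            {"_id": "1", "sort": ["a", "x"]},
            {"_id": "2", "sort": ["b", "y"]},
        ]
    }
}
second_body = next_page_body(first_body, fake_response)
```

Here `second_body` is the original request plus a `search_after` entry holding the sort values of the last hit; `first_body` is left unchanged.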

What is the expected behavior?

The search_after parameter is supposed to be supported in the most recent version of the library.

What is your host/environment?

Linux, Ubuntu 22.04

Do you have any screenshots?

Do you have any additional context?

The lack of this parameter blocks query pagination using the search_after and point-in-time approach.

margulanz commented 1 year ago

I will try to work on this issue

ReinGrad commented 1 year ago

The OpenSearch library currently does not support the search_after parameter, which is used for pagination in search queries. This means that users cannot easily paginate large result sets with search_after in OpenSearch.

The OpenSearch library arguably should support the search_after parameter, since it is a common feature in other search libraries such as Elasticsearch. This feature may be added in a future version of the library.

One option may be switching to another search library that supports the search_after parameter, or implementing a different pagination strategy using the from and size parameters. It may also be possible to contribute to the OpenSearch library to add support for the search_after parameter.

margulanz commented 1 year ago

As I understand it, can this issue be considered closed, @ReinGrad?

morrissimo commented 1 year ago

FYI, I've been able to use the .extra() capability as a good-enough workaround for search_after, e.g.:

# initial query
>>> query.to_dict()
{
  "query": {
    "match_all": {}   # or whatever
  },
  "from": 0,
  "size": 1,
  "sort": [
    "field1.keyword",
    "field2.keyword"
  ]
}
# define search_after spec
>>> search_after_spec = [
    "some_field1_value",
    "some_field2_value"
]
# apply search_after spec to existing query
>>> query = query.extra(search_after=search_after_spec)
# query with search_after added
>>> query.to_dict()
{
  "query": {
    "match_all": {}   # or whatever
  },
  "from": 0,
  "size": 1,
  "sort": [
    "field1.keyword",
    "field2.keyword"
  ],
  "search_after": [
    "some_field1_value",
    "some_field2_value"
  ]
}

FWIW, I've also been using this same .extra() approach to use point-in-time with search_after.
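
For readers not using the DSL, the combination can also be written as a plain request body. The sketch below is an illustration under assumptions, not a confirmed recipe from this thread: `pit_page_body` is a hypothetical helper, the `pit` object with `id` and `keep_alive` keys is assumed to be what the search endpoint expects, and the PIT id shown is a placeholder that would have to come from a point in time created beforehand.

```python
def pit_page_body(query, sort, pit_id, size=100, search_after=None, keep_alive="1m"):
    """Build a search body that pages within a point in time.

    Hypothetical helper; `pit_id` must come from a point in time that was
    created beforehand (creating it is out of scope here).
    """
    body = {
        "query": query,
        "size": size,
        "sort": sort,
        # Assumed request shape: the PIT is referenced by id in the body.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    # The first page has no search_after; later pages pass the sort values
    # of the last hit from the previous page.
    if search_after is not None:
        body["search_after"] = search_after
    return body


first_page = pit_page_body(
    {"match_all": {}},
    ["field1.keyword", "field2.keyword"],
    "placeholder-pit-id",
    size=1,
)
later_page = pit_page_body(
    {"match_all": {}},
    ["field1.keyword", "field2.keyword"],
    "placeholder-pit-id",
    size=1,
    search_after=["some_field1_value", "some_field2_value"],
)
```

With the DSL, the same two keys would presumably be attached through .extra(), as in the example above.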

wbeckler commented 1 year ago

If anyone wants to add this as a feature, I think it still makes sense. Alternatively, someone could update the user guide examples for now.

dblock commented 1 year ago

@r-mendes or @morrissimo Any interest in picking this up?

dblock commented 1 year ago

Also, isn't https://github.com/opensearch-project/opensearch-py/blob/d8dc5474b7e7e2b443d9858c21d8f7be93306704/guides/search.md?plain=1#L140C37-L140C49 the way to do this, or do we still need separate support for it?
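
A rough sketch of what passing search_after through the request body looks like as a pagination loop. The names here are assumptions for illustration: `iter_hits` is a hypothetical helper, `fake_search` is an in-memory stand-in for a cluster, and the sort field `n` and index name `my-index` are made up. The `search` argument is assumed to be any callable that accepts `index=` and `body=` keyword arguments, such as a client's search method.

```python
def iter_hits(search, index, body):
    """Yield every hit, page by page, by sending search_after in the body.

    `search` is any callable accepting `index=` and `body=` keyword
    arguments, for example the search method of a client (assumption).
    """
    page_body = dict(body)
    while True:
        response = search(index=index, body=page_body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        page_body = {**page_body, "search_after": hits[-1]["sort"]}


def fake_search(index, body):
    """In-memory stand-in for a cluster, only for demonstrating the loop."""
    docs = [{"_id": str(n), "sort": [n]} for n in range(1, 6)]
    after = body.get("search_after")
    if after is not None:
        docs = [d for d in docs if d["sort"] > after]
    return {"hits": {"hits": docs[: body["size"]]}}


body = {"query": {"match_all": {}}, "size": 2, "sort": [{"n": "asc"}]}
ids = [hit["_id"] for hit in iter_hits(fake_search, "my-index", body)]
```

The loop stops on the first empty page. For stable paging against a real index, the sort usually needs a unique tiebreaker field so that no two documents share the same sort values.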