opensearch-project / opensearch-py

Python Client for OpenSearch
https://opensearch.org/docs/latest/clients/python/
Apache License 2.0

[BUG] Using scan helper method doesn't return all results #562

Closed narquette closed 11 months ago

narquette commented 11 months ago

What is the bug?

When I use the scan helper, I can't retrieve all of the search results: iterating over them raises a NotFoundError (404).

How can one reproduce the bug?

Run the code below with your own information filled in (e.g. the hostname for OpenSearch and the profile name for boto3).

import boto3
from opensearchpy import OpenSearch, AWSV4SignerAuth, RequestsHttpConnection
from opensearchpy.helpers import scan

host = '<your_host_name>'
sso_profile = '<your_sso_profile_name>'

sess = boto3.Session(profile_name=sso_profile)
region = sess.region_name
credentials = sess.get_credentials()
auth = AWSV4SignerAuth(credentials, region, 'aoss')

open_search_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=False,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20
)

query = {
  "query": {
    "match_all": {}
  }
}

results = scan(open_search_client, query=query, index='medications')

res_list = []
for result in results:
    print(result['_id'])

What is the expected behavior?

The scan helper should return all of the results for the query in the sample above.

What is your host/environment?

OS Version: Windows 10 Enterprise
Python Version: 3.10

Library Versions:

Do you have any screenshots?

Traceback (most recent call last):
  File "C:\Users\nicholas.arquette\PycharmProjects\Trove\open_search\testing.py", line 16, in <module>
    for result in results:
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\helpers\actions.py", line 598, in scan
    resp = client.scroll(
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\client\utils.py", line 179, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\client\__init__.py", line 1398, in scroll
    return self.transport.perform_request(
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\transport.py", line 409, in perform_request
    raise e
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\transport.py", line 370, in perform_request
    status, headers_response, data = connection.perform_request(
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\connection\http_requests.py", line 230, in perform_request
    self._raise_error(
  File "C:\Users\nicholas.arquette\AppData\Local\miniconda3\envs\ehrin\lib\site-packages\opensearchpy\connection\base.py", line 301, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.NotFoundError: NotFoundError(404, '')

Do you have any additional context?

I'm just trying to get all of the data returned so that I can parse the information and put it into a relational database.

narquette commented 11 months ago

The same issue happens when I use the scroll method as well.

dblock commented 11 months ago

What query is it making underneath that causes the 404? Could you turn this into a (failing) test?

narquette commented 11 months ago

Figured out the issue. I'm using AWS OpenSearch Serverless, and scroll isn't a supported operation there. For Amazon OpenSearch Serverless collections, search_after will be used because neither point_in_time nor scroll is supported by collections.
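
For anyone landing here with the same 404, paging with search_after might look roughly like the sketch below. It is a minimal illustration, not part of opensearch-py: the helper name iter_hits_search_after and its parameters are hypothetical, and it only assumes that client.search(index=..., body=...) returns the usual hits structure. The caller has to supply a sort that gives every document a distinct position, otherwise pages can skip or repeat hits.

```python
from typing import Any, Dict, Iterator, List


def iter_hits_search_after(
    client: Any,
    index: str,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    page_size: int = 500,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, paging with search_after instead of scroll.

    Hypothetical helper: `client` is anything with a
    search(index=..., body=...) method, such as an OpenSearch client.
    """
    # `query` is the full request body, e.g. {"query": {"match_all": {}}}.
    body: Dict[str, Any] = {**query, "size": page_size, "sort": sort}
    while True:
        resp = client.search(index=index, body=body)
        hits = resp["hits"]["hits"]
        if not hits:
            # An empty page means the result set is exhausted.
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        body["search_after"] = hits[-1]["sort"]
```

With the client and query from the original report, usage would be along the lines of `for hit in iter_hits_search_after(open_search_client, 'medications', query, sort=[{'<unique_sortable_field>': 'asc'}])`, where the sort field is a placeholder for a field that exists in your own mapping.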

narquette commented 11 months ago

Scroll is not a supported request type for AWS OpenSearch Serverless.