[BUG] msearch hangs when dealing with a high number of records.

What is the bug?

I'm running a search job on a big batch file (900K records), so I'm using multisearch. The cluster has 3 data nodes and 3 master nodes.

I split the records into batches. The weird thing is that if I run batches of 5000 records, the job takes around 200 seconds to process, and the AWS monitoring metrics show no apparent issue with memory or CPU on any of the nodes.
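For reference, the batching described above can be sketched roughly as follows. This is a minimal sketch, not the actual job: it assumes each record has already been turned into a (header, query) pair, and `iter_msearch_batches` is a hypothetical helper name that does not come from opensearch-py.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed shape: one (header, query) pair per record, for example
# ({"index": "my-index"}, {"query": {"match_all": {}}}).
Search = Tuple[Dict, Dict]


def iter_msearch_batches(
    searches: Iterable[Search], batch_size: int
) -> Iterator[List[Dict]]:
    """Yield msearch bodies (flat header/query lists) of at most batch_size searches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(searches)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        body: List[Dict] = []
        for header, query in chunk:
            # msearch expects the header line followed by the query line.
            body.append(header)
            body.append(query)
        yield body
```

Each yielded list could then be passed as `search_client.msearch(body=batch)`, with `batch_size` set to 5000 or 10000 to compare the two cases.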

However, if I use 10000 records per msearch call, something strange happens.

For a while the cluster performs the search operations: I can see active and queued threads on the thread pool API endpoint /_cat/thread_pool/search. After a certain point, however, there are no more active, queued, or rejected threads in the thread pool, yet the Python msearch call just hangs, and it hangs forever. I have to kill the Jupyter kernel to recover.
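The "nothing active or queued" check can be made explicit with a small helper. This is a sketch under assumptions: it expects the rows returned by `GET /_cat/thread_pool/search?format=json`, where each row is assumed to carry `name`, `active`, and `queue` as strings, and `search_pool_is_idle` is a hypothetical name.

```python
from typing import Dict, List


def search_pool_is_idle(rows: List[Dict[str, str]]) -> bool:
    """Return True when no node reports active or queued search threads."""
    for row in rows:
        # Ignore rows for other thread pools, if any were returned.
        if row.get("name") != "search":
            continue
        if int(row.get("active", 0)) or int(row.get("queue", 0)):
            return False
    return True
```

If this returns True while the client call is still blocked, the cluster has probably finished its part and the wait is on the client side.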

How can one reproduce the bug?

I can't share the data I'm using, unfortunately, and the number of records that makes the search hang depends on the data being searched.

But in a nutshell, running this

msearch_result = search_client.msearch(
    msearch_query,
)

with a high volume of records makes the job fail (it hangs), not on the OpenSearch side but on the Python client side.
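One way to narrow this down is to pass a per-request timeout so that a stuck call has a chance to surface as an exception instead of blocking forever. The sketch below is an assumption-laden helper, not a confirmed fix: `timed_msearch` is a hypothetical name, `client` is assumed to be an opensearch-py client, and whether the timeout actually fires for this hang is unknown.

```python
import time
from typing import Any, Dict, List


def timed_msearch(client: Any, body: List[Dict], timeout_s: float = 120.0) -> Dict:
    """Call client.msearch with a per-request timeout and report the elapsed time."""
    started = time.monotonic()
    try:
        # request_timeout is the per-request timeout keyword accepted by
        # opensearch-py API methods; it is forwarded to the HTTP layer.
        return client.msearch(body=body, request_timeout=timeout_s)
    finally:
        elapsed = time.monotonic() - started
        print(f"msearch with {len(body) // 2} searches returned or failed after {elapsed:.1f}s")
```

If the timeout does fire, the client should raise a timeout error (in opensearch-py this is expected to be `ConnectionTimeout`), which at least separates "slow response" from "call never returns".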

It is important to note that the records query any of the 50 or so indices we have, so not all searches in an msearch call go to the same index.

However, using the requests library directly (with the aws-auth library for authentication) works perfectly.

import requests

# This works with no problem.
resp = requests.post('https://' + endpoint + '/_msearch', data=msearch_query, headers={'Content-Type': 'application/json'}, timeout=500)
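The report does not show how `msearch_query` is built. For the raw HTTP call, the `_msearch` endpoint expects newline-delimited JSON that ends with a newline, so a body along these lines is assumed; `to_ndjson` is a hypothetical helper name.

```python
import json
from typing import Dict, Iterable


def to_ndjson(lines: Iterable[Dict]) -> str:
    """Serialize header/query dicts to newline-delimited JSON with a trailing newline."""
    # Compact separators keep each JSON document on a single line.
    return "".join(json.dumps(line, separators=(",", ":")) + "\n" for line in lines)
```

Sending the same serialized payload through both `requests` and the Python client would make the comparison between the two code paths like-for-like.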

What is the expected behavior?

The Python client should handle the request or, if the response body from the multisearch operation is too big, raise an appropriate exception.

What is your host/environment?

opensearchpy 2.2.0

OS: macOS 14.0 (build 23A344)
