clashofphish closed this issue 5 months ago
Hello @clashofphish, I will try to replicate this case in a VPC environment. In the meantime, if you find the cause or bug, please feel free to contribute. Thank you!
@clashofphish Is this in Amazon Managed OpenSearch? Do you have this reproduced with curl or awscurl so we can see if the problem is the client or the server?
@dblock This is the Amazon Managed OpenSearch. When I use curl I don't get the same error. Only when I attempt to use the SDK.
Let me know if you need more information.
@clashofphish This is helpful. Will you please post the working curl(s)?
I think the whole VPC business is a red herring. I'd start by removing the content type from your python code because bulk is ld-json, not json. Next I'd dig through the code to see exactly what's being sent up in the python client and received back and compare to the curl i/o.
I'll get the curl info for you.
In the meantime, I can tell you that when I turned logging on, the log messages from OpenSearch show that OS is getting the records and writing them correctly. Also, the count of records increases. It's just that the response object is a string rather than a json object.
The log message:
Also, I tried the code without the "Content-Type" specified in the header and had the exact same issue.
It's just that the response object is a string rather than a json object.
That is saying that the content type of the result is not evaluated properly, so needs to be debugged.
Agreed. The SDK does not evaluate the response from the endpoint correctly when I do my authorization using the header rather than the http_auth parameter. Because it does not evaluate that result correctly, it errors in the _process_bulk_chunk_success function.
Is there another way to tell the SDK how to parse the result object that I'm missing?
Or am I not understanding what you are trying to say correctly?
I'm just saying it's not supposed to happen this way. It should "just work" (TM). So there's a bug somewhere :) Since you have a way to reproduce I am hoping you'll narrow it down by walking through the code ;)
Ideally, turn this into a failing unit test? I can try to fix from there.
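A failing test along those lines might look like the sketch below. It does not import opensearch-py: `iter_bulk_items` is a hypothetical stand-in for the helper's post-processing step, written only to show the shape of the failure (indexing a `str` with a string key raises `TypeError`). The response payload is made up for illustration.

```python
import json

# Made-up bulk response, shaped like a parsed JSON body.
GOOD_RESPONSE = {
    "errors": False,
    "items": [{"index": {"_id": "1", "status": 201}}],
}


def iter_bulk_items(resp):
    """Hypothetical stand-in for the helper's post-processing.

    It assumes `resp` is a dict and yields (op_type, result) pairs.
    """
    for item in resp["items"]:
        op_type, result = next(iter(item.items()))
        yield op_type, result


def test_dict_response_is_processed():
    # A parsed response can be indexed with string keys.
    assert list(iter_bulk_items(GOOD_RESPONSE)) == [
        ("index", {"_id": "1", "status": 201})
    ]


def test_string_response_raises_type_error():
    # If the body arrives as a string, string-key indexing fails.
    resp = json.dumps(GOOD_RESPONSE)
    try:
        list(iter_bulk_items(resp))
    except TypeError:
        return
    raise AssertionError("expected TypeError for a string response")
```

These functions can be collected by pytest or called directly. To turn this into a real regression test, the stand-in would be replaced by the actual helper and a mocked transport that returns a string body.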
I can't get the bulk curl request to work because it keeps giving me an error about having to end in a newline when I clearly have a newline in my call (I also tried having the data in a json file and using @reqs.json after --data-raw) --
curl -X POST --location 'https://<url>/test-index/_bulk' --header 'Authorization: <base64key>' --header 'Content-Type: application/json' --data-raw '{ "index": { "_index": "test-index", "_id": "1" } }\n{"id": "1", "text": "bob", "metadata": {"noticeId": "c7c, "department": "HOUSING"}}\n{ "index": { "_index": "test-index", "_id": "2" } }\n{"id": "2", "text": "jane", "metadata": {"noticeId": "6e9", "department": "HOUSING"}}\n'
I can tell you that my co-workers have been able to successfully make fetch calls to push documents to the index --
const request = await fetch('https://<url>/_bulk', {
body: batch.map(JSON.stringify).join('\n') + '\n',
method: 'POST',
headers: {
'Authorization': <token here>,
'Content-Type': 'application/json; boundary=NL',
},
} )
I can also tell you that I know the request to client.bulk() that the bulk helper performs is working because my documents end up in my index. It's just that the response is a string, so it causes the post-processing of the response to fail. This only happens when I use the header to specify my authentication token for the OS instance behind the VPC. It does not happen when I use http_auth with an OS instance that is not behind a VPC. From what I can see, the calls to OS are the same in both instances.
I don't know what to do from here. I'm happy to provide more, but I need guidance on what you need.
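A note on the curl error above: in a POSIX shell, `\n` inside single quotes is passed through as a literal backslash followed by `n`, so the body never contains a real newline. In addition, `--data-raw` does not treat `@reqs.json` as a file reference, and plain `--data @file` strips newlines, so `--data-binary @reqs.json` is the usual choice for bulk bodies. The sketch below shows the difference in Python and builds a body the same way the fetch example does; `build_bulk_body` and the sample documents are illustrative, not part of any library.

```python
import json


def build_bulk_body(index, docs):
    """Build a newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a real newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": "1", "text": "bob"},
    {"id": "2", "text": "jane"},
]
body = build_bulk_body("test-index", docs)

# What a single-quoted shell string sends: backslash + "n", not a newline.
literal = r'{"index": {"_id": "1"}}\n{"id": "1"}\n'

print(body.endswith("\n"))   # real trailing newline present
print("\n" in literal)       # no real newline anywhere in the literal form
```

Writing `body` to a file and posting it with `--data-binary @file` should avoid the "must end in a newline" error, assuming the endpoint is otherwise reachable.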
This ticket can be closed. I narrowed the error down to the way the API Gateway and VPC were built. Sorry for the mix-up. Thanks for your help regardless.
I'm glad you fixed the issue. Could you help understand what the root problem/cause was here and how you figured it out?
The problem was that the API Gateway was configured incorrectly. The lesson is that even when you trust your coworkers, sometimes you still have to double-check their work. The Gateway was set up such that it was stringifying the response object inside of a stringified object.
I figured this out by poking at my coworker for more help. It's partially my fault for being ignorant of how API Gateways work and how this one was set up.
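To illustrate what "a stringified object inside a stringified object" does to a client, here is a minimal sketch using only the standard library. The payload is made up, and `decode_possibly_nested_json` is a hypothetical helper for diagnosis, not something opensearch-py provides. A body that was JSON-encoded twice decodes to a `str` on the first pass, which matches the symptom described in this issue.

```python
import json


def decode_possibly_nested_json(payload, max_depth=3):
    """Decode repeatedly while the value is still a JSON-looking string."""
    value = payload
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # Plain text that is not JSON: stop and return it unchanged.
            break
    return value


bulk_result = {"took": 5, "errors": False, "items": []}  # made-up payload
encoded_once = json.dumps(bulk_result)    # what a server normally sends
encoded_twice = json.dumps(encoded_once)  # what a misconfigured proxy might send

first_pass = json.loads(encoded_twice)
print(type(first_pass).__name__)  # str: one decode is not enough
print(decode_possibly_nested_json(encoded_twice) == bulk_result)
```

A helper like this is only useful for confirming the diagnosis; the real fix in this thread was on the gateway side, so that the response is encoded once.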
The Gateway was set up such that it was stringifying the response object inside of a stringified object.
I mean, how was it set up to enable this behavior?
What is the bug?
When I set up an OpenSearch.client using the header for authentication and attempt to use the bulk helper (opensearchpy.helpers.bulk), the client.bulk() call in _process_bulk_chunk returns a string, which causes the _process_bulk_chunk_success function to raise a TypeError when resp['index'] is called at line 185 in opensearchpy/helpers/actions.py. I tried this with my header defining "Content-Type" as "application/json" and as "application/json; boundary=NL".
How can one reproduce the bug?
What is the expected behavior?
I would expect that the response is a json object that can be indexed using a string key.
What is your host/environment?
Do you have any screenshots?
Do you have any additional context?
Oddly enough, this behavior does not happen on the OpenSearch domain that I deployed outside of the VPC when I use the http_auth=(username, password) parameter.