Closed phoffer closed 3 months ago
Between the requests that work and those that don't, are there differences in the verb, the presence of query parameters, or the presence of a body? Knowing this will help narrow the problem down.
Even better, if you're willing to try, would be if you could create a minimal example using dummy data of the exact request that fails.
@phoffer I'm unable to reproduce the issue (even after switching the adapter to typhoeus like you did). I wonder if it has anything to do with the **config passed to the constructor. Can you also specify a PUT endpoint that causes this issue?
Sorry for delayed response, I was unexpectedly out of office a bit. To answer both of your questions:
PUT https://redacted.es.amazonaws.com:443/users_staging_index/_doc/31794768 [status:201, request:0.039s, query:n/a]
PUT https://redacted.es.amazonaws.com:443/users_staging_index/_doc/31794126 [status:200, request:0.040s, query:n/a]
PUT https://redacted.es.amazonaws.com:443/users_staging_index/_doc/31794098 [status:403, request:0.028s, query:N/A]
Produce an example app? I can't, as this only seems to pop up in our AWS hosted environments, and I don't have access to those services. I will go and ask, but we have pretty strict access policies.
**config will just explode the config hash values into the other hash, so in effect, this is
OpenSearch::Aws::Sigv4Client.new({
url: Rails.application.config.opensearch_url, port: URI.parse(Rails.application.config.opensearch_url).port,
adapter: :typhoeus, log: true
}, signer)
The variable gets used differently for non-AWS environments, which is why it's not all together like this snippet.
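To make the **config behaviour concrete, here is a minimal sketch in plain Ruby. The URL is a made-up placeholder and the variable names are assumptions for illustration; only the double-splat mechanics are the point.

```ruby
require 'uri'

# Hypothetical stand-in for the shared config hash (placeholder URL).
url = 'https://search.example.test:443'
config = { url: url, port: URI.parse(url).port }

# The double splat copies config's key/value pairs into the new hash literal,
# so the result is the same as writing every key out by hand.
options = { **config, adapter: :typhoeus, log: true }

puts options.keys.inspect
```

If a key appeared both in config and in the literal after it, the later entry would win, which is worth keeping in mind when the same hash is reused across environments.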
Additionally, this seems to still occur when we use the AWS signing, but with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables instead of the role_arn strategy.
Thank you for that extra info! @nhtruong can you think of next steps for debugging this?
I'll start working on this later today.
We did some additional testing Friday and this morning and have a little more information we can provide.
We added user/pass authentication to the Opensearch cluster, using the same policies as the IAM user we had been attempting to use, and then used the Opensearch Client instead of the sigv4 wrapper client. This is how we initialized the client:
OpenSearch::Client.new(
host: Rails.application.config.opensearch_url,
user: ENV['OPENSEARCH_USER'],
password: ENV['OPENSEARCH_PASS'],
adapter: :typhoeus,
log: true
)
With this setup, everything worked as we expected. This makes me wonder/think there is something going on with the wrapping of the Opensearch client.
Is there any other info I can provide to help?
The sigv4 wrapper simply adds the Sigv4 headers to the request before sending it to the AWS cluster. It's only needed for services that require Sigv4 authentication. It looks like yours doesn't need sigv4, since you can use the regular client to communicate with the server?
Our goal is to use IAM roles to verify access and requests to our newer infrastructure pieces (Opensearch is the first we've added and tried to do this with). Our devops team has a preference of using IAM policies over adding user/pass access to services. It seemed that the sigv4 wrapper is required in order to do IAM role based authentication. Is that the case, or am I misunderstanding how these pieces come together?
Yes, to use IAM based auth, you will have to sign your requests with Sigv4, hence the Sigv4 client. The Internal Username/Password approach didn't get any sigv4 error because that method of auth doesn't require sigv4 signing.
The fact that nearly identical requests can sometimes fail the Sigv4 auth is very puzzling to me. I have a hunch that it's not the client, because of the flaky nature of this bug. I still couldn't replicate that error. Here's what I've tried:
I'm going to update the Sigv4 client with debugging capability (like printing out the canonical request). Then we can better help you with this issue ourselves or escalate it to AWS OS Service.
That debugging capability would be really helpful! I appreciate you tried those other aspects in debugging. I think we'll be able to get a lot more info with debugging updates in the client. Thank you!
@phoffer just released opensearch-aws-sigv4 1.2.0 with debug feature.
Instructions on how to turn it on can be found here: https://github.com/opensearch-project/opensearch-ruby/blob/a3c308389c5abdf71ccd50f00a6c8ececf9c7a6d/opensearch-aws-sigv4/USER_GUIDE.md#debugging
Here is an example with the debugging added:
ERROR:
{"message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
The Canonical String for this request should have been
'PUT
/products_staging_index/_doc/27625530
host:redacted.us-east-1.es.amazonaws.com
x-amz-content-sha256:9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b
x-amz-date:20230228T224146Z
x-amz-security-token:IQoJb3JpZ2luX2VjENf//////////wEaCXVzLWVhc3QtMSJHMEUCIQDP4+i2bhcJrsfEneblkk2nctfHgYWZ3GJ0fzfa08n8zgIgUn/tmEbKhLLR0eb36CMC75Q85hxDfnjA5pR/ZoLRlZMqnwUIgP//////////ARACGgwzMTAyNDIyMTY5MzEiDN1z6aMoRwTmvkmdKirzBGw+r5rq3kKURuRWuuyb9G+67Y3nyW6dPn4jWlkVMA9bEmocNS9nhjaMm/p7TW7M5p+l9jl27QXwHeKSjiVCbYNkpKZLqCjdbgLn23d5AXO3hFKbR8k3Idw0pPqCZIstx1d5cwOHpJOGdcJGvljWFqRnw41k24TG75oltuswrkYDeIUlXMFDzrMc2H7PNGFHfPw2BCDAxLppMFT8Nk1s/9er6hOuR9K8hWc5CnmkAMowMgM6JuHuwnA2V0yoqX32NXd/NIJa1jQnzaTRpxYkwpU9FrmsAa4pSkKulSauRKqs9xZ0gDkD4BtTmoyYDhLPuOEKgPtfTnFst/Fglm4EwqORBLhkUX32Do2w4MS6S+j0S3ceuTg8nnMdYhbPo5xWhVwwVLTWpk6dj8lLurQbXw8iFd0el7Fv27pTHKCEOHQLR42rOWvXXn9Q+jtEeuWN5wyryVbW4PfQIMKNI1Vli4aEVfwdawKTcMdVwGkUKFxTMRIlTtyznR43qc7t8/oBrpE2o2UwXx8lQsqxOAEBLI0vH/CIwMUdBAcYeV4zt3K6t9sF2yeki9XWb9cspsBCUUkBYeUkEeiMvYUDKJ332g6Jvl1N1ttHS90nm7C8LQc6tCHXQoCpvTNxDl7JgeKNkofu8swVFY6QH8G6i/6RUrY7W8eBklHdhTad2H5741rv/UN1AtZNgB7atkY1jgdSmSLnfxgGEpMbxOHjL+fAkMf5+XCLZAjqxodf/edw/vW6uFtWVCdyUzM0dPmoUrr8f5SIMj9v/XYz/k62tJiHHqrXw20yjWGnIANat7YNFwOxbzYHLhRNcEH8cTFuGZBmpVvAjzCvhPqfBjqaATpV/a5AzT++fSNmnz52WXUQEgCUvW0Aj0QZiHvFcfS387E2UYFyOF2CH/D2xfyq6XCrmbE9tNFFtYGLCVfIAnu4tIOUoG9R1BfSX8F0213CgQM+TYkp98wCfKAh9Bh21P+I91ImIyngBIlz2y/HZQS6Bh1cfi9mcdEuBifpuO7ujS5DVF8DvRQ9Ja2v3X+rgpNBMgf1n9Gu3Xc=
host;x-amz-content-sha256;x-amz-date;x-amz-security-token
d65cb4efb2928a5dea3a33c6516fce43304c951887307712013b2d7c7d522690'
The String-to-Sign should have been
'AWS4-HMAC-SHA256
20230228T224146Z
20230228/us-east-1/es/aws4_request
3fe7f7af8f66878ff839d922117d1a0092e5da7454e706170588fcbb06274457'
"}
Debug data from logs:
# String to sign
AWS4-HMAC-SHA256
20230228T224146Z
20230228/us-east-1/es/aws4_request
d662232e30ea37c5863e4df2173672e55822b720eb34450a3fc14b9f09d4e8f4
# REQUEST STRING
PUT
/products_staging_index/_doc/27625530
host:redacted.us-east-1.es.amazonaws.com
x-amz-content-sha256:9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b
x-amz-date:20230228T224146Z
x-amz-security-token:IQoJb3JpZ2luX2VjENf//////////wEaCXVzLWVhc3QtMSJHMEUCIQDP4+i2bhcJrsfEneblkk2nctfHgYWZ3GJ0fzfa08n8zgIgUn/tmEbKhLLR0eb36CMC75Q85hxDfnjA5pR/ZoLRlZMqnwUIgP//////////ARACGgwzMTAyNDIyMTY5MzEiDN1z6aMoRwTmvkmdKirzBGw+r5rq3kKURuRWuuyb9G+67Y3nyW6dPn4jWlkVMA9bEmocNS9nhjaMm/p7TW7M5p+l9jl27QXwHeKSjiVCbYNkpKZLqCjdbgLn23d5AXO3hFKbR8k3Idw0pPqCZIstx1d5cwOHpJOGdcJGvljWFqRnw41k24TG75oltuswrkYDeIUlXMFDzrMc2H7PNGFHfPw2BCDAxLppMFT8Nk1s/9er6hOuR9K8hWc5CnmkAMowMgM6JuHuwnA2V0yoqX32NXd/NIJa1jQnzaTRpxYkwpU9FrmsAa4pSkKulSauRKqs9xZ0gDkD4BtTmoyYDhLPuOEKgPtfTnFst/Fglm4EwqORBLhkUX32Do2w4MS6S+j0S3ceuTg8nnMdYhbPo5xWhVwwVLTWpk6dj8lLurQbXw8iFd0el7Fv27pTHKCEOHQLR42rOWvXXn9Q+jtEeuWN5wyryVbW4PfQIMKNI1Vli4aEVfwdawKTcMdVwGkUKFxTMRIlTtyznR43qc7t8/oBrpE2o2UwXx8lQsqxOAEBLI0vH/CIwMUdBAcYeV4zt3K6t9sF2yeki9XWb9cspsBCUUkBYeUkEeiMvYUDKJ332g6Jvl1N1ttHS90nm7C8LQc6tCHXQoCpvTNxDl7JgeKNkofu8swVFY6QH8G6i/6RUrY7W8eBklHdhTad2H5741rv/UN1AtZNgB7atkY1jgdSmSLnfxgGEpMbxOHjL+fAkMf5+XCLZAjqxodf/edw/vW6uFtWVCdyUzM0dPmoUrr8f5SIMj9v/XYz/k62tJiHHqrXw20yjWGnIANat7YNFwOxbzYHLhRNcEH8cTFuGZBmpVvAjzCvhPqfBjqaATpV/a5AzT++fSNmnz52WXUQEgCUvW0Aj0QZiHvFcfS387E2UYFyOF2CH/D2xfyq6XCrmbE9tNFFtYGLCVfIAnu4tIOUoG9R1BfSX8F0213CgQM+TYkp98wCfKAh9Bh21P+I91ImIyngBIlz2y/HZQS6Bh1cfi9mcdEuBifpuO7ujS5DVF8DvRQ9Ja2v3X+rgpNBMgf1n9Gu3Xc=
host;x-amz-content-sha256;x-amz-date;x-amz-security-token
9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b
# HEADERS
"host"=>"redacted.us-east-1.es.amazonaws.com",
"x-amz-date"=>"20230228T224146Z",
"x-amz-security-token"=>"IQoJb3JpZ2luX2VjENf//////////wEaCXVzLWVhc3QtMSJHMEUCIQDP4+i2bhcJrsfEneblkk2nctfHgYWZ3GJ0fzfa08n8zgIgUn/tmEbKhLLR0eb36CMC75Q85hxDfnjA5pR/ZoLRlZMqnwUIgP//////////ARACGgwzMTAyNDIyMTY5MzEiDN1z6aMoRwTmvkmdKirzBGw+r5rq3kKURuRWuuyb9G+67Y3nyW6dPn4jWlkVMA9bEmocNS9nhjaMm/p7TW7M5p+l9jl27QXwHeKSjiVCbYNkpKZLqCjdbgLn23d5AXO3hFKbR8k3Idw0pPqCZIstx1d5cwOHpJOGdcJGvljWFqRnw41k24TG75oltuswrkYDeIUlXMFDzrMc2H7PNGFHfPw2BCDAxLppMFT8Nk1s/9er6hOuR9K8hWc5CnmkAMowMgM6JuHuwnA2V0yoqX32NXd/NIJa1jQnzaTRpxYkwpU9FrmsAa4pSkKulSauRKqs9xZ0gDkD4BtTmoyYDhLPuOEKgPtfTnFst/Fglm4EwqORBLhkUX32Do2w4MS6S+j0S3ceuTg8nnMdYhbPo5xWhVwwVLTWpk6dj8lLurQbXw8iFd0el7Fv27pTHKCEOHQLR42rOWvXXn9Q+jtEeuWN5wyryVbW4PfQIMKNI1Vli4aEVfwdawKTcMdVwGkUKFxTMRIlTtyznR43qc7t8/oBrpE2o2UwXx8lQsqxOAEBLI0vH/CIwMUdBAcYeV4zt3K6t9sF2yeki9XWb9cspsBCUUkBYeUkEeiMvYUDKJ332g6Jvl1N1ttHS90nm7C8LQc6tCHXQoCpvTNxDl7JgeKNkofu8swVFY6QH8G6i/6RUrY7W8eBklHdhTad2H5741rv/UN1AtZNgB7atkY1jgdSmSLnfxgGEpMbxOHjL+fAkMf5+XCLZAjqxodf/edw/vW6uFtWVCdyUzM0dPmoUrr8f5SIMj9v/XYz/k62tJiHHqrXw20yjWGnIANat7YNFwOxbzYHLhRNcEH8cTFuGZBmpVvAjzCvhPqfBjqaATpV/a5AzT++fSNmnz52WXUQEgCUvW0Aj0QZiHvFcfS387E2UYFyOF2CH/D2xfyq6XCrmbE9tNFFtYGLCVfIAnu4tIOUoG9R1BfSX8F0213CgQM+TYkp98wCfKAh9Bh21P+I91ImIyngBIlz2y/HZQS6Bh1cfi9mcdEuBifpuO7ujS5DVF8DvRQ9Ja2v3X+rgpNBMgf1n9Gu3Xc=",
"x-amz-content-sha256"=>"9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b",
"authorization"=>"AWS4-HMAC-SHA256 Credential=ASIAUQO7ARPR4NNBUQ6O/20230228/us-east-1/es/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=f346ebdbf05b1e251fe554b95c93f61b3ea04a5a78ad16ae034fb8eb2f6fee28"
Everything looks good except for the security tokens not matching up 🤔 Is this at all helpful in diagnosing what is going wrong?
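One way to read the two dumps above is to diff them line by line rather than by eye. The sketch below is plain Ruby; canonical_diff is a hypothetical helper name, and TOKEN is a placeholder standing in for the long x-amz-security-token value, so the token line itself is not compared here.

```ruby
# Compares two canonical-request dumps line by line, ignoring blank lines,
# and returns the positions where they differ.
def canonical_diff(server_text, client_text)
  server = server_text.lines.map(&:strip).reject(&:empty?)
  client = client_text.lines.map(&:strip).reject(&:empty?)
  (0...[server.length, client.length].max).each_with_object([]) do |i, diffs|
    next if server[i] == client[i]

    diffs << { line: i + 1, server: server[i], client: client[i] }
  end
end

# Values copied from the dumps above; TOKEN abbreviates the security token
# (a placeholder introduced here for readability, not real data).
shared = <<~TEXT
  PUT
  /products_staging_index/_doc/27625530
  host:redacted.us-east-1.es.amazonaws.com
  x-amz-content-sha256:9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b
  x-amz-date:20230228T224146Z
  x-amz-security-token:TOKEN
  host;x-amz-content-sha256;x-amz-date;x-amz-security-token
TEXT
server_dump = shared + "d65cb4efb2928a5dea3a33c6516fce43304c951887307712013b2d7c7d522690\n"
client_dump = shared + "9ae812aa93eb349823948c6b9612c8e33742c9e77b5c09e17936de952147459b\n"

canonical_diff(server_dump, client_dump).each do |d|
  puts "line #{d[:line]}: server=#{d[:server]} client=#{d[:client]}"
end
```

In a SigV4 canonical request the final line is the hex SHA-256 of the payload, so a difference on that line would suggest that the body the service hashed was not byte-identical to the body that was signed. To check the token line as well, run the helper on the full, unabbreviated dumps.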
If I have to guess, your security token is rotated/refreshed periodically, and you have a long running session that uses the same sigv4 client instance for all requests to OpenSearch. Eventually the cred will expire and you start getting 403.
To test this theory, write a script that makes a request every 10 seconds. Catch any 403 error, and see if after a while, all requests start returning 403.
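A rough sketch of such a script, in plain Ruby. The block passed to probe_statuses is a stand-in: it is assumed to perform one request with your long-lived client and return the HTTP status code, and how you obtain that status is up to you. All names here are hypothetical.

```ruby
# Runs the given block `attempts` times, pausing `interval` seconds between
# runs, and records the HTTP status each run returned.
# `sleeper` is injectable so the loop can be exercised without waiting.
def probe_statuses(attempts:, interval: 10, sleeper: ->(seconds) { sleep(seconds) })
  Array.new(attempts) do |i|
    sleeper.call(interval) if i.positive?
    { attempt: i + 1, at: Time.now, status: yield }
  end
end

# True if, from the first 403 onward, every later attempt is also a 403.
def sticky_403?(results)
  first = results.index { |r| r[:status] == 403 }
  return false if first.nil?

  results.drop(first).all? { |r| r[:status] == 403 }
end

# Usage (hypothetical): the block must return the response status.
#   results = probe_statuses(attempts: 360) { put_one_document_and_return_status }
#   puts sticky_403?(results)
```

If sticky_403? comes back true after a long enough run, that would be consistent with expiring credentials; isolated 403s scattered among successes would point somewhere else.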
We do initialize this client on app boot, do we need to re-initialize it periodically (or on failure)? That would make sense
IF my theory is correct then yes: re-instantiating the client or the signer once the creds have expired will solve it. (The signer is an accessible attribute of the client: client.signer = new_signer.) How to implement it is up to you. See if the Ruby AWS SDK has the capability to refresh the creds on its own or at least tell you when they're about to expire. :-)
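One possible shape for the "re-instantiate on failure" option, sketched in plain Ruby. RefreshingClient is a hypothetical wrapper, not part of any gem; the factory block is assumed to build a fresh client (and signer), and refresh_on is whichever error class your client raises for a 403.

```ruby
# Wraps a client built by `factory` and rebuilds it once when a call raises
# one of the `refresh_on` error classes, then retries that call a single time.
class RefreshingClient
  def initialize(refresh_on:, &factory)
    @refresh_on = Array(refresh_on)
    @factory = factory
    @client = factory.call
  end

  def call
    yield @client
  rescue *@refresh_on
    @client = @factory.call # fresh client, and with it a fresh signer
    yield @client
  end
end

# Usage (hypothetical names; check which error class your version raises):
#   wrapper = RefreshingClient.new(refresh_on: SomeForbiddenError) { build_sigv4_client }
#   wrapper.call { |client| client.index(index: 'users', id: 1, body: { name: 'x' }) }
```

The retry happens only once per call, so a request that is rejected for a reason other than stale credentials still surfaces its error instead of looping.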
We are using Aws::AssumeRoleWebIdentityCredentials which will allegedly auto refresh. I wonder if there is some scenario under which it doesn't. Can you think of any reason this credential class wouldn't work, or we should be using a different one?
Haven't used Aws::AssumeRoleWebIdentityCredentials myself so I don't know the answer but that is strange. Have you tested that theory yet?
I'm also running into this. One case I noticed where this always happens is when sending a create index request that contains the char_filter mapping with => (the actual mapping doesn't matter).
I also noticed that if the index name had %25 the debug message complained about %252525. Something may be wrong with the escaping?
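For reference, here is how %25 can turn into %252525: each extra round of percent-encoding re-encodes the % sign itself. This is a small sketch using Ruby's standard library; the index name is made up, and the sketch does not identify where in the client or signer the extra rounds would occur.

```ruby
require 'uri'

index_name = 'logs%25' # hypothetical index name containing "%25"

# Each pass escapes every "%" as "%25" and leaves letters and digits alone.
once  = URI.encode_www_form_component(index_name)
twice = URI.encode_www_form_component(once)

puts once
puts twice
```

As far as I understand the SigV4 rules, the canonical URI for non-S3 services is built from the already-encoded path encoded once more, so some repeated encoding in the debug output is expected; the open question is whether the number of rounds on each side matches.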
Thanks for reaching out. Would you mind turning on Debug mode and pasting the logs here?
I might have found the solution actually... the following patch seems to work for me:
OpenSearch::Aws::Sigv4Client.class_exec do
  def perform_request(method, path, params = {}, body = nil, headers = nil)
    signature_body = body.is_a?(Hash) ? body.to_json : body.to_s
    signature = sigv4_signer.sign_request(
      http_method: method,
      url: signature_url(path, params),
      headers: headers,
      body: signature_body
    )
    headers = (headers || {}).merge(signature.headers)
    log_signature_info(signature)
    # Patch to use signature_body instead of body on the following line:
    super(method, path, params, signature_body, headers)
  end
end
I'm still testing it though
@Dantemss nice find and that's really interesting. I wonder if perform_request alters the body somehow. Lemme do some digging.
If you have any observability integrations that instrument the various HTTP clients or AWS SDKs, you may want to check on those. In particular, we've found that OpenTelemetry's Faraday middleware instrumentation can interfere with the signatures
@Dantemss Looks like the OS Ruby gem uses Multi-JSON to serialize the body by default, while the signature is generated with a body serialized with the native JSON gem. This might be the cause of the mismatching signature in the body if what you described is correct (i.e. it happens consistently with certain characters). Lemme know if you run into any issues using the monkey patch workaround you mentioned above. Feel free to make a PR into the Sigv4 Gem repo.
Not the same issue that @phoffer was dealing with tho (where the mismatching happened in the creds). What @maxfierke mentioned above is also worth looking into if you're dealing with random sigv4 errors.
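The serializer mismatch described above can be illustrated with the standard library alone. In this sketch the "second serializer" is simulated by escaping > as \u003e, which is a hypothetical stand-in for any encoder that emits valid but differently escaped JSON; it is not a claim about which encoder was actually involved.

```ruby
require 'json'
require 'digest'

body = { 'char_filter' => { 'arrows' => { 'type' => 'mapping', 'mappings' => ['=> => to'] } } }

# Serialization used when computing the signature.
signed_bytes = JSON.generate(body)

# Hypothetical second serializer: same document, but ">" is written as the
# JSON escape \u003e (valid JSON, different bytes on the wire).
sent_bytes = signed_bytes.gsub('>') { '\u003e' }

signed_hash = Digest::SHA256.hexdigest(signed_bytes)
sent_hash   = Digest::SHA256.hexdigest(sent_bytes)

puts "same document:     #{JSON.parse(signed_bytes) == JSON.parse(sent_bytes)}"
puts "same payload hash: #{signed_hash == sent_hash}"
```

Because the payload hash is part of what gets signed, two encodings that parse to the same document can still fail signature verification, which is why serializing once and reusing those exact bytes for both signing and sending avoids the problem.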
The monkeypatch seems to have fixed the signature issues that we had. I'll make a PR.
This begs the question: Should we replace MultiJSON with JSON as the default serializer?
I think it's up to y'all but whichever one you use, I'd say it would be better to be consistent and use the same one everywhere.
@nhtruong yes, multi-json is just a shim in front of various serializers, and I suggest we remove it - Rails did it in 2013, https://github.com/rails/rails/pull/10576, or we could do what we did in Grape, ie. users can require multi_json explicitly in which case it will be used - https://github.com/ruby-grape/grape/pull/1623
@Dantemss want to try to PR ^ ?
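A minimal sketch of the opt-in approach mentioned above, where MultiJson is used only if the application has already required it. The Serializer module name is hypothetical and this is not the gem's actual code.

```ruby
require 'json'

module Serializer
  # Uses MultiJson only when the application has already loaded it;
  # otherwise falls back to the json library that ships with Ruby.
  def self.dump(object)
    if defined?(::MultiJson)
      ::MultiJson.dump(object)
    else
      JSON.generate(object)
    end
  end
end

puts Serializer.dump({ 'name' => 'example' })
```

Routing both the signing step and the transport through one such method would keep the bytes consistent regardless of which backend is active.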
Hi @phoffer, I am facing the same issue. Is there any workaround that can help here? Or how did you fix it?
@praveen-ks Looks like https://github.com/opensearch-project/opensearch-ruby-aws-sigv4/pull/24 fixed the bug as reported, and was released in opensearch-ruby-aws-sigv4 v1.2.1. I am going to close this issue to avoid confusion; please make sure your Gemfile references that or a newer version. Let us know if you still see the problem after that.
@dblock I am using v1.2.1 only but still facing the issue.
@praveen-ks See above (https://github.com/opensearch-project/opensearch-ruby/issues/141#issuecomment-1447342826) for turning on debugging, and open a new issue with details since it's likely not the same problem.
@dblock Thanks for the next steps.
But I can't spend time on this currently, so I am continuing with the faraday_middleware-aws-sigv4 gem, as I was using it with the elasticsearch-ruby gem.
Are there any concerns with using faraday_middleware-aws-sigv4 with opensearch-ruby ?
Not that I know of.
What is the bug?
We are using the OpenSearch::Aws::Sigv4Client according to instructions, but we get signature errors when trying to update documents. We are using IAM users. We have two indices in Opensearch: 80% of PUT calls succeed for one index (user data), but the other index (product info) fails 100% of the time when we try to update documents. The initial import of all the data works correctly.
We are switching over from Elasticsearch, and have a lot built out using the previous Elasticsearch gems that these were continued from.
How can one reproduce the bug?
This is how we are setting up the Opensearch client:
We have also tried using access/secret like this:
What is the expected behavior?
We expect 100% of updates to be successful.
What is your host/environment?
Kubernetes EKS. 1.21
We are using the current HEAD for this repo in our Gemfile (both OS-ruby and sigv4) to include the recent fixes.
Do you have any screenshots?
Log example:
Do you have any additional context?
It seems similar to this issue: https://github.com/amzn/selling-partner-api-models/issues/774