dlvenable closed this issue 1 year ago
I debugged a similar problem. Can't share logs, but I wrote a quick script to process logstash logs: https://gist.github.com/dblock/0454d44d45bae82837b9c444b5bb0580
I'm seeing a single connection that started failing, but it did recover for some subsequent requests.
http-outgoing-18: ........., total=9, success=9, failure=0
http-outgoing-15: ........, total=8, success=8, failure=0
http-outgoing-17: ......, total=6, success=6, failure=0
http-outgoing-14: ........., total=9, success=9, failure=0
http-outgoing-7: ., total=1, success=1, failure=0
http-outgoing-12: ........., total=9, success=9, failure=0
http-outgoing-16: xxx..xxxxx.., total=12, success=4, failure=8
http-outgoing-8: ...., total=4, success=4, failure=0
http-outgoing-13: ..., total=3, success=3, failure=0
http-outgoing-6: ......., total=7, success=7, failure=0
http-outgoing-9: ......., total=7, success=7, failure=0
http-outgoing-3: ....., total=5, success=5, failure=0
http-outgoing-4: ......, total=6, success=6, failure=0
http-outgoing-5: ......, total=6, success=6, failure=0
http-outgoing-10: ......, total=6, success=6, failure=0
http-outgoing-11: .., total=2, success=2, failure=0
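The per-connection tally above could be produced along these lines. This is only my assumption of the approach; the method name and the regex are hypothetical, and the real script is the linked gist.

```ruby
# Hypothetical sketch: tally response outcomes per http-outgoing connection
# from Logstash wire-level DEBUG logs. Responses appear in the wire log as
# e.g. `http-outgoing-16 << "HTTP/1.1 403 ..."`.
def tally_connections(lines)
  tallies = Hash.new { |h, k| h[k] = { marks: +'', success: 0, failure: 0 } }
  lines.each do |line|
    next unless line =~ %r{(http-outgoing-\d+) <<.*HTTP/1\.1 (\d{3})}
    t = tallies[Regexp.last_match(1)]
    if Regexp.last_match(2).to_i < 400
      t[:success] += 1
      t[:marks] << '.'
    else
      t[:failure] += 1
      t[:marks] << 'x'
    end
  end
  tallies
end

lines = [
  'http-outgoing-16 << "HTTP/1.1 200 OK[\r][\n]"',
  'http-outgoing-16 << "HTTP/1.1 403 Forbidden[\r][\n]"'
]
tally_connections(lines).each do |conn, t|
  total = t[:success] + t[:failure]
  puts "#{conn}: #{t[:marks]}, total=#{total}, success=#{t[:success]}, failure=#{t[:failure]}"
end
# → http-outgoing-16: .x, total=2, success=1, failure=1
```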
Looking at that one connection, there's something interesting with the content-lengths.
2023-06-29T20:25:08-04:00 [2023-06-30T00:25:08,806][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 141208[\r][\n]"
2023-06-29T20:25:08-04:00 [2023-06-30T00:25:08,848][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 263738[\r][\n]"
2023-06-29T20:25:09-04:00 [2023-06-30T00:25:09,023][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 219374[\r][\n]"
2023-06-29T20:25:09-04:00 [2023-06-30T00:25:09,148][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 4901[\r][\n]"
2023-06-29T20:25:09-04:00 [2023-06-30T00:25:09,239][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 23419[\r][\n]"
2023-06-29T20:25:10-04:00 [2023-06-30T00:25:10,949][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 175362[\r][\n]"
2023-06-29T20:25:11-04:00 [2023-06-30T00:25:11,200][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 171756[\r][\n]"
2023-06-29T20:25:11-04:00 [2023-06-30T00:25:11,217][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 171912[\r][\n]"
2023-06-29T20:25:11-04:00 [2023-06-30T00:25:11,333][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 191522[\r][\n]"
2023-06-29T20:25:11-04:00 [2023-06-30T00:25:11,699][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 165799[\r][\n]"
2023-06-29T20:25:12-04:00 [2023-06-30T00:25:12,148][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 3809[\r][\n]"
2023-06-29T20:25:12-04:00 [2023-06-30T00:25:12,402][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-16 >> "Content-Length: 3674[\r][\n]"
This correlates really well: larger payloads fail and smaller payloads succeed, even on the same connection.
I do have plenty of other connections that succeed with larger payloads on a different host, so this isn't a size problem.
2023-06-29T20:02:47-04:00 [2023-06-30T00:02:47,304][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-43 >> "Content-Length: 174475[\r][\n]"
2023-06-29T20:02:47-04:00 [2023-06-30T00:02:47,367][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-36 >> "Content-Length: 183893[\r][\n]"
2023-06-29T20:02:47-04:00 [2023-06-30T00:02:47,535][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-37 >> "Content-Length: 173939[\r][\n]"
2023-06-29T20:02:47-04:00 [2023-06-30T00:02:47,556][DEBUG][org.apache.http.wire ][output_oss][oss_output_cluster] http-outgoing-43 >> "Content-Length: 177570[\r][\n]"
There's an interesting bug in opensearch-ruby that could be the root cause of behavior like this: https://github.com/opensearch-project/opensearch-ruby/issues/141
I got it. The culprit is something like 'boîte de réception'.encode("ISO-8859-1")
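The mismatch is easy to see in plain Ruby: the same characters have different byte lengths, and therefore different SHA-256 digests, in ISO-8859-1 versus UTF-8. A signature computed over one representation cannot match a body transmitted as the other. A minimal illustration (not the plugin's code):

```ruby
require 'digest'

utf8   = 'boîte de réception'        # Ruby string literals default to UTF-8
latin1 = utf8.encode('ISO-8859-1')   # same characters, different bytes

puts utf8.bytesize    # 20 ('î' and 'é' take two bytes each in UTF-8)
puts latin1.bytesize  # 18 (one byte each in ISO-8859-1)

# SigV4 signs the SHA-256 of the exact payload bytes, so the two
# representations produce different x-amz-content-sha256 values.
puts Digest::SHA256.hexdigest(utf8) == Digest::SHA256.hexdigest(latin1)  # false
```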
#!/usr/bin/env ruby
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '../lib'))
require 'bundler/setup'
require 'logstash-core'
require 'logstash/plugin_mixins/ecs_compatibility_support'
require 'logstash/outputs/opensearch'
require 'logstash/codecs/plain'
require 'logstash/instrument/namespaced_null_metric'
require 'cabin'
java_import 'java.util.concurrent.Callable'
java_import 'java.util.concurrent.FutureTask'
java_import 'java.util.concurrent.LinkedBlockingQueue'
java_import 'java.util.concurrent.ThreadPoolExecutor'
java_import 'java.util.concurrent.TimeUnit'
puts "Running on #{RUBY_PLATFORM} ..."
# Redacted Amazon OpenSearch Service endpoint.
host = 'https://....es.amazonaws.com:443'
# SigV4 (aws_iam) credentials, taken from the environment.
auth = {
  "type" => 'aws_iam',
  "aws_access_key_id" => ENV['AWS_ACCESS_KEY_ID'],
  "aws_secret_access_key" => ENV['AWS_SECRET_ACCESS_KEY'],
  "session_token" => ENV['AWS_SESSION_TOKEN'],
  "region" => ENV['AWS_REGION']
}
index_name = "my-index"
settings = {
  "hosts" => host,
  "index" => index_name,
  "auth_type" => auth
}
config = LogStash::Outputs::OpenSearch.new(settings)
config.register
# The reproduction: an event whose message string is not UTF-8.
string = 'boîte de réception'.encode("ISO-8859-1")
events = [
  LogStash::Event.new("message" => string)
]
config.multi_receive(events)
causes
[[{"index"=>{:_id=>nil, :_index=>"my-index", :routing=>nil}}, {"message"=>"bo\xEEte de r\xE9ception", "@version"=>"1", "@timestamp"=>#<LogStash::Timestamp:0x28bc9749>}]]
[ERROR] 2023-08-22 17:20:09.330 [main] opensearch - Encountered a retryable error (will retry with exponential backoff) {:code=>403, :url=>".../_bulk", :content_length=>146, :body=>"{\"message\":\"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\\n\\nThe Canonical String for this request should have been\\n'POST\\n/_bulk\\n\\ncontent-type:application/json\\nhost:...\\nx-amz-content-sha256:9d7ad54b413cb0adf982b2b2ea28750f366d4440f90ca63e565ec4c18d0ebfb1\\nx-amz-date:20230822T172009Z\\nx-amz-security-token:IQoJb3JpZ2luX2VjEDkaCXVzLWVhc3QtMSJHMEUCIQC2dwXmqTklolL66V65gxR8c+vVjbs35hMwQ8QXqPJyYQIgRaUaBFMi/Dtmw7CBMyIXSd1dGOf1cbRJW+RBPjGlSocqpgII8f//////////ARAEGgwwMTMzMDYyNDI2NTEiDCrKXd2eLwyHbDEfuyr6AQwW9iMzEhXoc1Y5XSUBrnOz2p0z1n8QqQaHbcLebSNecTR8Qdb/NW+XyOYhpkybyjDekjyQtmxfYrjXWrgKYzlsrOYw3cGjYR65zF237BLoE0RtgzPTa8k6MvtYOeHRiiJYfVCP6mNcwC07rBqC/Kkbpr5UUa24gfX4bwnNZVzIGkTkBXacXdiZAvDjF6fafeOyCjQHukteYECkGu6IZ2S8zNo9WRHAPZTsufE0Q7Q5LnZ4aFK6xEm7T33sTgmW6OM2fb3HTaT/+/wv4FB03g8rURIVp1OglNMJImH9h9wfQII9fgUQr/O42dpTS8Va/8qDzOx0F0RZ+FwwhrmTpwY6nQFXLXrKLZuqBjIcBEIIjaEsqogE1ojjWM1mo98x/LmVb3LvuPnzJ6y/A75ETs82r0dh+DC0KAWmG5KARKv50fV1uR7sKCNl92bJmJtwPo5dmES19RCEld2WGm8IZyb5TXrWe3WZt4iUPeW6bfg1IRTOyI3kXEFWhca854WLUfgdT42bpXRYnoAmyWuNRVcO+TKsDeALmgM+VoDEuxXY\\n\\ncontent-type;host;x-amz-content-sha256;x-amz-date;x-amz-security-token\\nf4b76bd989acd2cf009c9e1db13936cd9a42d0171059e64f68aabea314ce750f'\\n\\nThe String-to-Sign should have been\\n'AWS4-HMAC-SHA256\\n20230822T172009Z\\n20230822/us-west-2/es/aws4_request\\na7c6b12e3b00c802279f11ab89e70a3a23e035be29836c9b570f96692adb5d6c'\\n\"}"}
are we looking to backport this to prior versions as it seems to be found in multiple?
We could if it's necessary, @dlvenable what versions do we still maintain?
Describe the bug
In this setup, Logstash is configured to write to an Amazon OpenSearch Service domain, and the plugin uses SigV4 signing.
When a large volume of data is being processed, the plugin will sometimes get an error indicating the request signature does not match the expected signature.
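One way to see why this surfaces only intermittently: the signature only breaks when an event happens to carry non-UTF-8 bytes, which becomes likelier at high volume. Transcoding such strings to UTF-8 before the request body is built keeps the signed bytes and the sent bytes in agreement. A hypothetical sketch, not the plugin's actual fix:

```ruby
# Hypothetical sketch: transcode a non-UTF-8 event string to UTF-8 so the
# bytes that are signed match the bytes that go on the wire.
raw        = 'boîte de réception'.encode('ISO-8859-1')  # bytes as they might arrive
normalized = raw.encode('UTF-8')                        # transcode, don't just relabel

puts raw.bytesize         # 18
puts normalized.bytesize  # 20
puts normalized == 'boîte de réception'  # true: same characters preserved
```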