Open dstoy53 opened 8 years ago
This is the first time I've heard of such behavior... I have no idea about the root cause for now. Could you paste your configuration from the "fluentd01" node? (Without your secret values, of course.)
I've attached the sanitized config for fluentd01. For fluentd01->splunkfwd01 the traffic is using the default 24284 port, and for fluentd01->efk01 I hard-set both sides to 24285 as a troubleshooting attempt. FirewallD is running on all VMs with the appropriate ports opened and listening. All of the VMs are running CentOS7 with selinux set to enforcing. All VMs are on the same subnet in a test environment, and I'm using the same shared secret and passphrase throughout the environment (the shared secret itself is a different value than the passphrase however).
Ultimately with this exact config, the relevant logs are received via in_secure at fluentd01, then stored locally and forwarded successfully to splunkfwd01 - but they are not being forwarded to efk01 due to the error I posted from the logs. If I comment out the store statement related to splunkfwd01, then my logs have no issues reaching efk01.
Thank you for looking into this.
Hmm, your configuration looks correct - no obvious problems at least. OK, I'll investigate it the next time I have enough time. It might take a while... sorry about that.
I did some more testing and removed out_copy from the equation. Now I'm just using 2x match statements.
Scenario 1: fluentd01 -> splunkfwd01 - logs are forwarded, no errors
Scenario 2: fluentd01 -> efk01 - logs are forwarded, no errors
Scenario 3: fluentd01 -> splunkfwd01 AND efk01 - certificate verification for the second match statement fails (efk01 in this case)
Scenario 4: fluentd01 -> splunkfwd01 AND efk01 while using the certificate from efk01 on all 3 servers - everything works with no errors
Scenario 5: Same parameters as Scenario 4, except for the second match statement I set the 'ca_cert_path' to fluentd01's own cert file which is not present on either splunkfwd01 or efk01 - everything works fine with no errors
Based on these results the problem is occurring because I'm trying to use a different certificate for each connection. It seems that the plugin uses the same certificate from the first match statement on any subsequent match statements (at least that's what Scenario 5 leads me to believe).
So if I use the same certificate throughout my environment I would have no issues forwarding the traffic, at the expense of security.
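To make the hypothesis concrete, here is a minimal Ruby sketch of the verification step in isolation. It is not the plugin's code. It builds two throwaway CAs in memory (the names are hypothetical stand-ins for the two destinations), issues one server certificate from each, and checks them against separate trust stores. If a connection to efk01 were verified against the store built for splunkfwd01, the check would be expected to fail in the same way as "certificate verify failed".

```ruby
require "openssl"

# Build a throwaway self-signed CA (illustrative only; this is not
# what 'secure-forward-ca-generate' does internally).
def make_ca(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = 1
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = cert
  cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Issue a server certificate signed by the given CA.
def issue_server_cert(common_name, ca_cert, ca_key)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 2
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = ca_cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  cert.sign(ca_key, OpenSSL::Digest.new("SHA256"))
  cert
end

# One trust store per destination, each trusting exactly one CA.
def store_for(ca_cert)
  store = OpenSSL::X509::Store.new
  store.add_cert(ca_cert)
  store
end

ca_a, key_a = make_ca("ca-for-splunkfwd01") # hypothetical names
ca_b, key_b = make_ca("ca-for-efk01")
cert_a = issue_server_cert("splunkfwd01", ca_a, key_a)
cert_b = issue_server_cert("efk01", ca_b, key_b)

store_a = store_for(ca_a)
store_b = store_for(ca_b)

puts "splunkfwd01 cert against its own CA: #{store_a.verify(cert_a)}"
puts "efk01 cert against its own CA:       #{store_b.verify(cert_b)}"
puts "efk01 cert against splunkfwd01's CA: #{store_a.verify(cert_b)}"
```

Each certificate verifies against its own store and the cross-check does not, which is consistent with (but does not prove) the idea that the second connection is being checked against the first connection's CA.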
I tested your situation with the configurations cert_copy_client, cert_copy_server_a and cert_copy_server_b in the branch of the pull request below:
https://github.com/tagomoris/fluent-plugin-secure-forward/pull/45
As a result, 2 different CA certs in 2 <store> sections of a copy plugin work well in my environment.
2016-07-29 13:36:24 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type forward
  </source>
  <match test.**>
    @type copy
    <store>
      @type "secure_forward"
      secure yes
      self_hostname "client"
      shared_key xxxxxx
      ca_cert_path "/Users/tagomoris/github/fluent-plugin-secure-forward/example/cacerts1/ca_cert.pem"
      enable_strict_verification yes
      flush_interval 1s
      <server>
        host "localhost"
        port 24284
        hostlabel "server_a.local"
      </server>
      <buffer tag>
        flush_mode interval
        retry_type exponential_backoff
        flush_interval 1s
      </buffer>
    </store>
    <store>
      @type "secure_forward"
      secure yes
      self_hostname "client"
      shared_key xxxxxx
      ca_cert_path "/Users/tagomoris/github/fluent-plugin-secure-forward/example/cacerts2/ca_cert.pem"
      enable_strict_verification yes
      flush_interval 1s
      <server>
        host "localhost"
        port 24285
        hostlabel "server_a.local"
      </server>
      <buffer tag>
        flush_mode interval
        retry_type exponential_backoff
        flush_interval 1s
      </buffer>
    </store>
  </match>
</ROOT>
2016-07-29 13:36:24 +0900 [info]: listening fluent socket on 0.0.0.0:24224
2016-07-29 13:36:24 +0900 [info]: connection established to localhost
2016-07-29 13:36:24 +0900 [info]: connection established to localhost
@l-53 Can you try these configurations and CA cert files in your environment?
No luck with the new certificates either.
Here are my current destinations from fluentd01:
So far it seems that every time I reload td-agent, the ssl session is established to either influxdb01 or efk01, but only one server at a time while the other one fails with the same certificate verification error.
If I comment out the config for influxdb01 I have no issues connecting to efk01. If I comment out the config for efk01 I have no issues connecting to influxdb01.
This tells me I didn't typo/mismatch the cert/psk/passphrases anywhere. The only notable differences I can spot between your test and my environment are that you tested on the same host rather than across servers, and that your config uses strict verification.
I've attached my relevant sanitized config.
efk01_sanitized.txt fluentd01_sanitized.txt influxdb01_sanitized.txt
@l-53 Could you paste your logs of fluentd01? (debug (-v) or trace (-vv) logs if possible)
I've attached the sanitized logs with -vv. The "SSLErrorWaitReadable" error for the successfully established connection only shows up with -vv.
@l-53 Hmm, it looks normal (and it does appear to report a certificate error).
I'm very sorry to bother you, but could you upload 2 more logs: one with the 1st <store> section commented out, and one with the 2nd <store> section commented out?
I've attached the logs from fluentd01 with one section commented out at a time.
Super weird... Could you tell me your versions of Ruby, td-agent (if you are using it) and OpenSSL on fluentd01?
fluentd01's version-manifest.txt: ruby: 2.1.8 (embedded, no ruby installed on the system itself), td-agent: 2.3.1, openssl: 1.0.1r
influxdb01's version-manifest.txt: ruby: 2.1.10 (embedded, no ruby installed on the system itself), td-agent: 2.3.2, openssl: 1.0.1t
efk01's version-manifest.txt: ruby: 2.1.10 (embedded, no ruby installed on the system itself), td-agent: 2.3.2, openssl: 1.0.1t
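As a cross-check of the manifest values, the running interpreter can report its own versions. This is a small sketch, and `runtime_versions` is a hypothetical helper name. It assumes the script is run with td-agent's embedded Ruby rather than a system Ruby (commonly `/opt/td-agent/embedded/bin/ruby` on td-agent 2.x, which is an assumption about the install layout and should be confirmed on each host).

```ruby
require "openssl"

# Collect the Ruby version and the OpenSSL version(s) visible to this interpreter.
def runtime_versions
  info = {
    ruby: RUBY_VERSION,
    # Version of OpenSSL the Ruby extension was compiled against.
    openssl_built_against: OpenSSL::OPENSSL_VERSION
  }
  # Version of the OpenSSL library actually loaded at runtime, if exposed.
  if OpenSSL.const_defined?(:OPENSSL_LIBRARY_VERSION)
    info[:openssl_loaded] = OpenSSL::OPENSSL_LIBRARY_VERSION
  end
  info
end

runtime_versions.each { |name, value| puts "#{name}: #{value}" }
```

If the compiled-against and loaded versions differ, that would be worth mentioning alongside the manifest values.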
I agree it's strange, that's why I initially thought I must've been doing something wrong along the way.
I couldn't find any previous posts about the issue I'm experiencing, so I'm hoping to find out if it's a bug or a PEBKAC situation.
I'm attempting to securely forward logs in the following manner:
The issue that I'm running into is that if I have both of the out_secure destinations in the config file at step 3, only the first one is able to establish the SSL connection. The second one errors out by failing the SSL verification.
If I comment out one of the 2 out_secure destinations and forward the logs only to one destination at a time (either "splunkfwd01" or "efk01") the logs are forwarded successfully. This tells me my certs/shared secret/passphrase are accurate for either combination of fluentd01->splunkfwd01 or fluentd01->efk01. I am using a separate cert/key pair for each connection.
2016-07-20 15:44:12 -0400 [warn]: failed to establish SSL connection error_class=OpenSSL::SSL::SSLError error=#<OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed> host="10.10.10.54" address="10.10.10.54" port=24285
(I tried using a different port for "efk01", 24285, as a troubleshooting step)
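When comparing logs across several restarts, it can help to tally which destination the failure lands on. The following Ruby sketch counts SSL connection failures per host and port. The pattern is inferred only from the single warn line quoted above, so it is an assumption about the log format and may need adjusting. `count_ssl_failures` is a hypothetical helper name.

```ruby
# Pattern assumed from the one warn line quoted in this issue.
FAILURE = /\[warn\]: failed to establish SSL connection .*host="(?<host>[^"]+)" address="[^"]+" port=(?<port>\d+)/

# Return a hash of "host:port" => number of failed SSL connection attempts.
def count_ssl_failures(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = FAILURE.match(line)
    counts["#{m[:host]}:#{m[:port]}"] += 1 if m
  end
  counts
end

sample = '2016-07-20 15:44:12 -0400 [warn]: failed to establish SSL connection ' \
         'error_class=OpenSSL::SSL::SSLError error=#<OpenSSL::SSL::SSLError: ' \
         'SSL_connect returned=1 errno=0 state=error: certificate verify failed> ' \
         'host="10.10.10.54" address="10.10.10.54" port=24285'

p count_ssl_failures([sample])
```

To run it against a real log, pass the file's lines instead of the sample, for example `count_ssl_failures(File.foreach(path))`, where `path` is wherever td-agent writes its log on that host.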
To sum it up, the behavior I'm seeing is that for a given out_copy match I can only use one out_secure store at a time.
Am I just missing something blatantly obvious?
Edit: "fluentd01" is running fluentd 0.12.20, while "splunkfwd01" and "efk01" are running fluentd 0.12.26. All servers are running 'fluent-plugin-secure-forward' version '0.4.2'.
Edit2: The certificates were generated using 'secure-forward-ca-generate' from this plugin. They work fine with a single connection from fluentd01->splunkfwd01, or with a single connection from fluentd01->efk01. The issue only occurs when I try to forward traffic to both splunkfwd01 and efk01 under the same match statement with out_copy.