splunk / fluent-plugin-splunk-hec

This is the Fluentd output plugin for sending events to Splunk via HEC.
Apache License 2.0

Docs example for ciphers array option #107

Open matthewmodestino opened 4 years ago

matthewmodestino commented 4 years ago

What would you like to be added:

Docs example for ciphers array option

https://github.com/splunk/fluent-plugin-splunk-hec#ciphers-array
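For reference, a docs example might look roughly like the sketch below. The host, port, and token values are placeholders, and the cipher suite names are illustrative OpenSSL-style names, not a recommendation; the `ciphers` option takes a fluentd array value per the README:

```
<match **>
  @type splunk_hec
  hec_host splunk.example.com                       # placeholder
  hec_port 8088
  hec_token 00000000-0000-0000-0000-000000000000    # placeholder
  ciphers ["ECDHE-RSA-AES256-GCM-SHA384","ECDHE-RSA-AES128-GCM-SHA256"]
</match>
```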

Why is this needed:

I am deploying SCK 1.3.0 charts and experimenting with security settings by hardening the sslVersions in server.conf & with HEC inputs.conf. They both have options to set sslVersions and ciphers.

https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf#SSL_Configuration_details
https://docs.splunk.com/Documentation/Splunk/8.0.1/Admin/Inputsconf#http:_.28HTTP_Event_Collector.29

sslVersions = <versions_list>
* Comma-separated list of SSL versions to support for incoming connections.
* The versions available are "ssl3", "tls1.0", "tls1.1", and "tls1.2".
* The special version "*" selects all supported versions.
* The version "tls" selects all versions tls1.0 or newer.
* If a version is prefixed with "-" it is removed from the list.
* SSLv2 is always disabled; "-ssl2" is accepted in the version
  list but does nothing.
* When configured in FIPS mode, "ssl3" is always disabled regardless
  of this configuration.
* Default: The default can vary (see the 'sslVersions' setting in
  the $SPLUNK_HOME/etc/system/default/server.conf file for the
  current default)
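For context, restricting splunkd to TLS 1.2 only would look roughly like this (a sketch; stanza placement follows the server.conf spec quoted above):

```
# server.conf (sketch): restrict splunkd's SSL/TLS to TLS 1.2 only
[sslConfig]
sslVersions = tls1.2
```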

I tried setting specific versions in server.conf but no dice. I didn't set them explicitly in inputs.conf, so I am thinking server.conf applied. I tried setting tls, and that's when the connection failures/retries happened.

I am now seeing this from some of my fluentd pods:

2020-02-16 17:37:50 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2020-02-16 17:37:51 +0000 chunk="59eb4e57fdaa2189656a4acd3c8195e9" error_class=Net::HTTP::Persistent::Error error="too many connection resets (due to SSL_connect returned=1 errno=0 state=SSLv3/TLS write client hello: wrong version number - OpenSSL::SSL::SSLError) after 0 requests on 70019652579240, last used 1581874670.2335458 seconds ago"
  2020-02-16 17:37:50 +0000 [warn]: #0 suppressed same stacktrace
2020-02-16 17:37:51 +0000 [warn]: #0 retry succeeded. chunk_id="59eb4e57fdaa2189656a4acd3c8195e9"

The initial connection fails during the TLS handshake ("SSLv3/TLS write client hello" is OpenSSL's generic handshake-state label, not literal SSLv3), then succeeds on retry.
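On the client side, these are ultimately OpenSSL context settings. A minimal Ruby sketch of the two knobs involved, a minimum protocol version and a restricted cipher list, assuming Ruby 2.5+ (which the fluentd image here runs):

```ruby
require 'openssl'

# Sketch: the same kind of context the plugin builds under the hood.
ctx = OpenSSL::SSL::SSLContext.new
ctx.min_version = OpenSSL::SSL::TLS1_2_VERSION   # refuse anything below TLS 1.2
ctx.ciphers = %w[
  ECDHE-RSA-AES256-GCM-SHA384
  ECDHE-RSA-AES128-GCM-SHA256
].join(':')                                      # OpenSSL cipher-list syntax

# ctx.ciphers returns [name, version, bits, alg_bits] tuples
ctx.ciphers.each { |name, version, *_| puts "#{name} (#{version})" }
```

A context configured this way would fail fast against a server pinned to an incompatible version, instead of the retry churn shown in the logs above.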

matthewmodestino commented 4 years ago

I think it's the SSL version that I am after...

/opt/splunk/etc/system/default/server.conf:
sslVersions = tls1.2

I want the output from the pods to be TLS 1.2, not SSLv3.

matthewmodestino commented 4 years ago

I ended up getting it to calm down by removing the custom sslVersions settings in server.conf and inputs.conf on the Splunk side (the Splunk default is "anything but sslv2"), but we should probably get the TLS config synced up between splunkd and fluentd-hec, right?

Last error seen ~10 min ago. Post-change, data is back to flowing normally. While the handshake issues were happening, I was seeing data delays.

2020-02-17 00:55:50 +0000 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2020-02-17 00:55:51 +0000 chunk="59ebb03bdb6c2edbdb173a725cd4b414" error_class=OpenSSL::SSL::SSLError error="SSL_connect returned=1 errno=0 state=SSLv3/TLS write client hello: wrong version number"
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/net/protocol.rb:44:in `connect_nonblock'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/net/protocol.rb:44:in `ssl_socket_connect'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/net/http.rb:996:in `connect'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/net/http.rb:925:in `start'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:724:in `start'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:653:in `connection_for'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:958:in `request'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.2.0/lib/fluent/plugin/out_splunk_hec.rb:292:in `write_to_splunk'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.2.0/lib/fluent/plugin/out_splunk.rb:97:in `block in write'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/lib/ruby/2.6.0/benchmark.rb:308:in `realtime'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.2.0/lib/fluent/plugin/out_splunk.rb:96:in `write'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.7.3/lib/fluent/compat/output.rb:131:in `write'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.7.3/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.7.3/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.7.3/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2020-02-17 00:55:50 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.7.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-02-17 00:55:51 +0000 [warn]: #0 failed to flush the buffer. retry_time=3 next_retry_seconds=2020-02-17 00:55:55 +0000 chunk="59ebb03bdb6c2edbdb173a725cd4b414" error_class=OpenSSL::SSL::SSLError error="SSL_connect returned=1 errno=0 state=SSLv3/TLS write client hello: wrong version number"
  2020-02-17 00:55:51 +0000 [warn]: #0 suppressed same stacktrace
2020-02-17 00:55:55 +0000 [warn]: #0 retry succeeded. chunk_id="59ebb03bdb6c2edbdb173a725cd4b414"
2020-02-17 00:56:12 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:56:20 +0000 [info]: #0 [containers.log] detected rotation of /var/log/containers/splunk-audit-agent-standalone-0_splunk_splunk-74e4365ebfb30270d7d84eba20327ea988c3fe4219812d3773899f3e9f0f6e03.log; waiting 5 seconds
2020-02-17 00:56:43 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:56:53 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:57:03 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:57:13 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:57:23 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-rsmkb_ingress-nginx_nginx-ingress-controller-816e4a429bc104956f060a1133e4a30a1060eb944faa9af24bab33c054e3c9e3.log:stdout
2020-02-17 00:57:39 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.nginx-ingress-controller-9dfc54f55-r

Notice the stability in the dashboard data. The annotations on the graph show the pod in a crash loop until I came back home and fixed the Splunk config.

[dashboard screenshot]