splunk / splunk-connect-for-kubernetes

Helm charts associated with kubernetes plug-ins
Apache License 2.0

Unexpected error raised. Stopping the timer in splunk-kubernetes-metrics #424

Closed rchenzheng closed 4 years ago

rchenzheng commented 4 years ago

What happened:

splunk-kubernetes-metrics crashes

2020-07-16 15:31:06 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)"
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-16 15:31:06 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-16 15:31:06 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-16 15:31:06 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

What you expected to happen:

Metrics are generated and sent to Splunk Cloud.

How to reproduce it (as minimally and precisely as possible):

Deploy using default settings

global:
  logLevel: info
  splunk:
    hec:
      indexName: k8s-logs
      port: 443
      protocol: https
      insecureSSL: false

Anything else we need to know?:

Environment:

rchenzheng commented 4 years ago

Related to https://github.com/splunk/splunk-connect-for-kubernetes/issues/417

mwang2016 commented 4 years ago

@rchenzheng Did you try setting insecureSSL for the metrics chart? If you are using Managed Splunk Cloud, you also need to set up HEC properly. Please refer to the documentation on this page: Use the HTTP Event Collector

rchenzheng commented 4 years ago

I hadn't previously, but now I get:

2020-07-16 20:46:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:46:32 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:48:17 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:48:17 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:48:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:48:32 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:49:17 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:49:17 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:49:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:49:32 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:50:17 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:50:17 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:50:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:50:32 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:51:17 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:51:17 +0000 [error]: #0 suppressed same stacktrace
2020-07-16 20:51:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:51:32 +0000 [error]: #0 suppressed same stacktrace

rchenzheng commented 4 years ago

That was on the aggregator VM, but I am still getting the same error, @mwang2016:

2020-07-16 21:02:31 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-16 21:02:31 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-16 21:02:31 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-16 21:02:31 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

mwang2016 commented 4 years ago

@rchenzheng can you show your config for the metrics?

rchenzheng commented 4 years ago

> @rchenzheng can you show your config for the metrics?

After more research: we're using SSL/HTTPS with Splunk Cloud, and the config is as follows:

global:
  logLevel: info
  splunk:
    hec:
      indexName: k8s-logs
      host: my-host.splunkcloud.com
      port: 443
      token: HEIC-TOKEN
      protocol: https
      insecureSSL: false

splunk-kubernetes-metrics:
  enabled: true

I've tried insecureSSL set to both true and false; I get the same issue either way.

mwang2016 commented 4 years ago

@rchenzheng can you try this config?

splunk-kubernetes-metrics:
  kubernetes:
    insecureSSL: false  
  splunk:
    hec:
      indexName: k8s-logs
      host: my-host.splunkcloud.com
      port: 443
      token: HEIC-TOKEN
      protocol: https
      insecureSSL: false
rchenzheng commented 4 years ago

> @rchenzheng can you try this config?
>
> splunk-kubernetes-metrics:
>   kubernetes:
>     insecureSSL: false
>   splunk:
>     hec:
>       indexName: k8s-logs
>       host: my-host.splunkcloud.com
>       port: 443
>       token: HEIC-TOKEN
>       protocol: https
>       insecureSSL: false

Same error. This is effectively no different from the config I already had.

mwang2016 commented 4 years ago

@rchenzheng Sorry, insecureSSL should be true, and if you are using Managed Splunk Cloud, your host should have a prefix, like https://http-inputs-my-host.splunkcloud.com:443

Send data to HTTP Event Collector on Splunk Cloud instances
Depending on the type of Splunk Cloud that you use, you must send data using a specific URI for HEC.

The standard form for the HEC URI in self-service Splunk Cloud is as follows:
<protocol>://input-<host>:<port>/<endpoint>

The standard form for the HEC URI in managed Splunk Cloud is as follows:
<protocol>://http-inputs-<host>:<port>/<endpoint>
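
As a rough sketch of how the managed Splunk Cloud URI form could map onto the chart values used in this thread: the rendered fluentd config posted further down shows hec_host, hec_port, and protocol as separate settings, so the assumption here is that the host value carries only the prefixed hostname, not the full URL. The hostname, token, and index are the placeholders already used above, not real values.

```yaml
# Sketch only, not a verified fix for this issue.
# Assumption: "host" takes a bare hostname; protocol and port are separate keys.
splunk-kubernetes-metrics:
  splunk:
    hec:
      # Managed Splunk Cloud form: http-inputs-<host>
      host: http-inputs-my-host.splunkcloud.com
      port: 443
      protocol: https
      token: HEIC-TOKEN
      indexName: k8s-logs
```

For self-service Splunk Cloud, the prefix in the quoted documentation is input- rather than http-inputs-.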
rchenzheng commented 4 years ago

> @rchenzheng Sorry, insecureSSL should be true, and if you are using Managed Splunk Cloud, your host should have a prefix, like https://http-inputs-my-host.splunkcloud.com:443
>
> Send data to HTTP Event Collector on Splunk Cloud instances
> Depending on the type of Splunk Cloud that you use, you must send data using a specific URI for HEC.
>
> The standard form for the HEC URI in self-service Splunk Cloud is as follows:
> <protocol>://input-<host>:<port>/<endpoint>
>
> The standard form for the HEC URI in managed Splunk Cloud is as follows:
> <protocol>://http-inputs-<host>:<port>/<endpoint>

Even with it off, or with insecureSSL: true, I still have the same issue.

Wouldn't this allow for man-in-the-middle attacks since insecureSSL is for self-signed certificates?

2020-07-17 20:17:21 +0000 [error]: #0 Timer detached. title=:metric_scraper
2020-07-17 20:17:21 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 s
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:647:in `scrape_stats_metrics'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-17 20:17:21 +0000 [error]: #0 Timer detached. title=:stats_metric_scraper
2020-07-17 20:17:21 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-17 20:17:21 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

mwang2016 commented 4 years ago

@rchenzheng did you add the prefix http-inputs- to your host?

rockb1017 commented 4 years ago

2020-07-17 20:17:21 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 s

This log seems to say it cannot get metrics from k8s due to an SSL issue. If you set kubernetes.insecureSSL to true, then it will not have this error.

splunk-kubernetes-metrics:
  kubernetes:
    insecureSSL: true

But are you still getting this error?

2020-07-16 20:46:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
  2020-07-16 20:46:32 +0000 [error]: #0 suppressed same stacktrace

And could you post the pod logs here, from the beginning up to the error you are getting (with sensitive info removed)?

rchenzheng commented 4 years ago

> @rchenzheng did you add the prefix http-inputs- to your host?

Yes, this was added.

> 2020-07-17 20:17:21 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 s
>
> This log seems to say it cannot get metrics from k8s due to an SSL issue. If you set kubernetes.insecureSSL to true, then it will not have this error.
>
> splunk-kubernetes-metrics:
>   kubernetes:
>     insecureSSL: true
>
> But are you still getting this error?
>
> 2020-07-16 20:46:32 +0000 [error]: #0 Failed to scrape resource usage metrics, error=, #<NoMethodError: undefined method `[]' for nil:NilClass>
>   2020-07-16 20:46:32 +0000 [error]: #0 suppressed same stacktrace
>
> And could you post the pod logs here, from the beginning up to the error you are getting (with sensitive info removed)?

I set that to true, yet the error is still there. Besides, my input doesn't use a self-signed certificate, and this setting would allow a MITM attack.

    @type splunk_hec
    data_type metric
    metric_name_key "metric_name"
    metric_value_key "value"
    protocol https
    hec_host "http-inputs-MYINPUT.splunkcloud.com"
    hec_port 443
    hec_token "MY-TOKEN"
    host "THE-HOST"
    index "MY-INDEX"
    source "${tag}"
    insecure_ssl true

rockb1017 commented 4 years ago

so which error are you getting?

rockb1017 commented 4 years ago

could you specify what you mean by "my input isn't a self-signed certificate"?

rchenzheng commented 4 years ago

> could you specify what you mean by "my input isn't a self-signed certificate"?

The SSL certificate for http-inputs-MYINPUT.splunkcloud.com is a wildcard certificate for *.splunkcloud.com.

I believe the issue here is that the container may not trust that SSL certificate, so it comes down to whatever you bundled the image with.

rchenzheng commented 4 years ago

> so which error are you getting?

2020-07-17 20:17:21 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-17 20:17:21 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-17 20:17:21 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-17 20:17:21 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

puzzlepri commented 4 years ago

Hello all,

I have a similar issue: the metrics pods are failing. Is there a solution for this?

@type kubernetes_metrics
tag "kube.*"
node_name "ocp.xyz.com"
use_rest_client_ssl true
cluster_name xyz
interval 15s
is not used.
2020-06-09 13:36:25 +0000 [info]: #0 starting fluentd worker pid=18 ppid=8 worker=0
2020-06-09 13:36:25 +0000 [debug]: #0 buffer started instance=47143694559200 stage_size=0 queue_size=0
2020-06-09 13:36:25 +0000 [info]: #0 fluentd worker is now running worker=0
2020-06-09 13:36:26 +0000 [debug]: #0 flush_thread actually running
2020-06-09 13:36:26 +0000 [debug]: #0 enqueue_thread actually running
2020-06-09 13:36:40 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (unable to get local issuer certificate)"
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes-metrics-1.1.2/lib/fluent/plugin/in_kubernetes_metrics.rb:635:in `scrape_metrics'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run_once'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-06-09 13:36:40 +0000 [error]: #0 Timer detached. title=:metric_scraper
2020-06-09 13:36:40 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (unable to get local issuer certificate)"
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes-metrics-1.1.2/lib/fluent/plugin/in_kubernetes_metrics.rb:647:in `scrape_stats_metrics'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run_once'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-06-09 13:36:40 +0000 [error]: #0 Timer detached. title=:stats_metric_scraper
2020-06-09 13:36:40 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (unable to get local issuer certificate)"
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes-metrics-1.1.2/lib/fluent/plugin/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run_once'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/cool.io-1.5.4/lib/cool.io/loop.rb:88:in `run'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-06-09 13:36:40 +0000 [error]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-06-09 13:36:40 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

rockb1017 commented 4 years ago

Could you try:

kubernetes:
  # This option is used to get the metrics from the summary API on each kubelet using SSL
  useRestClientSSL: true
  # If insecureSSL is set to true, insecure HTTPS API calls are allowed; default is false
  insecureSSL: true

matthewmodestino commented 4 years ago

Yes, these issues are caused by the metrics chart scraping the kubelet and not using insecureSSL to talk to port 10250. They have nothing to do with Splunk Cloud certs. There are certs in many parts of this solution, so it can be confusing...

The kubelet rarely has real certs... Have you gotten it to work, @rchenzheng?
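
To make the distinction above concrete, here is a minimal sketch of the two separate insecureSSL settings that appear in this thread, using only keys already posted here. It assumes, as described in the comment above, that kubernetes.insecureSSL only affects the kubelet scrape, while splunk.hec.insecureSSL only affects the connection to Splunk HEC. Whether this combination resolves the reported error is not confirmed in this thread.

```yaml
# Sketch only: the two certificate checks are configured independently.
splunk-kubernetes-metrics:
  kubernetes:
    # Applies to scraping the kubelet (port 10250), which rarely has real certs
    useRestClientSSL: true
    insecureSSL: true
  splunk:
    hec:
      # Applies to the connection to Splunk HEC; certificate verification stays on
      insecureSSL: false
```

Under that assumption, relaxing verification for the kubelet would not by itself disable verification of the Splunk Cloud certificate, which is the man-in-the-middle concern raised earlier in the thread.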

puzzlepri commented 4 years ago

> Could you try:
>
> kubernetes:
>   # This option is used to get the metrics from the summary API on each kubelet using SSL
>   useRestClientSSL: true
>   # If insecureSSL is set to true, insecure HTTPS API calls are allowed; default is false
>   insecureSSL: true

Yes, I do have use_rest_client_ssl true and insecureSSL true set, but I still get the SSL error. Did this work for you?

matthewmodestino commented 4 years ago

Please share your values.yaml or a copy of the running configmap in the cluster. Also, what flavour of k8s are you running?

kubectl get cm
kubectl describe cm

rchenzheng commented 4 years ago

> Please share your values.yaml or a copy of the running configmap in the cluster. Also, what flavour of k8s are you running?
>
> kubectl get cm
> kubectl describe cm

Setting splunk-kubernetes-metrics.kubernetes.useRestClientSSL: false fixed the SSL issue, but now I get:

2020-07-22 12:32:50 +0000 [error]: #0 Timer detached. title=:stats_metric_scraper
2020-07-22 12:32:50 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::BadRequest error="400 Bad Request"
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/ruby/net/http.rb:910:in `start'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-22 12:32:50 +0000 [error]: #0 /fluentd/plugins/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-22 12:32:50 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-22 12:32:50 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

rockb1017 commented 4 years ago

what is your kubelet --version ?

rchenzheng commented 4 years ago

Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.10", GitCommit:"1bea6c00a7055edef03f1d4bb58b773fa8917f11", GitTreeState:"clean", BuildDate:"2020-02-11T20:05:26Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
rockb1017 commented 4 years ago

I was able to reproduce it and fix it, and I have a working image with the fix now: rock1017/k8s-metrics:1.1.3-2. Feel free to use this image until Splunk publishes an official release. Thank you, and let me know whether it works for you. By the way, I am using kubelet port 10255 for read-only HTTP access.

rchenzheng commented 4 years ago

Any timelines on when this gets merged?

Thanks

matthewmodestino commented 4 years ago

Hi @rockb1017, any high-level details on the root cause and how to identify whether someone is impacted?

puzzlepri commented 4 years ago

@rockb1017, are there any changes to the metrics configmap? Here is my template:

fluent.conf: |
# system wide configurations

<system>
  log_level debug
</system>
<source>
  @type kubernetes_metrics
  tag kube.*
  node_name "#{ENV['NODE_NAME']}"
  use_rest_client_ssl true
  cluster_name {{ splunk_cluster_id }}
  interval 15s
</source>
<filter kube.**>
  @type record_modifier
  <record>
    metric_name ${tag}
    cluster_name {{ splunk_cluster_id }}
  </record>
</filter>
<filter kube.node.**>
  @type record_modifier
  <record>
    source ${record['node']}
  </record>
</filter>
<filter kube.pod.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}
  </record>
</filter>
<filter kube.sys-container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['name']}
  </record>
</filter>
<filter kube.container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['container-name']}
  </record>
</filter>
# = custom filters specified by users =
<match kube.**>
  @type splunk_hec
  data_type metric
  metric_name_key metric_name
  metric_value_key value
  protocol https
  hec_host {{ splunk_hec_host }}
  hec_port {{ splunk_port }}
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  index {{ splunk_metrics_index }}
  source ${tag}
  insecure_ssl true
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 100m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 3
    total_limit_size 400m
  </buffer>
</match>
rockb1017 commented 4 years ago

@puzzlepri for this bug, no change is needed on the configmap. @matthewmodestino I had to change the hard-coded endpoint from /stats/ to /stats. Would making these API URIs configurable as a parameter be a good improvement (to be future-proof)?
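
A rough sketch of what configurable endpoints could look like, written in Python rather than the plugin's Ruby. The function name, the `paths` override, and the slash normalisation are assumptions for illustration only; the three default paths are the ones the plugin logs at startup in this thread.

```python
from urllib.parse import urlunsplit

# Default paths as they appear in the plugin's startup log lines.
DEFAULT_PATHS = {
    "summary": "/stats/summary",
    "stats": "/stats",
    "cadvisor": "/metrics/cadvisor",
}


def kubelet_urls(host, port, use_ssl, paths=None):
    """Build the kubelet URLs, letting callers override individual paths.

    Hypothetical helper: every path is normalised to one leading slash
    and no trailing slash, so "/stats/" and "/stats" give the same URL.
    """
    merged = {**DEFAULT_PATHS, **(paths or {})}
    scheme = "https" if use_ssl else "http"
    urls = {}
    for name, path in merged.items():
        clean = "/" + path.strip("/")
        urls[name] = urlunsplit((scheme, f"{host}:{port}", clean, "", ""))
    return urls


if __name__ == "__main__":
    for name, url in kubelet_urls("10.0.0.1", 10255, False, {"stats": "/stats/"}).items():
        print(name, url)
```

Normalising the path in one place would mean a stray trailing slash in a default or in user config could not change the request that is sent.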

puzzlepri commented 4 years ago

@rockb1017 - tested with your image and still getting same SSL error

rockb1017 commented 4 years ago

@puzzlepri Could you share the entire error log you are getting? and which kubelet port are you using?

puzzlepri commented 4 years ago

@rockb1017

Here is the error. The kubelet port is 10250, whereas my splunk-kubernetes-metrics-agg is running fine and scraping resource usage metrics.

2020-07-22 18:52:44 +0000 [info]: starting fluentd-1.9.1 pid=1 ruby="2.5.5"
2020-07-22 18:52:44 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "-r/usr/local/share/gems/gems/bundler-2.1.4/lib/bundler/setup", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--under-supervisor"]
2020-07-22 18:52:47 +0000 [info]: adding filter pattern="kube.**" type="record_modifier"
2020-07-22 18:52:47 +0000 [info]: adding filter pattern="kube.node.**" type="record_modifier"
2020-07-22 18:52:47 +0000 [info]: adding filter pattern="kube.pod.**" type="record_modifier"
2020-07-22 18:52:47 +0000 [info]: adding filter pattern="kube.sys-container.**" type="record_modifier"
2020-07-22 18:52:47 +0000 [info]: adding filter pattern="kube.container.**" type="record_modifier"
2020-07-22 18:52:47 +0000 [info]: adding match pattern="kube.**" type="splunk_hec"
2020-07-22 18:52:48 +0000 [info]: adding source type="kubernetes_metrics"
2020-07-22 18:52:49 +0000 [info]: #0 Use URL http://<ip>:10250/stats/summary for creating client to query kubelet summary api
2020-07-22 18:52:49 +0000 [info]: #0 Use URL http://<ip>:10250/stats for creating client to query kubelet stats api
2020-07-22 18:52:49 +0000 [info]: #0 Use URL http://<ip>:10250/metrics/cadvisor for creating client to query cadvisor metrics api
2020-07-22 18:52:49 +0000 [debug]: #0 No fluent logger for internal event
2020-07-22 18:52:49 +0000 [warn]: parameter 'cluster_name' in <source>
  @type kubernetes_metrics
  tag "kube.*"
  node_name "<nodename>t"
  use_rest_client_ssl false
  cluster_name <clustermame>
  interval 15s
</source> is not used.
2020-07-22 18:52:49 +0000 [info]: #0 starting fluentd worker pid=11 ppid=1 worker=0
2020-07-22 18:52:49 +0000 [debug]: #0 buffer started instance=47126338487580 stage_size=0 queue_size=0
2020-07-22 18:52:49 +0000 [info]: #0 fluentd worker is now running worker=0
2020-07-22 18:52:49 +0000 [debug]: #0 enqueue_thread actually running
2020-07-22 18:52:50 +0000 [debug]: #0 flush_thread actually running
2020-07-22 18:53:04 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:metric_scraper error_class=RestClient::BadRequest error="400 Bad Request"
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/ruby/net/http.rb:910:in `start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:635:in `scrape_metrics'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-22 18:53:04 +0000 [error]: #0 Timer detached. title=:metric_scraper
2020-07-22 18:53:04 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::BadRequest error="400 Bad Request"
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/ruby/net/http.rb:910:in `start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:647:in `scrape_stats_metrics'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-22 18:53:04 +0000 [error]: #0 Timer detached. title=:stats_metric_scraper
2020-07-22 18:53:04 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::BadRequest error="400 Bad Request"
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/ruby/net/http.rb:910:in `start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-22 18:53:04 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-22 18:53:04 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-22 18:53:04 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper
druesendieb commented 4 years ago

We're having the same problem around here, using the image rock1017/k8s-metrics:1.1.3-2 didn't help.

Log Messages of the metrics pod, should also include the config used:

`/opt/app-root/src` is not writable.
Bundler will use `/tmp/bundler20200728-1-gnv4xv1' as your home directory temporarily.
2020-07-28 08:54:31 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2020-07-28 08:54:31 +0000 [info]: gem 'fluentd' version '1.9.1'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-jq' version '0.5.1'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-kubernetes-metrics' version '1.1.3'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.4.2'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.7.0'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2020-07-28 08:54:31 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.2.1'
2020-07-28 08:54:33 +0000 [info]: Use URL https://IP:10250/stats/summary for creating client to query kubelet summary api
2020-07-28 08:54:33 +0000 [info]: Use URL https://IP:10250/stats for creating client to query kubelet stats api
2020-07-28 08:54:33 +0000 [info]: Use URL https://IP:10250/metrics/cadvisor for creating client to query cadvisor metrics api
2020-07-28 08:54:33 +0000 [info]: using configuration file: <ROOT>
  <system>
    log_level info
  </system>
  <source>
    @type kubernetes_metrics
    tag "kube.*"
    node_name "nodename"
    use_rest_client_ssl true
    cluster_name core
    interval 15s
  </source>
  <filter kube.**>
    @type record_modifier
    <record>
      metric_name ${tag}
      cluster_name core
    </record>
  </filter>
  <filter kube.node.**>
    @type record_modifier
    <record>
      source ${record['node']}
    </record>
  </filter>
  <filter kube.pod.**>
    @type record_modifier
    <record>
      source ${record['node']}/${record['pod-name']}
    </record>
  </filter>
  <filter kube.sys-container.**>
    @type record_modifier
    <record>
      source ${record['node']}/${record['pod-name']}/${record['name']}
    </record>
  </filter>
  <filter kube.container.**>
    @type record_modifier
    <record>
      source ${record['node']}/${record['pod-name']}/${record['container-name']}
    </record>
  </filter>
  <match kube.**>
    @type splunk_hec
    data_type metric
    metric_name_key "metric_name"
    metric_value_key "value"
    protocol https
    hec_host "https://internal.domain.eu"
    hec_port 8088
    hec_token ""
    host "hostname"
    index "em_metrics"
    source "${tag}"
    insecure_ssl true
    <buffer>
      @type "memory"
      chunk_limit_records 10000
      chunk_limit_size 100m
      flush_interval 5s
      flush_thread_count 1
      overflow_action block
      retry_max_times 3
      total_limit_size 400m
    </buffer>
  </match>
</ROOT>
2020-07-28 08:54:33 +0000 [info]: starting fluentd-1.9.1 pid=1 ruby="2.5.5"
2020-07-28 08:54:33 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "-r/usr/local/share/gems/gems/bundler-2.1.4/lib/bundler/setup", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--under-supervisor"]
2020-07-28 08:54:36 +0000 [info]: adding filter pattern="kube.**" type="record_modifier"
2020-07-28 08:54:36 +0000 [info]: adding filter pattern="kube.node.**" type="record_modifier"
2020-07-28 08:54:36 +0000 [info]: adding filter pattern="kube.pod.**" type="record_modifier"
2020-07-28 08:54:36 +0000 [info]: adding filter pattern="kube.sys-container.**" type="record_modifier"
2020-07-28 08:54:37 +0000 [info]: adding filter pattern="kube.container.**" type="record_modifier"
2020-07-28 08:54:37 +0000 [info]: adding match pattern="kube.**" type="splunk_hec"
2020-07-28 08:54:37 +0000 [info]: adding source type="kubernetes_metrics"
2020-07-28 08:54:38 +0000 [info]: #0 Use URL https://IP:10250/stats/summary for creating client to query kubelet summary api
2020-07-28 08:54:38 +0000 [info]: #0 Use URL https://IP:10250/stats for creating client to query kubelet stats api
2020-07-28 08:54:38 +0000 [info]: #0 Use URL https://IP:10250/metrics/cadvisor for creating client to query cadvisor metrics api
2020-07-28 08:54:38 +0000 [warn]: parameter 'cluster_name' in <source>
  @type kubernetes_metrics
  tag "kube.*"
  node_name "nodename"
  use_rest_client_ssl true
  cluster_name core
  interval 15s
</source> is not used.
2020-07-28 08:54:38 +0000 [info]: #0 starting fluentd worker pid=10 ppid=1 worker=0
2020-07-28 08:54:38 +0000 [info]: #0 fluentd worker is now running worker=0
2020-07-28 08:54:53 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)"
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:635:in `scrape_metrics'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-28 08:54:53 +0000 [error]: #0 Timer detached. title=:metric_scraper
2020-07-28 08:54:53 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:stats_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)"
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:647:in `scrape_stats_metrics'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-28 08:54:53 +0000 [error]: #0 Timer detached. title=:stats_metric_scraper
2020-07-28 08:54:53 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::SSLCertificateNotVerified error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)"
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:776:in `rescue in transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:651:in `transmit'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
  2020-07-28 08:54:53 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.3/lib/fluent/plugin/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-07-28 08:54:53 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.9.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-07-28 08:54:53 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper

I've tried setting different values for hec_host "https://internal.domain.eu". Same outcome if I use https:// or not at the beginning.

matthewmodestino commented 4 years ago

It is not the cert on the HEC side, it's the cert on the INPUT side!!

Please set insecureSSL to true in the kubernetes section of the metrics chart!

https://github.com/splunk/splunk-connect-for-kubernetes/blob/e680eb1d6fe53c0346ad66ddb22b3d8ee0703226/helm-chart/splunk-connect-for-kubernetes/values.yaml#L875

If you are struggling, hit me up in the community Slack (splk.it/slack - @mattymo) in the #kubernetes channel.
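
To illustrate what an input-side insecureSSL switch usually means, here is a minimal Python sketch. It is not the plugin's Ruby code: the function name and the `ca_file` parameter are made up for the example. It only shows the general pattern, where verification is on by default (so a self-signed kubelet cert fails with "certificate verify failed") and turning it off skips the check.

```python
import ssl


def kubelet_tls_context(insecure_ssl, ca_file=None):
    """Return an SSL context for talking to the kubelet (illustrative only).

    insecure_ssl=False: verify the server cert against ca_file, or the
    system trust store when ca_file is None.
    insecure_ssl=True: skip verification entirely.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if insecure_ssl:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    print(kubelet_tls_context(False).verify_mode)
    print(kubelet_tls_context(True).verify_mode)
```

The HEC output has its own, separate verification setting, which is why changing only the HEC one does not affect the kubelet scrape.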

druesendieb commented 4 years ago

Thanks to @matthewmodestino I've fixed my problem.

I had only defined global.kubernetes.insecureSSL (which isn't a thing, apparently). You have to define kubernetes.insecureSSL additionally in the section for metrics; the value global.kubernetes.insecureSSL is not passed on in this case!
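
A small Python sketch of the lookup described here, assuming the values file has already been parsed into a dict. The helper name is hypothetical, and the logic just mirrors the behaviour reported in this comment (only the metrics section is consulted); it is not taken from the chart templates.

```python
def metrics_insecure_ssl(values):
    """Return the insecureSSL value the metrics input would end up with.

    Only splunk-kubernetes-metrics.kubernetes.insecureSSL is read;
    global.kubernetes.insecureSSL is ignored, as reported in this thread.
    Falls back to False, the documented default.
    """
    metrics = values.get("splunk-kubernetes-metrics") or {}
    kubernetes = metrics.get("kubernetes") or {}
    return bool(kubernetes.get("insecureSSL", False))


only_global = {"global": {"kubernetes": {"insecureSSL": True}}}
with_local = {
    "global": {"kubernetes": {"insecureSSL": True}},
    "splunk-kubernetes-metrics": {"kubernetes": {"insecureSSL": True}},
}

if __name__ == "__main__":
    print(metrics_insecure_ssl(only_global))
    print(metrics_insecure_ssl(with_local))
```

With only the global key set, the helper returns the default (False), which matches the SSL verification error seen before the fix.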

puzzlepri commented 4 years ago

@matthewmodestino - I do have insecureSSL set to true in my values.yaml.

puzzlepri commented 4 years ago

@matthewmodestino @rockb1017 Any inputs on the issue?

matthewmodestino commented 4 years ago

@puzzlepri You likely still have incorrect settings in your values.yaml, or you need to restart the pods after re-applying. Your errors look like you are calling the wrong ports.

There are various insecureSSL parameters and port options you need to set properly.

The one you need is on the INPUT side, has nothing to do with HEC and should be set in the local metrics chart as seen here:

https://github.com/splunk/splunk-connect-for-kubernetes/blob/69a5fa2091401214d53e9fb4e2a7db755cc87d33/helm-chart/splunk-connect-for-kubernetes/values.yaml#L888

please post your entire values.yaml or ping me on slack so I can explain.
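
For what it's worth, the 400 Bad Request log earlier in the thread shows http:// URLs against port 10250, which would be consistent with a scheme/port mismatch. A hedged Python sketch of that sanity check follows; the function name and messages are invented for illustration, and the port roles are the ones mentioned in this thread (10250 with SSL, 10255 for read-only HTTP).

```python
SECURE_PORT = 10250     # kubelet port used with SSL in this thread
READ_ONLY_PORT = 10255  # kubelet read-only HTTP port mentioned in this thread


def check_kubelet_target(port, use_rest_client_ssl):
    """Return a warning string for a likely scheme/port mismatch, else None."""
    if port == SECURE_PORT and not use_rest_client_ssl:
        return "port 10250 expects HTTPS; plain HTTP is likely to be rejected"
    if port == READ_ONLY_PORT and use_rest_client_ssl:
        return "port 10255 is plain HTTP; SSL is likely to fail"
    return None


if __name__ == "__main__":
    print(check_kubelet_target(10250, False))
    print(check_kubelet_target(10250, True))
```

Other ports are left alone, since clusters can be configured differently.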

puzzlepri commented 4 years ago

Thanks @matthewmodestino, here is my values.yaml. I am using version splunk-connect-for-kubernetes-1.3.0.tgz.

global:
  logLevel: debug
  splunk:
    hec:
      insecureSSL: true
      host: <host_name>
      port: 8088
      token: <splunk_token>
      indexName: main
  kubernetes:
    clusterName: "dev1"
    openshift: true

## Enabling logging will install the `splunk-kubernetes-logging` chart to a kubernetes
## cluster to collect logs generated in the cluster to a Splunk indexer/indexer cluster.
logging:
  enabled: true

## Enabling objects will install the `splunk-kubernetes-objects` chart to a kubernetes
## cluster to collect kubernetes objects in the cluster to a Splunk indexer/indexer cluster.
objects:
  enabled: true

## Enabling metrics will install the `splunk-kubernetes-metrics` chart to a kubernetes
## cluster to collect metrics of the cluster to a Splunk indexer/indexer cluster.
metrics:
  enabled: true
aggregatorBuffer:
  "@type": memory
  total_limit_size: 400m
  chunk_limit_size: 100m
  chunk_limit_records: 10000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 3

# Configure how often SCK pulls metrics for its kubenetes sources. 15s is the defa
metricsInterval: 15s

splunk-kubernetes-logging:
  splunk:
    hec:
      insecureSSL: true
      host: <host_name>
      port: 8088
      token: <splunk_token>
      indexName: k8s_logs
  containers:
    logFormatType: cri
    logFormat: "%Y-%m-%dT%H:%M:%S.%N%:z"
  serviceAccount:
    create: true

splunk-kubernetes-objects:
  splunk:
    hec:
      insecureSSL: true
      host: <host_name>
      port: 8088
      token: <splunk_token>
      indexName: k8s_objects
  serviceAccount:
    create: true
matthewmodestino commented 4 years ago

You have no kubernetes section in the metrics and objects sections of your YAML, so you are getting the defaults.

What flavour of k8s are you using? Openshift or OSS?

Please review this example and set the proper settings for your cluster!

https://mattymo.io/code/mattymo/splunk_toronto_usergroup_may_2020#deploy-splunk-connect-for-kubernetes

puzzlepri commented 4 years ago

I am using Openshift 4

matthewmodestino commented 4 years ago

Then please make sure kubelet_port is set to 10250, use_rest_client_ssl is true, and insecureSSL is false. These are all sub-settings of the metrics chart under kubernetes.

Do not set in global...

rockb1017 commented 4 years ago

Closing. Thank you! Feel free to reopen if you need further help.