splunk / fluent-plugin-kubernetes-metrics

Fluentd input plugin which queries Kubernetes kubelet summary API to collect Kubernetes metrics.
Apache License 2.0

error_class=RestClient::NotFound error="404 Not Found" #86

Closed. izark1 closed this issue 3 years ago.

izark1 commented 3 years ago

Hi, I'm deploying this plugin and running it in my environment, but I get these errors in my splunk-fluentd-k8s-metrics container:

```
2021-05-02 23:08:20 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::NotFound error="404 Not Found"
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/ruby/net/http.rb:933:in `start'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
2021-05-02 23:08:20 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.5/lib/fluent/plugin/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-05-02 23:08:20 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper
```

Any help with this, please?

izark1 commented 3 years ago

Any help please with this?

luckyj5 commented 3 years ago

@izark1 Thanks for reporting this issue. Can you please share how you are deploying this plugin?

izark1 commented 3 years ago

Hi @luckyj5, I've deployed SCK (https://github.com/splunk/splunk-connect-for-kubernetes) using Helm 3; I'm interested in the metrics collection part. I configured my_values.yaml with the proper settings for my Splunk environment, then ran `helm install my-splunk-connect -f my_values.yaml splunk/splunk-connect-for-kubernetes`,

but I don't see the metrics in my Splunk environment; after that, I got the errors shown above.

More information:

```
docker image ls
REPOSITORY                           TAG        IMAGE ID       CREATED         SIZE
docker.io/splunk/k8s-metrics         1.1.5      65b48dd511c3   4 days ago      1.03 GB
docker.io/splunk/fluentd-hec         1.2.5      d2b9528d8c03   4 days ago      1.08 GB
docker.io/httpd                      2.4        0b932df43057   3 weeks ago     138 MB
docker.io/httpd                      latest     0b932df43057   3 weeks ago     138 MB
k8s.gcr.io/kube-apiserver            v1.21.0    4d217480042e   3 weeks ago     126 MB
k8s.gcr.io/kube-proxy                v1.21.0    38ddd85fe90e   3 weeks ago     122 MB
k8s.gcr.io/kube-controller-manager   v1.21.0    09708983cc37   3 weeks ago     120 MB
k8s.gcr.io/kube-scheduler            v1.21.0    62ad3129eca8   3 weeks ago     50.6 MB
docker.io/weaveworks/weave-npc       2.8.1      7f92d556d4ff   3 months ago    39.3 MB
docker.io/weaveworks/weave-kube      2.8.1      df29c0a4002c   3 months ago    89 MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2ec   3 months ago    683 kB
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2   6 months ago    42.5 MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ff   8 months ago    253 MB
gcr.io/cadvisor/cadvisor             v0.36.0    7414b6ed960c   10 months ago   184 MB
```

Can you please elaborate on what you need me to do?

izark1 commented 3 years ago

Hi @luckyj5, can you assist, please?

luckyj5 commented 3 years ago

Please share your values.yaml or a copy of the running configmap in the cluster. Also, which version and flavor of Kubernetes are you running?

```
kubectl get cm
kubectl describe cm
```

izark1 commented 3 years ago

Hi @luckyj5 , thanks for your response.

```
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:30:03Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

```
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

This is the output of the first command:

```
NAME                                                     DATA   AGE
kube-root-ca.crt                                         1      33m
my-splunk-connect-splunk-kubernetes-logging              8      14m
my-splunk-connect-splunk-kubernetes-metrics              1      14m
my-splunk-connect-splunk-kubernetes-metrics-aggregator   1      14m
my-splunk-connect-splunk-kubernetes-objects              1      14m
```

```
Name:         kube-root-ca.crt
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
ca.crt:
----
-----BEGIN CERTIFICATE-----
MIIC5zCCAc+gAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
cm5ldGVzMB4XDTIxMDUxMDIwMTk1OFoXDTMxMDUwODIwMTk1OFowFTETMBEGA1UE
AxMKa3ViZXJuZXRlczCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALCU
RISOjWA87S0NAANKMkkZSCcbXLopk67eLn+KrRCbCJKH14CsN9SmQtyGxfJDOu0B
VbTqxj7RzDaAAI0r20SKWHbVsgJEJAZhNh198ZgX6FnrIrSOmISv4RNZkGyXbAyZ
y9O2ZxdXpfhS87vI+JZJd0f6Kpax532qNBYhXSJ0WxHaFv1SpNmR8yXCcdmPjUNi
k8jmKRgu54uQV7CYlyUEoBR1JkUEl4t5OwdiBv0Z8JdHg2pJVN//gqVHwJuGAbI4
BJ86Z/TwOFFR4WVVFrly8LzXzqjf4bMi2KH2pjg1S2uvkxzslLgxKOLiJOecA/aJ
q9DfdZ+WHfvpVD4bPaECAwEAAaNCMEAwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB
/wQFMAMBAf8wHQYDVR0OBBYEFLRWmF6qoOCq7J4w1pkNRr40BTx0MA0GCSqGSIb3
DQEBCwUAA4IBAQBJjoTEvtA4RA4nU5Fvxwuvm8nCiPHkPBnRcuflxuPX9/Aw08gQ
A6paPIa25qJeYgqH/qJoQWcBbsqihTapXYpola6qkCIKcRvB56Qer8O/e3d7bxRw
sF1u+lIrrc49BkjnV+x8AMDymjgQ2wCc2PxaCeGjn25zSf530iQd4aNZR2CvcSd2
a7LOf5pVd2gJsIFsC5YQhb8ZA2o07LodYLOqLJON2wXGhKFolxnAPJMm8i4hWHHb
SaYGqy1zTLd+316AtKOfJ1NJJIkBwCtBIw5JuDlPERf01onq7nTW+Dtmqdw/akru
FuCtRUnXUPRFcABQE33FDGZ/iOl/dwRtvlXi
-----END CERTIFICATE-----

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-logging
Namespace:    default
Labels:       app=splunk-kubernetes-logging
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-logging-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
source.files.conf:
----
# This fluentd conf file contains sources for log files other than container logs.
<source>
  @id tail.file.kube-audit
  @type tail
  @label @CONCAT
  tag tail.file.kube:apiserver-audit
  path /var/log/kube-apiserver-audit.log
  pos_file /var/log/splunk-fluentd-kube-audit.pos
  read_from_head true
  path_key source
  <parse>
    @type regexp
    expression /^(?.*)$/
    time_key time
    time_type string
    time_format %Y-%m-%dT%H:%M:%SZ
  </parse>
</source>
```

```
source.journald.conf:
----
# This fluentd conf file contains configurations for reading logs from systemd journal.
<source>
  @id journald-docker
  @type systemd
  @label @CONCAT
  tag journald.kube:docker
  path "/run/log/journal"
  matches [{ "_SYSTEMD_UNIT": "docker.service" }]
  read_from_head true
  <storage>
    @type local
    persistent true
    path /var/log/splunkd-fluentd-journald-docker.pos.json
  </storage>
  <entry>
    field_map {"MESSAGE": "log", "_SYSTEMD_UNIT": "source"}
    field_map_strict true
  </entry>
</source>
<source>
  @id journald-kubelet
  @type systemd
  @label @CONCAT
  tag journald.kube:kubelet
  path "/run/log/journal"
  matches [{ "_SYSTEMD_UNIT": "kubelet.service" }]
  read_from_head true
  <storage>
    @type local
    persistent true
    path /var/log/splunkd-fluentd-journald-kubelet.pos.json
  </storage>
  <entry>
    field_map {"MESSAGE": "log", "_SYSTEMD_UNIT": "source"}
    field_map_strict true
  </entry>
</source>
```

```
system.conf:
----
# system wide configurations
<system>
  log_level info
  root_dir /tmp/fluentd
</system>

fluent.conf:
----
@include system.conf
@include source.containers.conf
@include source.files.conf
@include source.journald.conf
@include monit.conf
@include output.conf
@include prometheus.conf

monit.conf:
----
<source>
  @id fluentd-monitor-agent
  @type monitor_agent
  @label @SPLUNK
  tag monitor_agent
</source>

output.conf:
----
```

```
# Events are emitted to the CONCAT label from the container, file and journald sources for multiline processing.
<label @CONCAT>
  # = filters for container logs =
  <filter tail.containers.var.log.containers.dns-controller*dns-controller*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*sidecar*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*dnsmasq*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-apiserver*kube-apiserver*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-controller-manager*kube-controller-manager*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns-autoscaler*autoscaler*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-proxy*kube-proxy*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-scheduler*kube-scheduler*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*kubedns*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  # = filters for journald logs =
  @type concat
  key log
  timeout_label @SPLUNK
  multiline_start_regexp /^\w[0-1]\d[0-3]\d/
  flush_interval 5
  # Events are relabeled then emitted to the SPLUNK label
  <match **>
    @type relabel
    @label @SPLUNK
  </match>
</label>

<label @SPLUNK>
  # filter to remove empty lines
  <filter tail.containers.**>
    @type grep
    <exclude>
      key log
      pattern ^$
    </exclude>
  </filter>
  # Enrich log with k8s metadata
  <filter tail.containers.**>
    @type kubernetes_metadata
    annotation_match [ ".*" ]
    de_dot false
    watch true
    cache_ttl 3600
  </filter>
  <filter tail.containers.**>
    @type record_transformer
    enable_ruby
    <record>
      # set the sourcetype from splunk.com/sourcetype pod annotation or set it to kube:container:CONTAINER_NAME
      sourcetype ${record.dig("kubernetes", "annotations", "splunk.com/sourcetype") ? record.dig("kubernetes", "annotations", "splunk.com/sourcetype") : "kube:container:"+record.dig("kubernetes","container_name")}
      container_name ${record.dig("kubernetes","container_name")}
      namespace ${record.dig("kubernetes","namespace_name")}
      pod ${record.dig("kubernetes","pod_name")}
      container_id ${record.dig("docker","container_id")}
      pod_uid ${record.dig("kubernetes","pod_id")}
      container_image ${record.dig("kubernetes","container_image")}
      # set the cluster_name field to the configured value, or default to "cluster_name"
      cluster_name cluster_name
      # set the splunk_index field to the value found in the pod splunk.com/index annotations. if not set, use namespace annotation, or default to the default_index
      splunk_index ${record.dig("kubernetes", "annotations", "splunk.com/index") ? record.dig("kubernetes", "annotations", "splunk.com/index") : record.dig("kubernetes", "namespace_annotations", "splunk.com/index") ? (record["kubernetes"]["namespace_annotations"]["splunk.com/index"]) : ("k8s")}
      label_app ${record.dig("kubernetes","labels","app")}
      label_k8s-app ${record.dig("kubernetes","labels","k8s-app")}
      label_release ${record.dig("kubernetes","labels","release")}
      exclude_list ${record.dig("kubernetes", "annotations", "splunk.com/exclude") ? record.dig("kubernetes", "annotations", "splunk.com/exclude") : record.dig("kubernetes", "namespace_annotations", "splunk.com/exclude") ? (record["kubernetes"]["namespace_annotations"]["splunk.com/exclude"]) : ("false")}
    </record>
  </filter>
  <filter tail.containers.**>
    # Exclude all logs that are marked
    @type grep
    <exclude>
      key exclude_list
      pattern /^true$/
    </exclude>
  </filter>
  # extract pod_uid and container_name for CRIO runtime
  # create source and sourcetype
  <filter journald.**>
    @type jq_transformer
    jq '.record.source = "/run/log/journal/" + .record.source | .record.sourcetype = (.tag | ltrimstr("journald.")) | .record.cluster_name = "cluster_name" | .record.splunk_index = "k8s" | .record'
  </filter>
  # = filters for non-container log files =
  # extract sourcetype
  <filter tail.file.**>
    @type jq_transformer
    jq '.record.sourcetype = (.tag | ltrimstr("tail.file.")) | .record.cluster_name = "cluster_name" | .record.index = "k8s" | .record'
  </filter>
  # = filters for monitor agent =
  @type jq_transformer
  jq ".record.source = \"namespace:#{ENV['MY_NAMESPACE']}/pod:#{ENV['MY_POD_NAME']}\" | .record.sourcetype = \"fluentd:monitor-agent\" | .record.cluster_name = \"cluster_name\" | .record.splunk_index = \"k8s\" | .record"
  # = custom filters specified by users =
  # = output =
  <match **>
    @type splunk_hec
    protocol http
    hec_host "10.10.1.100"
    hec_port 8088
    hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
    index_key splunk_index
    insecure_ssl true
    host "#{ENV['K8S_NODE_NAME']}"
    source_key source
    sourcetype_key sourcetype
    # currently CRI does not produce log paths with all the necessary
    # metadata to parse out pod, namespace, container_name, container_id.
    # this may be resolved in the future by this issue: https://github.com/kubernetes/kubernetes/issues/58638#issuecomment-385126031
    <fields>
      container_image
      pod_uid
      pod
      container_name
      namespace
      container_id
      cluster_name
      label_app
      label_k8s-app
      label_release
    </fields>
    app_name splunk-kubernetes-logging
    app_version 1.4.7
    <buffer>
      @type memory
      chunk_limit_records 100000
      chunk_limit_size 20m
      flush_interval 5s
      flush_thread_count 1
      overflow_action block
      retry_max_times 5
      retry_type periodic
      total_limit_size 600m
    </buffer>
    <format monitor_agent>
      @type json
    </format>
    <format>
      # we just want to keep the raw logs, not the structure created by docker or journald
      @type single_value
      message_key log
      add_newline false
    </format>
  </match>
</label>
```
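The `splunk_index` expression in the record_transformer filter above resolves the index in a fixed precedence order: pod annotation, then namespace annotation, then the chart default. A minimal Python sketch of that precedence (the dict layout mirrors the enriched record; the sample values are illustrative):

```python
# Sketch of the splunk_index resolution done by the record_transformer filter:
# pod annotation > namespace annotation > default index ("k8s" in this config).
def resolve_splunk_index(record, default_index="k8s"):
    k8s = record.get("kubernetes", {})
    pod_ann = k8s.get("annotations", {}).get("splunk.com/index")
    ns_ann = k8s.get("namespace_annotations", {}).get("splunk.com/index")
    return pod_ann or ns_ann or default_index

# A pod-level annotation wins over the namespace-level one:
rec = {"kubernetes": {
    "annotations": {"splunk.com/index": "app_idx"},
    "namespace_annotations": {"splunk.com/index": "ns_idx"},
}}
print(resolve_splunk_index(rec))  # -> app_idx
print(resolve_splunk_index({}))   # -> k8s
```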

```
prometheus.conf:
----
# input plugin that exports metrics
<source>
  @type prometheus
</source>
<source>
  @type forward
</source>
# input plugin that collects metrics from MonitorAgent
<source>
  @type prometheus_monitor
  <labels>
    host ${hostname}
  </labels>
</source>
# input plugin that collects metrics for output plugin
<source>
  @type prometheus_output_monitor
  <labels>
    host ${hostname}
  </labels>
</source>
```

```
source.containers.conf:
----
# This configuration file for Fluentd / td-agent is used
# to watch changes to Docker log files. The kubelet creates symlinks that
# capture the pod name, namespace, container name & Docker container ID
# to the docker logs for pods in the /var/log/containers directory on the host.
# If running this fluentd configuration in a Docker container, the /var/log
# directory should be mounted in the container.
# reading kubelet logs from journal
#
# Reference:
# https://github.com/kubernetes/community/blob/20d2f6f5498a5668bae2aea9dcaf4875b9c06ccb/contributors/design-proposals/node/kubelet-cri-logging.md
#
# Json Log Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
# CRI Log Example (not supported):
# 2016-02-17T00:04:05.931087621Z stdout P { 'long': { 'json', 'object output' },
# 2016-02-17T00:04:05.931087621Z stdout F 'splitted': 'partial-lines' }
# 2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here
<source>
  @id containers.log
  @type tail
  @label @CONCAT
  tag tail.containers.*
  path /var/log/containers/*.log
  pos_file /var/log/splunk-fluentd-containers.log.pos
  path_key source
  read_from_head true
  refresh_interval 60
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    time_key time
    time_type string
    localtime false
  </parse>
</source>

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-metrics
Namespace:    default
Labels:       app=splunk-kubernetes-metrics
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-metrics-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
# system wide configurations
<system>
  log_level info
</system>
<source>
  @type kubernetes_metrics
  tag kube.*
  node_name "#{ENV['NODE_NAME']}"
  kubelet_port 10248
  use_rest_client_ssl false
  insecure_ssl true
  cluster_name cluster_name
  interval 15s
</source>
<filter kube.**>
  @type record_modifier
  <record>
    metric_name ${tag}
    cluster_name cluster_name
  </record>
</filter>
<filter kube.node.**>
  @type record_modifier
  <record>
    source ${record['node']}
  </record>
</filter>
<filter kube.pod.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}
  </record>
</filter>
<filter kube.sys-container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['name']}
  </record>
</filter>
<filter kube.container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['container-name']}
  </record>
</filter>
# = custom filters specified by users =
<match kube.**>
  @type splunk_hec
  data_type metric
  metric_name_key metric_name
  metric_value_key value
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  index em_metrics
  source ${tag}
  insecure_ssl true
  app_name splunk-kubernetes-metrics
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 10m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 400m
  </buffer>
</match>

Events:  <none>
```
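For debugging the 404 above: the scraper builds its kubelet URLs from the `node_name`, `kubelet_port`, and `use_rest_client_ssl` values in this configmap. A rough sketch of that URL construction; the `/stats/summary` and `/metrics/cadvisor` paths are assumptions to verify against `in_kubernetes_metrics.rb`:

```python
# Rough sketch of how the metrics plugin assembles its kubelet endpoints from
# the configmap values above. The /stats/summary and /metrics/cadvisor paths
# are assumptions to check against in_kubernetes_metrics.rb; a 404 raised in
# scrape_cadvisor_metrics means the GET on the cadvisor path failed.
def kubelet_url(node_name, kubelet_port, use_ssl, path):
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{node_name}:{kubelet_port}{path}"

# With the values from this configmap (kubelet_port 10248, SSL disabled):
print(kubelet_url("my-node", 10248, False, "/metrics/cadvisor"))
# -> http://my-node:10248/metrics/cadvisor
print(kubelet_url("my-node", 10250, True, "/stats/summary"))
# -> https://my-node:10250/stats/summary
```

Curling the same URL from a node should reproduce the 404 if the port or path is wrong; note that the kubelet's default secure port is 10250, while 10248 is its healthz port, which is worth double-checking here.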

```
Name:         my-splunk-connect-splunk-kubernetes-metrics-aggregator
Namespace:    default
Labels:       app=splunk-kubernetes-metrics
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-metrics-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
# system wide configurations
<system>
  log_level info
</system>
<source>
  @type kubernetes_metrics_aggregator
  tag kube.*
  interval 15s
</source>
<filter kube.**>
  @type record_modifier
  <record>
    metric_name ${tag}
    cluster_name cluster_name
  </record>
</filter>
<filter kube.cluster.**>
  @type record_modifier
  <record>
    source ${record['name']}
  </record>
</filter>
<filter kube.namespace.**>
  @type record_modifier
  <record>
    source ${record['name']}
  </record>
</filter>
<filter kube.node.**>
  @type record_modifier
  <record>
    source ${record['node']}
  </record>
</filter>
<filter kube.pod.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}
  </record>
</filter>
<filter kube.sys-container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['name']}
  </record>
</filter>
<filter kube.container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['container-name']}
  </record>
</filter>
<match kube.**>
  @type splunk_hec
  data_type metric
  metric_name_key metric_name
  metric_value_key value
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  index em_metrics
  source source
  insecure_ssl true
  app_name splunk-kubernetes-metrics
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 10m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 400m
  </buffer>
</match>

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-objects
Namespace:    default
Labels:       app=splunk-kubernetes-objects
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-objects-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
<system>
  log_level info
</system>
<source>
  @type kubernetes_objects
  tag kube.objects.*
  api_version "v1"
  insecure_ssl false
  resource_name pods
  resource_name namespaces
  resource_name nodes
  resource_name events
</source>
<filter kube.**>
  @type jq_transformer
  # in ruby '\' will escape and become just '\', since we need two '\' in the gsub jq filter, it becomes '\\'.
  jq '.record.source = "namespace:(env.MY_NAMESPACE)/pod:(env.MY_POD_NAME)" | .record.sourcetype = (.tag | gsub("\\."; ":")) | .record'
</filter>
<filter kube.**>
  @type jq_transformer
  jq '.record.cluster_name = "cluster_name" | .record'
</filter>
# = custom filters specified by users =
<match kube.**>
  @type splunk_hec
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  source_key source
  sourcetype_key sourcetype
  index k8s
  insecure_ssl true
  <fields>
    cluster_name
  </fields>
  app_name splunk-kubernetes-objects
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 20m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 600m
  </buffer>
</match>

Events:  <none>
```
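The first jq filter in the objects config above derives the sourcetype by rewriting the tag's dots to colons (the `gsub("\\."; ":")`). The same transform as a one-line Python sketch, with an illustrative tag:

```python
# The objects chart derives sourcetype from the fluentd tag by replacing dots
# with colons, which is what the jq gsub in the filter above does.
def tag_to_sourcetype(tag):
    return tag.replace(".", ":")

print(tag_to_sourcetype("kube.objects.events"))  # -> kube:objects:events
```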

izark1 commented 3 years ago

@luckyj5: here is my values.yaml:

```
# Splunk Connect for Kubernetes is an umbrella chart for three charts
# * splunk-kubernetes-logging
# * splunk-kubernetes-objects
# * splunk-kubernetes-metrics

# Use global configurations for shared configurations between sub-charts.
# Supported global configurations:
# Values defined here are the default values.
global:
  logLevel: info
  splunk:
    hec:
      # host is required and should be provided by user
      host: 10.10.1.100
      # port to HEC, optional, default 8088
      port: 8088
      # token is required and should be provided by user
      token: ad4df02b-d141-4297-b890-24ae31745e47
      # protocol has two options: "http" and "https", default is "https"
      protocol: http
      # indexName tells which index to use, this is optional. If it's not present, will use "main".
      indexName: k8s
      # insecureSSL is a boolean, it indicates should it allow insecure SSL connection (when protocol is "https"). Default is false.
      insecureSSL: true
      # The PEM-format CA certificate for this client.
      # NOTE: The content of the certificate itself should be used here, not the file path.
      #       The certificate will be stored as a secret in kubernetes.
      clientCert:
      # The private key for this client.
      # NOTE: The content of the key itself should be used here, not the file path.
      #       The key will be stored as a secret in kubernetes.
      clientKey:
      # The PEM-format CA certificate file.
      # NOTE: The content of the file itself should be used here, not the file path.
      #       The file will be stored as a secret in kubernetes.
      caFile:
      # For object and metrics
      indexRouting:
  kubernetes:
    # The cluster name used to tag logs. Default is cluster_name
    clusterName: "cluster_name"
  prometheus_enabled: true
  monitoring_agent_enabled: true
```

```
# Enabling splunk-kubernetes-logging will install the splunk-kubernetes-logging chart to a kubernetes
# cluster to collect logs generated in the cluster to a Splunk indexer/indexer cluster.
splunk-kubernetes-logging:
  enabled: true
  # logLevel is to set log level of the Splunk log collector. Available values are:
  # * trace
  # * debug
  # * info (default)
  # * warn
  # * error
  logLevel:
  # This can be used to exclude verbose logs including various system and Helm/Tiller related logs.
  fluentd:
    # path of logfiles, default /var/log/containers/*.log
    path: /var/log/containers/*.log
    # paths of logfiles to exclude. object type is array as per fluentd specification:
    # https://docs.fluentd.org/input/tail#exclude_path
    exclude_path:
    #  - /var/log/containers/kube-svc-redirect*.log
    #  - /var/log/containers/tiller*.log
    #  - /var/log/containers/*_kube-system_*.log (to exclude `kube-system` namespace)
  # Configurations for container logs
  containers:
    # Path to root directory of container logs
    path: /var/log
    # Final volume destination of container log symlinks
    pathDest: /var/lib/docker/containers
    # Log format type, "json" or "cri"
    logFormatType: json
    # Specify the logFormat for "cri" logFormatType - provide time format
    # For example "%Y-%m-%dT%H:%M:%S.%N%:z" for openshift, "%Y-%m-%dT%H:%M:%S.%NZ" for IBM IKS
    # Default for "cri": "%Y-%m-%dT%H:%M:%S.%N%:z"
    logFormat:
    # Specify the interval of refreshing the list of watch file.
    refreshInterval:
  # Enriches log record with kubernetes data
  k8sMetadata:
    # Pod labels to collect
    podLabels:
      - app
      - k8s-app
      - release
    watch: true
    cache_ttl: 3600
  sourcetypePrefix: "kube"
  rbac:
    # Specifies whether RBAC resources should be created.
    # This should be set to `false` if either:
    # a) RBAC is not enabled in the cluster, or
    # b) you want to create RBAC resources by yourself.
    create: true
    # If you are on OpenShift and you want to run a privileged pod
    # you need to have a ClusterRoleBinding for the system:openshift:scc:privileged
    # ClusterRole. Set to `true` to create the ClusterRoleBinding resource
    # for the ServiceAccount.
    openshiftPrivilegedSccBinding: false
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the fullname template
    name:
    # This flag specifies if the user wants to use a secret for creating the serviceAccount,
    # which will be used to get the images from a private registry
    usePullSecrets: false
  podSecurityPolicy:
    # Specifies whether Pod Security Policy resources should be created.
    # This should be set to `false` if either:
    # a) Pod Security Policies is not enabled in the cluster, or
    # b) you want to create Pod Security Policy resources by yourself.
    create: false
    # Specifies whether AppArmor profile should be applied.
    # if set to true, this will add two annotations to PodSecurityPolicy:
    # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
    # set to false if AppArmor is not available
    apparmor_security: true
    # apiGroup can be set to "extensions" for Kubernetes < 1.10.
    apiGroup: policy
```

```
  # Local splunk configurations
  splunk:
    # Configurations for HEC (HTTP Event Collector)
    hec:
      # host is required and should be provided by user
      host:
      # port to HEC, optional, default 8088
      port:
      # token is required and should be provided by user
      token:
      # protocol has two options: "http" and "https", default is "https"
      protocol:
      # indexName tells which index to use, this is optional. If it's not present, will use "main".
      indexName:
      # insecureSSL is a boolean, it indicates should it allow insecure SSL connection (when protocol is "https"). Default is false.
      insecureSSL:
      # The PEM-format CA certificate for this client.
      # NOTE: The content of the certificate itself should be used here, not the file path.
      #       The certificate will be stored as a secret in kubernetes.
      clientCert:
      # The private key for this client.
      # NOTE: The content of the key itself should be used here, not the file path.
      #       The key will be stored as a secret in kubernetes.
      clientKey:
      # The PEM-format CA certificate file.
      # NOTE: The content of the file itself should be used here, not the file path.
      #       The file will be stored as a secret in kubernetes.
      caFile:
    # Configurations for Ingest API
    ingest_api:
      # serviceClientIdentifier is a string, the client identifier is used to make requests to the ingest API with authorization.
      serviceClientIdentifier:
      # serviceClientSecretKey is a string, the client identifier is used to make requests to the ingest API with authorization.
      serviceClientSecretKey:
      # tokenEndpoint is a string, it indicates which endpoint should be used to get the authorization token used to make requests to the ingest API.
      tokenEndpoint:
      # ingestAuthHost is a string, it indicates which url/hostname should be used to make token auth requests to the ingest API.
      ingestAuthHost:
      # ingestAPIHost is a string, it indicates which url/hostname should be used to make requests to the ingest API.
      ingestAPIHost:
      # tenant is a string, it indicates which tenant should be used to make requests to the ingest API.
      tenant:
      # eventsEndpoint is a string, it indicates which endpoint should be used to make requests to the ingest API.
      eventsEndpoint:
      # debugIngestAPI is a boolean, it indicates whether user wants to debug requests and responses to ingest API. Default is false.
      debugIngestAPI:
  # Create or use existing secret if name is empty default name is used
  secret:
    create: true
    name:
  # Directory where to read journald logs.
  journalLogPath: /run/log/journal
  # Set to true, to change the encoding of all strings to utf-8.
  #
  # By default fluentd uses ASCII-8BIT encoding. If you have 2-byte chars in your logs
  # you need to set the encoding to UTF-8 instead.
  #
  # charEncodingUtf8: false
```

```
  # logs defines the source of logs, multiline support, and their sourcetypes.
  #
  # The scheme to define a log is:
  #
  # ```
  # :
  #   from:
  #   timestampExtraction:
  #     regexp: ""
  #     format: ""
  #   multiline:
  #     firstline: ""
  #     flushInterval 5
  #   sourcetype: ""
  # ```
  #
  # = =
  # It supports 3 kinds of sources: journald, file, and container.
  # For journald logs, unit is required for filtering using _SYSTEMD_UNIT, example:
  # ```
  # docker:
  #   from:
  #     journald:
  #       unit: docker.service
  # ```
  #
  # For file logs, path is required for specifying where the log files are. Log files are expected in /var/log, example:
  # ```
  # docker:
  #   from:
  #     file:
  #       path: /var/log/docker.log
  # ```
  #
  # For container logs, pod name is required. You can also provide the container name; if it's not provided, the name of this source will be used as the container name:
  # ```
  # kube-apiserver:
  #   from:
  #     pod: kube-apiserver
  #
  # etcd:
  #   from:
  #     pod: etcd-server
  #     container: etcd-container
  # ```
  #
  # = timestamp =
  # timestampExtraction defines how to extract timestamp from logs. This only works for file source.
  # To use timestampExtraction you need to define both:
  # - regexp: the Regular Expression used to find the timestamp from a log entry.
  #     The timestamp part must be in a time named group. E.g.
  #     (?
  # - format: a format string that defines how to parse the timestamp, e.g. "%Y-%m-%d %H:%M:%S".
  #     More details can be found at: http://ruby-doc.org/stdlib-2.5.0/libdoc/time/rdoc/Time.html#method-c-strptime
  #
  # = multiline =
  # multiline options provide basic multiline support. Two options:
  # - firstline: a Regular Expression used to detect the first line of a multiline log.
  # - flushInterval: The number of seconds after which the last received event log will be flushed, default value: 5s.
  #
  # = sourcetype =
  # sourcetype of each kind of log can be defined using the sourcetype field.
  # If sourcetype is not defined, name will be used.
  #
  # ---
  # Here we have some default timestampExtraction and multiline settings for kubernetes components.
  # So, usually you just need to redefine the source of those components if necessary.
  logs:
    docker:
      from:
        journald:
          unit: docker.service
      timestampExtraction:
        regexp: time="(?
```

Defines which version of image to use, and how it should be pulled.

image:

The domain of the registry to pull the image from

registry: docker.io
# The name of the image to pull
name: splunk/fluentd-hec
# The tag of the image to pull
tag: 1.2.5
# The policy that specifies when the user wants the images to be pulled
pullPolicy: IfNotPresent
# Indicates if the image should be pulled using authentication from a secret
usePullSecret: false
# The name of the pull secret to attach to the respective serviceaccount used to pull the image
pullsecretName:

Environment variable for daemonset

environmentVar:

Controls the resources used by the fluentd daemonset

```
resources:
  limits:
  #  cpu: 100m
  #  memory: 200Mi
  requests:
    cpu: 100m
    memory: 200Mi
```

Controls the output buffer for the fluentd daemonset

Note that, for the memory buffer, if resources.limits.memory is set, the total buffer size should not be bigger than the memory limit; it should also account for the basic memory usage of fluentd itself.

All buffer parameters (except Argument) defined in

https://docs.fluentd.org/v1.0/articles/buffer-section#parameters

can be configured here.

```
buffer:
  "@type": memory
  total_limit_size: 600m
  chunk_limit_size: 20m
  chunk_limit_records: 100000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

set to true to keep the structure created by docker or journald

sendAllMetadata: false

These default tolerations allow the daemonset to be deployed on master nodes, so that we can also collect logs from those nodes.

tolerations:

Enabling splunk-kubernetes-objects will install the splunk-kubernetes-objects chart in a kubernetes cluster to send kubernetes objects from the cluster to a Splunk indexer/indexer cluster.

```
splunk-kubernetes-objects:
  enabled: true
```

logLevel sets the log level of the object collector. Available values are:

* trace

* debug

* info (default)

* warn

* error

logLevel:

```
rbac:
  # Specifies whether RBAC resources should be created.
  # This should be set to `false` if either:
  # a) RBAC is not enabled in the cluster, or
  # b) you want to create RBAC resources by yourself.
  create: true
```

```
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # This flag specifies if the user wants to use a secret for creating the serviceAccount,
  # which will be used to get the images from a private registry
  usePullSecrets: false
```

```
podSecurityPolicy:
  # Specifies whether Pod Security Policy resources should be created.
  # This should be set to `false` if either:
  # a) Pod Security Policies are not enabled in the cluster, or
  # b) you want to create Pod Security Policy resources by yourself.
  create: false
  # Specifies whether an AppArmor profile should be applied.
  # If set to true, this will add two annotations to the PodSecurityPolicy:
  # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
  # Set to false if AppArmor is not available
  apparmor_security: true
  # apiGroup can be set to "extensions" for Kubernetes < 1.10.
  apiGroup: policy
```

= Kubernetes Connection Configs =

```
kubernetes:
  # The URL for calling the kubernetes API; by default it will be read from the environment variables
  url:
  # If insecureSSL is set to true, insecure HTTPS API calls are allowed, default false
  insecureSSL: false
  # Path to the certificate file for this client.
  clientCert:
  # Path to the private key file for this client.
  clientKey:
  # Path to the CA file.
  caFile:
  # Path to the file that contains the API token. By default it reads from the file "token" in the `secret_dir`.
  bearerTokenFile:
  # Path of the location where the pod's service account credentials are stored. Usually you don't need to change this; the default value should work in most cases.
  secretDir:
  # The cluster name used to tag cluster metrics from the aggregator. Default is cluster_name
  clusterName:
  # Add privileged access to containers for openshift compatibility
  openshift: false
```

= Object Lists =

NOTE: at least one object must be provided.


== Schema ==

```
objects:
  <apiGroup>:
    <apiVersion>:
      - <objectDefinition>
```


Each objectDefinition has the following fields:

* mode:

defines how this type of object is collected, either "poll" or "watch".

- "poll" mode reads all objects of this type using the list API at an interval.

- "watch" mode sets up a long-lived connection using the watch API to receive only updates.

* name: [REQUIRED]

name of the object, e.g. pods, namespaces.

Note that for resource names that contain multiple words, like daemonsets, the words need to be separated with _, so daemonsets becomes daemon_sets.

* namespace:

only collects objects from the specified namespace, by default it's all namespaces

* labelSelector:

select objects by label(s)

* fieldSelector:

select objects by field(s)

* interval:

the interval at which the objects are pulled; default 15 minutes.

Only useful for "poll" mode.


== Example ==

```
objects:
  core:
    v1:
      - name: pods
        namespace: default
        mode: poll
        interval: 60m
      - name: events
        mode: watch
  apps:
    v1:
      - name: daemon_sets
        labelSelector: environment=production
```

```
objects:
  core:
    v1:
```

Enabling splunk-kubernetes-metrics will install the splunk-kubernetes-metrics chart in a kubernetes cluster to send metrics from the cluster to a Splunk indexer/indexer cluster.

```
splunk-kubernetes-metrics:
  enabled: true
```

logLevel sets the log level of the Splunk kubernetes metrics collector. Available values are:

* debug

* info (default)

* warn

* error

logLevel:

```
rbac:
  # Specifies whether RBAC resources should be created.
  # This should be set to `false` if either:
  # a) RBAC is not enabled in the cluster, or
  # b) you want to create RBAC resources by yourself.
  create: true
```

```
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # This flag specifies if the user wants to use a secret for creating the serviceAccount,
  # which will be used to get the images from a private registry
  usePullSecrets: false
```

```
podSecurityPolicy:
  # Specifies whether Pod Security Policy resources should be created.
  # This should be set to `false` if either:
  # a) Pod Security Policies are not enabled in the cluster, or
  # b) you want to create Pod Security Policy resources by yourself.
  create: false
  # Specifies whether an AppArmor profile should be applied.
  # If set to true, this will add two annotations to the PodSecurityPolicy:
  # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
  # Set to false if AppArmor is not available
  apparmor_security: true
  # apiGroup can be set to "extensions" for Kubernetes < 1.10.
  apiGroup: policy
```

= Splunk HEC Connection =

```
splunk:
  # Configurations for HEC (HTTP Event Collector)
  hec:
    # hostname/ip of HEC, REQUIRED.
    host:
    # port of HEC, OPTIONAL. Default value: 8088
    port:
    # the HEC token, REQUIRED.
    token:
    # protocol has two options: "http" and "https". Default value: "https"
    protocol:
    # indexName tells which index to use, OPTIONAL. If it's not present, "main" will be used.
    indexName: em_metrics
    # insecureSSL is a boolean; it indicates whether to allow insecure SSL connections (when protocol is "https"). Default value: false
    insecureSSL:
    # The PEM-format CA certificate for this client.
    # NOTE: The content of the certificate itself should be used here, not the file path.
    #       The certificate will be stored as a secret in kubernetes.
    clientCert:
    # The private key for this client.
    # NOTE: The content of the key itself should be used here, not the file path.
    #       The key will be stored as a secret in kubernetes.
    clientKey:
    # The PEM-format CA certificate file.
    # NOTE: The content of the file itself should be used here, not the file path.
    #       The file will be stored as a secret in kubernetes.
    caFile:
```

Create or use an existing secret; if name is empty, the default name is used

```
secret:
  create: true
  name:
```

Defines which version of image to use, and how it should be pulled.

```
image:
  # The domain of the registry to pull the image from
  registry: docker.io
  # The name of the image to pull
  name: splunk/k8s-metrics
  # The tag of the image to pull
  tag: 1.1.5
  # The policy that specifies when the user wants the images to be pulled
  pullPolicy: IfNotPresent
  # Indicates if the image should be pulled using authentication from a secret
  usePullSecret: false
  # The name of the pull secret to attach to the respective serviceaccount used to pull the image
  pullsecretName:
```

Defines which version of image to use, and how it should be pulled.

```
imageAgg:
  # The domain of the registry to pull the image from
  registry: docker.io
  # The name of the image to pull
  name: splunk/k8s-metrics-aggr
  # The tag of the image to pull
  tag: 1.1.5
  # The policy that specifies when the user wants the images to be pulled
  pullPolicy: IfNotPresent
  # Indicates if the image should be pulled using authentication from a secret
  usePullSecret: false
  # The name of the pull secret to attach to the respective serviceaccount used to pull the image
  pullsecretName:
```

Environment variable for metrics daemonset

environmentVar:

Environment variable for metrics aggregator pod

environmentVarAgg:

Controls the resources used by the fluentd daemonset

```
resources:
  fluent:
    limits:
      cpu: 200m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 300Mi
```

Controls the output buffer for fluentd for the metrics pod

Note that, for the memory buffer, if resources.sidecar.limits.memory is set, the total buffer size should not be bigger than the memory limit; it should also account for the basic memory usage of fluentd itself.

All buffer parameters (except Argument) defined in

https://docs.fluentd.org/v1.0/articles/buffer-section#parameters

can be configured here.

```
buffer:
  "@type": memory
  total_limit_size: 400m
  chunk_limit_size: 10m
  chunk_limit_records: 10000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

Controls the output buffer for fluentd for the metrics aggregator pod

```
aggregatorBuffer:
  "@type": memory
  total_limit_size: 400m
  chunk_limit_size: 10m
  chunk_limit_records: 10000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

Configure how often SCK pulls metrics for its kubernetes sources. The default is 15s, where 's' means seconds.

metricsInterval: 15s

Defines which nodes should be selected to deploy the fluentd daemonset.

```
nodeSelector:
  beta.kubernetes.io/os: linux
```

These default tolerations allow the daemonset to be deployed on master nodes, so that we can also collect metrics from those nodes.

tolerations:

izark1 commented 3 years ago

Hi @luckyj5 ,

commands output attached: get_cm.txt describe_cm.txt

values.yaml: note: renamed the extension to .txt my_values.yaml.txt

izark1 commented 3 years ago

I want to add here that even a direct call to the endpoint results in a 404 Not Found error.

```
[root@docker1 k8s]# curl http://10.10.1.80:10248/stats/summary
404 page not found
[root@docker1 k8s]#
```

izark1 commented 3 years ago

```
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:30:03Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

```
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

rockb1017 commented 3 years ago

Hello, it seems you need to enable the metrics endpoint. Could you try the steps in this thread and let us know? https://github.com/splunk/splunk-connect-for-kubernetes/issues/505#issuecomment-754705688

izark1 commented 3 years ago

Hi @rockb1017, the problem is that the --enable-cadvisor-json-endpoints=true parameter can't be added to the /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf file nor to the /var/lib/kubelet/kubeadm-flags.env file.

Once I add this parameter, so that the complete ARG is Environment="KUBELET_KUBECONFIG_ARGS=--enable-cadvisor-json-endpoints=true --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf", and then attempt systemctl restart kubelet.service, it immediately fails.

Am I adding it correctly?

rockb1017 commented 3 years ago

what version of kubelet are you using?

rockb1017 commented 3 years ago

If you are using a version in which this option has been removed, it won't work for that version. Our metrics collector relies on this endpoint to collect metrics.

izark1 commented 3 years ago

```
[root@docker1 k8s]# kubelet --version
Kubernetes v1.21.0
```

izark1 commented 3 years ago

So if it's a version compatibility issue, what is the maximum supported version of Kubernetes for SCK?

rockb1017 commented 3 years ago

the option is only available up to 1.20 https://v1-20.docs.kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

izark1 commented 3 years ago

Thanks all for clarifying the reason.