splunk / fluent-plugin-kubernetes-metrics

Fluentd input plugin which queries Kubernetes kubelet summary API to collect Kubernetes metrics.
Apache License 2.0

error_class=RestClient::NotFound error="404 Not Found" #86

Closed. izark1 closed this issue 3 years ago.

izark1 commented 3 years ago

Hi, I'm deploying this plugin and running it in my environment, but I get these errors in my splunk-fluentd-k8s-metrics container:

```
2021-05-02 23:08:20 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:cadvisor_metric_scraper error_class=RestClient::NotFound error="404 Not Found"
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:249:in `exception_with_response'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/abstract_response.rb:129:in `return!'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:836:in `process_result'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:743:in `block in transmit'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/ruby/net/http.rb:933:in `start'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
2021-05-02 23:08:20 +0000 [error]: #0 /opt/app-root/src/gem/fluent-plugin-kubernetes-metrics-1.1.5/lib/fluent/plugin/in_kubernetes_metrics.rb:660:in `scrape_cadvisor_metrics'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run_once'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/cool.io-1.7.1/lib/cool.io/loop.rb:88:in `run'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2021-05-02 23:08:20 +0000 [error]: #0 /usr/share/gems/gems/fluentd-1.11.5/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-05-02 23:08:20 +0000 [error]: #0 Timer detached. title=:cadvisor_metric_scraper
```

Any help with this, please?

izark1 commented 3 years ago

Any help please with this?

luckyj5 commented 3 years ago

@izark1 Thanks for reporting this issue. Can you please share how you are deploying this plugin?

izark1 commented 3 years ago

Hi @luckyj5, I've deployed SCK (https://github.com/splunk/splunk-connect-for-kubernetes) using Helm 3; I'm interested in the metrics collection part. I configured my_values.yaml with the proper settings for my Splunk environment, then ran `helm install my-splunk-connect -f my_values.yaml splunk/splunk-connect-for-kubernetes`,

but I don't see the metrics in my Splunk environment; after that, I got the errors shown above.

More information:

```
docker image ls
REPOSITORY                           TAG        IMAGE ID       CREATED         SIZE
docker.io/splunk/k8s-metrics         1.1.5      65b48dd511c3   4 days ago      1.03 GB
docker.io/splunk/fluentd-hec         1.2.5      d2b9528d8c03   4 days ago      1.08 GB
docker.io/httpd                      2.4        0b932df43057   3 weeks ago     138 MB
docker.io/httpd                      latest     0b932df43057   3 weeks ago     138 MB
k8s.gcr.io/kube-apiserver            v1.21.0    4d217480042e   3 weeks ago     126 MB
k8s.gcr.io/kube-proxy                v1.21.0    38ddd85fe90e   3 weeks ago     122 MB
k8s.gcr.io/kube-controller-manager   v1.21.0    09708983cc37   3 weeks ago     120 MB
k8s.gcr.io/kube-scheduler            v1.21.0    62ad3129eca8   3 weeks ago     50.6 MB
docker.io/weaveworks/weave-npc       2.8.1      7f92d556d4ff   3 months ago    39.3 MB
docker.io/weaveworks/weave-kube      2.8.1      df29c0a4002c   3 months ago    89 MB
k8s.gcr.io/pause                     3.4.1      0f8457a4c2ec   3 months ago    683 kB
k8s.gcr.io/coredns/coredns           v1.8.0     296a6d5035e2   6 months ago    42.5 MB
k8s.gcr.io/etcd                      3.4.13-0   0369cf4303ff   8 months ago    253 MB
gcr.io/cadvisor/cadvisor             v0.36.0    7414b6ed960c   10 months ago   184 MB
```

Can you please elaborate on what you need me to do?

izark1 commented 3 years ago

Hi @luckyj5, can you assist, please?

luckyj5 commented 3 years ago

Please share your values.yaml or a copy of the running configmap in the cluster. Also, which version and flavor of Kubernetes are you running?

```
kubectl get cm
kubectl describe cm
```

izark1 commented 3 years ago

Hi @luckyj5 , thanks for your response.

```
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:30:03Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

```
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

This is the output of the first command:

```
NAME                                                     DATA   AGE
kube-root-ca.crt                                         1      33m
my-splunk-connect-splunk-kubernetes-logging              8      14m
my-splunk-connect-splunk-kubernetes-metrics              1      14m
my-splunk-connect-splunk-kubernetes-metrics-aggregator   1      14m
my-splunk-connect-splunk-kubernetes-objects              1      14m
```

```
Name:         kube-root-ca.crt
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
ca.crt:
----
-----BEGIN CERTIFICATE-----
MIIC5zCCAc+gAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
cm5ldGVzMB4XDTIxMDUxMDIwMTk1OFoXDTMxMDUwODIwMTk1OFowFTETMBEGA1UE
AxMKa3ViZXJuZXRlczCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALCU
RISOjWA87S0NAANKMkkZSCcbXLopk67eLn+KrRCbCJKH14CsN9SmQtyGxfJDOu0B
VbTqxj7RzDaAAI0r20SKWHbVsgJEJAZhNh198ZgX6FnrIrSOmISv4RNZkGyXbAyZ
y9O2ZxdXpfhS87vI+JZJd0f6Kpax532qNBYhXSJ0WxHaFv1SpNmR8yXCcdmPjUNi
k8jmKRgu54uQV7CYlyUEoBR1JkUEl4t5OwdiBv0Z8JdHg2pJVN//gqVHwJuGAbI4
BJ86Z/TwOFFR4WVVFrly8LzXzqjf4bMi2KH2pjg1S2uvkxzslLgxKOLiJOecA/aJ
q9DfdZ+WHfvpVD4bPaECAwEAAaNCMEAwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB
/wQFMAMBAf8wHQYDVR0OBBYEFLRWmF6qoOCq7J4w1pkNRr40BTx0MA0GCSqGSIb3
DQEBCwUAA4IBAQBJjoTEvtA4RA4nU5Fvxwuvm8nCiPHkPBnRcuflxuPX9/Aw08gQ
A6paPIa25qJeYgqH/qJoQWcBbsqihTapXYpola6qkCIKcRvB56Qer8O/e3d7bxRw
sF1u+lIrrc49BkjnV+x8AMDymjgQ2wCc2PxaCeGjn25zSf530iQd4aNZR2CvcSd2
a7LOf5pVd2gJsIFsC5YQhb8ZA2o07LodYLOqLJON2wXGhKFolxnAPJMm8i4hWHHb
SaYGqy1zTLd+316AtKOfJ1NJJIkBwCtBIw5JuDlPERf01onq7nTW+Dtmqdw/akru
FuCtRUnXUPRFcABQE33FDGZ/iOl/dwRtvlXi
-----END CERTIFICATE-----

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-logging
Namespace:    default
Labels:       app=splunk-kubernetes-logging
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-logging-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
source.files.conf:
----
# This fluentd conf file contains sources for log files other than container logs.
<source>
  @id tail.file.kube-audit
  @type tail
  @label @CONCAT
  tag tail.file.kube:apiserver-audit
  path /var/log/kube-apiserver-audit.log
  pos_file /var/log/splunk-fluentd-kube-audit.pos
  read_from_head true
  path_key source
  <parse>
    @type regexp
    expression /^(?.*)$/
    time_key time
    time_type string
    time_format %Y-%m-%dT%H:%M:%SZ
  </parse>
</source>
```

```
source.journald.conf:
----
# This fluentd conf file contains configurations for reading logs from systemd journal.
<source>
  @id journald-docker
  @type systemd
  @label @CONCAT
  tag journald.kube:docker
  path "/run/log/journal"
  matches [{ "_SYSTEMD_UNIT": "docker.service" }]
  read_from_head true
  <storage>
    @type local
    persistent true
    path /var/log/splunkd-fluentd-journald-docker.pos.json
  </storage>
  <entry>
    field_map {"MESSAGE": "log", "_SYSTEMD_UNIT": "source"}
    field_map_strict true
  </entry>
</source>
<source>
  @id journald-kubelet
  @type systemd
  @label @CONCAT
  tag journald.kube:kubelet
  path "/run/log/journal"
  matches [{ "_SYSTEMD_UNIT": "kubelet.service" }]
  read_from_head true
  <storage>
    @type local
    persistent true
    path /var/log/splunkd-fluentd-journald-kubelet.pos.json
  </storage>
  <entry>
    field_map {"MESSAGE": "log", "_SYSTEMD_UNIT": "source"}
    field_map_strict true
  </entry>
</source>
```

```
system.conf:
----
# system wide configurations
<system>
  log_level info
  root_dir /tmp/fluentd
</system>

fluent.conf:
----
@include system.conf
@include source.containers.conf
@include source.files.conf
@include source.journald.conf
@include monit.conf
@include output.conf
@include prometheus.conf

monit.conf:
----
<source>
  @id fluentd-monitor-agent
  @type monitor_agent
  @label @SPLUNK
  tag monitor_agent
</source>

output.conf:
----
```

```
# Events are emitted to the CONCAT label from the container, file and journald sources for multiline processing.
<label @CONCAT>
  # = filters for container logs =
  <filter tail.containers.var.log.containers.dns-controller*dns-controller*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*sidecar*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*dnsmasq*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-apiserver*kube-apiserver*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-controller-manager*kube-controller-manager*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns-autoscaler*autoscaler*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-proxy*kube-proxy*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-scheduler*kube-scheduler*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  <filter tail.containers.var.log.containers.kube-dns*kubedns*.log>
    @type concat
    key log
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp /^\w[0-1]\d[0-3]\d/
    flush_interval 5
    separator ""
    use_first_timestamp true
  </filter>
  # = filters for journald logs =
  @type concat
  key log
  timeout_label @SPLUNK
  multiline_start_regexp /^\w[0-1]\d[0-3]\d/
  flush_interval 5
  # Events are relabeled then emitted to the SPLUNK label
  <match **>
    @type relabel
    @label @SPLUNK
  </match>
</label>

<label @SPLUNK>
  # filter to remove empty lines
  <filter tail.containers.**>
    @type grep
    <exclude>
      key log
      pattern ^$
    </exclude>
  </filter>
  # Enrich log with k8s metadata
  <filter tail.containers.**>
    @type kubernetes_metadata
    annotation_match [ ".*" ]
    de_dot false
    watch true
    cache_ttl 3600
  </filter>
  <filter tail.containers.**>
    @type record_transformer
    enable_ruby
    <record>
      # set the sourcetype from splunk.com/sourcetype pod annotation or set it to kube:container:CONTAINER_NAME
      sourcetype ${record.dig("kubernetes", "annotations", "splunk.com/sourcetype") ? record.dig("kubernetes", "annotations", "splunk.com/sourcetype") : "kube:container:"+record.dig("kubernetes","container_name")}
      container_name ${record.dig("kubernetes","container_name")}
      namespace ${record.dig("kubernetes","namespace_name")}
      pod ${record.dig("kubernetes","pod_name")}
      container_id ${record.dig("docker","container_id")}
      pod_uid ${record.dig("kubernetes","pod_id")}
      container_image ${record.dig("kubernetes","container_image")}
      # set the cluster_name field to the configured value, or default to "cluster_name"
      cluster_name cluster_name
      # set the splunk_index field to the value found in the pod splunk.com/index annotations. if not set, use namespace annotation, or default to the default_index
      splunk_index ${record.dig("kubernetes", "annotations", "splunk.com/index") ? record.dig("kubernetes", "annotations", "splunk.com/index") : record.dig("kubernetes", "namespace_annotations", "splunk.com/index") ? (record["kubernetes"]["namespace_annotations"]["splunk.com/index"]) : ("k8s")}
      label_app ${record.dig("kubernetes","labels","app")}
      label_k8s-app ${record.dig("kubernetes","labels","k8s-app")}
      label_release ${record.dig("kubernetes","labels","release")}
      exclude_list ${record.dig("kubernetes", "annotations", "splunk.com/exclude") ? record.dig("kubernetes", "annotations", "splunk.com/exclude") : record.dig("kubernetes", "namespace_annotations", "splunk.com/exclude") ? (record["kubernetes"]["namespace_annotations"]["splunk.com/exclude"]) : ("false")}
    </record>
  </filter>
  <filter tail.containers.**>
    # Exclude all logs that are marked
    @type grep
    <exclude>
      key exclude_list
      pattern /^true$/
    </exclude>
  </filter>
  # extract pod_uid and container_name for CRIO runtime
  # create source and sourcetype
  <filter journald.**>
    @type jq_transformer
    jq '.record.source = "/run/log/journal/" + .record.source | .record.sourcetype = (.tag | ltrimstr("journald.")) | .record.cluster_name = "cluster_name" | .record.splunk_index = "k8s" | .record'
  </filter>
  # = filters for non-container log files =
  # extract sourcetype
  <filter tail.file.**>
    @type jq_transformer
    jq '.record.sourcetype = (.tag | ltrimstr("tail.file.")) | .record.cluster_name = "cluster_name" | .record.index = "k8s" | .record'
  </filter>
  # = filters for monitor agent =
  @type jq_transformer
  jq ".record.source = \"namespace:#{ENV['MY_NAMESPACE']}/pod:#{ENV['MY_POD_NAME']}\" | .record.sourcetype = \"fluentd:monitor-agent\" | .record.cluster_name = \"cluster_name\" | .record.splunk_index = \"k8s\" | .record"
  # = custom filters specified by users =
  # = output =
  <match **>
    @type splunk_hec
    protocol http
    hec_host "10.10.1.100"
    hec_port 8088
    hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
    index_key splunk_index
    insecure_ssl true
    host "#{ENV['K8S_NODE_NAME']}"
    source_key source
    sourcetype_key sourcetype
    # currently CRI does not produce log paths with all the necessary
    # metadata to parse out pod, namespace, container_name, container_id.
    # this may be resolved in the future by this issue: https://github.com/kubernetes/kubernetes/issues/58638#issuecomment-385126031
    <fields>
      container_image
      pod_uid
      pod
      container_name
      namespace
      container_id
      cluster_name
      label_app
      label_k8s-app
      label_release
    </fields>
    app_name splunk-kubernetes-logging
    app_version 1.4.7
    <buffer>
      @type memory
      chunk_limit_records 100000
      chunk_limit_size 20m
      flush_interval 5s
      flush_thread_count 1
      overflow_action block
      retry_max_times 5
      retry_type periodic
      total_limit_size 600m
    </buffer>
    <format monitor_agent>
      @type json
    </format>
    <format>
      # we just want to keep the raw logs, not the structure created by docker or journald
      @type single_value
      message_key log
      add_newline false
    </format>
  </match>
</label>
```
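The `splunk_index` expression in the record_transformer filter above resolves the index in a fixed precedence order: pod annotation, then namespace annotation, then the chart default. A minimal Python sketch of that precedence (the dict layout mirrors the enriched record; the sample values are illustrative):

```python
# Sketch of the splunk_index resolution done by the record_transformer filter:
# pod annotation > namespace annotation > default index ("k8s" in this config).
def resolve_splunk_index(record, default_index="k8s"):
    k8s = record.get("kubernetes", {})
    pod_ann = k8s.get("annotations", {}).get("splunk.com/index")
    ns_ann = k8s.get("namespace_annotations", {}).get("splunk.com/index")
    return pod_ann or ns_ann or default_index

# A pod-level annotation wins over the namespace-level one:
rec = {"kubernetes": {
    "annotations": {"splunk.com/index": "app_idx"},
    "namespace_annotations": {"splunk.com/index": "ns_idx"},
}}
print(resolve_splunk_index(rec))  # -> app_idx
print(resolve_splunk_index({}))   # -> k8s
```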

```
prometheus.conf:
----
# input plugin that exports metrics
<source>
  @type prometheus
</source>
<source>
  @type forward
</source>
# input plugin that collects metrics from MonitorAgent
<source>
  @type prometheus_monitor
  <labels>
    host ${hostname}
  </labels>
</source>
# input plugin that collects metrics for output plugin
<source>
  @type prometheus_output_monitor
  <labels>
    host ${hostname}
  </labels>
</source>
```

```
source.containers.conf:
----
# This configuration file for Fluentd / td-agent is used
# to watch changes to Docker log files. The kubelet creates symlinks that
# capture the pod name, namespace, container name & Docker container ID
# to the docker logs for pods in the /var/log/containers directory on the host.
# If running this fluentd configuration in a Docker container, the /var/log
# directory should be mounted in the container.
# reading kubelet logs from journal
#
# Reference:
# https://github.com/kubernetes/community/blob/20d2f6f5498a5668bae2aea9dcaf4875b9c06ccb/contributors/design-proposals/node/kubelet-cri-logging.md
#
# Json Log Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
# CRI Log Example (not supported):
# 2016-02-17T00:04:05.931087621Z stdout P { 'long': { 'json', 'object output' },
# 2016-02-17T00:04:05.931087621Z stdout F 'splitted': 'partial-lines' }
# 2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here
<source>
  @id containers.log
  @type tail
  @label @CONCAT
  tag tail.containers.*
  path /var/log/containers/*.log
  pos_file /var/log/splunk-fluentd-containers.log.pos
  path_key source
  read_from_head true
  refresh_interval 60
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    time_key time
    time_type string
    localtime false
  </parse>
</source>

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-metrics
Namespace:    default
Labels:       app=splunk-kubernetes-metrics
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-metrics-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
# system wide configurations
<system>
  log_level info
</system>
<source>
  @type kubernetes_metrics
  tag kube.*
  node_name "#{ENV['NODE_NAME']}"
  kubelet_port 10248
  use_rest_client_ssl false
  insecure_ssl true
  cluster_name cluster_name
  interval 15s
</source>
<filter kube.**>
  @type record_modifier
  <record>
    metric_name ${tag}
    cluster_name cluster_name
  </record>
</filter>
<filter kube.node.**>
  @type record_modifier
  <record>
    source ${record['node']}
  </record>
</filter>
<filter kube.pod.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}
  </record>
</filter>
<filter kube.sys-container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['name']}
  </record>
</filter>
<filter kube.container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['container-name']}
  </record>
</filter>
# = custom filters specified by users =
<match kube.**>
  @type splunk_hec
  data_type metric
  metric_name_key metric_name
  metric_value_key value
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  index em_metrics
  source ${tag}
  insecure_ssl true
  app_name splunk-kubernetes-metrics
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 10m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 400m
  </buffer>
</match>

Events:  <none>
```
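For debugging the 404 above: the scraper builds its kubelet URLs from the `node_name`, `kubelet_port`, and `use_rest_client_ssl` values in this configmap. A rough sketch of that URL construction; the `/stats/summary` and `/metrics/cadvisor` paths are assumptions to verify against `in_kubernetes_metrics.rb`:

```python
# Rough sketch of how the metrics plugin assembles its kubelet endpoints from
# the configmap values above. The /stats/summary and /metrics/cadvisor paths
# are assumptions to check against in_kubernetes_metrics.rb; a 404 raised in
# scrape_cadvisor_metrics means the GET on the cadvisor path failed.
def kubelet_url(node_name, kubelet_port, use_ssl, path):
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{node_name}:{kubelet_port}{path}"

# With the values from this configmap (kubelet_port 10248, SSL disabled):
print(kubelet_url("my-node", 10248, False, "/metrics/cadvisor"))
# -> http://my-node:10248/metrics/cadvisor
print(kubelet_url("my-node", 10250, True, "/stats/summary"))
# -> https://my-node:10250/stats/summary
```

Curling the same URL from a node should reproduce the 404 if the port or path is wrong; note that the kubelet's default secure port is 10250, while 10248 is its healthz port, which is worth double-checking here.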

```
Name:         my-splunk-connect-splunk-kubernetes-metrics-aggregator
Namespace:    default
Labels:       app=splunk-kubernetes-metrics
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-metrics-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
# system wide configurations
<system>
  log_level info
</system>
<source>
  @type kubernetes_metrics_aggregator
  tag kube.*
  interval 15s
</source>
<filter kube.**>
  @type record_modifier
  <record>
    metric_name ${tag}
    cluster_name cluster_name
  </record>
</filter>
<filter kube.cluster.**>
  @type record_modifier
  <record>
    source ${record['name']}
  </record>
</filter>
<filter kube.namespace.**>
  @type record_modifier
  <record>
    source ${record['name']}
  </record>
</filter>
<filter kube.node.**>
  @type record_modifier
  <record>
    source ${record['node']}
  </record>
</filter>
<filter kube.pod.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}
  </record>
</filter>
<filter kube.sys-container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['name']}
  </record>
</filter>
<filter kube.container.**>
  @type record_modifier
  <record>
    source ${record['node']}/${record['pod-name']}/${record['container-name']}
  </record>
</filter>
<match kube.**>
  @type splunk_hec
  data_type metric
  metric_name_key metric_name
  metric_value_key value
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  index em_metrics
  source source
  insecure_ssl true
  app_name splunk-kubernetes-metrics
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 10m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 400m
  </buffer>
</match>

Events:  <none>
```

```
Name:         my-splunk-connect-splunk-kubernetes-objects
Namespace:    default
Labels:       app=splunk-kubernetes-objects
              app.kubernetes.io/managed-by=Helm
              chart=splunk-kubernetes-objects-1.4.7
              heritage=Helm
              release=my-splunk-connect
Annotations:  meta.helm.sh/release-name: my-splunk-connect
              meta.helm.sh/release-namespace: default

Data
====
fluent.conf:
----
<system>
  log_level info
</system>
<source>
  @type kubernetes_objects
  tag kube.objects.*
  api_version "v1"
  insecure_ssl false
  resource_name pods
  resource_name namespaces
  resource_name nodes
  resource_name events
</source>
<filter kube.**>
  @type jq_transformer
  # in ruby '\' will escape and become just '\', since we need two '\' in the gsub jq filter, it becomes '\\'.
  jq '.record.source = "namespace:(env.MY_NAMESPACE)/pod:(env.MY_POD_NAME)" | .record.sourcetype = (.tag | gsub("\\."; ":")) | .record'
</filter>
<filter kube.**>
  @type jq_transformer
  jq '.record.cluster_name = "cluster_name" | .record'
</filter>
# = custom filters specified by users =
<match kube.**>
  @type splunk_hec
  protocol http
  hec_host "10.10.1.100"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['NODE_NAME']}"
  source_key source
  sourcetype_key sourcetype
  index k8s
  insecure_ssl true
  <fields>
    cluster_name
  </fields>
  app_name splunk-kubernetes-objects
  app_version 1.4.7
  <buffer>
    @type memory
    chunk_limit_records 10000
    chunk_limit_size 20m
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 5
    retry_type periodic
    total_limit_size 600m
  </buffer>
</match>

Events:  <none>
```
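The first jq filter in the objects config above derives the sourcetype by rewriting the tag's dots to colons (the `gsub("\\."; ":")`). The same transform as a one-line Python sketch, with an illustrative tag:

```python
# The objects chart derives sourcetype from the fluentd tag by replacing dots
# with colons, which is what the jq gsub in the filter above does.
def tag_to_sourcetype(tag):
    return tag.replace(".", ":")

print(tag_to_sourcetype("kube.objects.events"))  # -> kube:objects:events
```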

izark1 commented 3 years ago

@luckyj5: here is my values.yaml:

```
# Splunk Connect for Kubernetes is an umbrella chart for three charts
# * splunk-kubernetes-logging
# * splunk-kubernetes-objects
# * splunk-kubernetes-metrics

# Use global configurations for shared configurations between sub-charts.
# Supported global configurations:
# Values defined here are the default values.
global:
  logLevel: info
  splunk:
    hec:
      # host is required and should be provided by user
      host: 10.10.1.100
      # port to HEC, optional, default 8088
      port: 8088
      # token is required and should be provided by user
      token: ad4df02b-d141-4297-b890-24ae31745e47
      # protocol has two options: "http" and "https", default is "https"
      protocol: http
      # indexName tells which index to use, this is optional. If it's not present, will use "main".
      indexName: k8s
      # insecureSSL is a boolean, it indicates should it allow insecure SSL connection (when protocol is "https"). Default is false.
      insecureSSL: true
      # The PEM-format CA certificate for this client.
      # NOTE: The content of the certificate itself should be used here, not the file path.
      #       The certificate will be stored as a secret in kubernetes.
      clientCert:
      # The private key for this client.
      # NOTE: The content of the key itself should be used here, not the file path.
      #       The key will be stored as a secret in kubernetes.
      clientKey:
      # The PEM-format CA certificate file.
      # NOTE: The content of the file itself should be used here, not the file path.
      #       The file will be stored as a secret in kubernetes.
      caFile:
      # For object and metrics
      indexRouting:
  kubernetes:
    # The cluster name used to tag logs. Default is cluster_name
    clusterName: "cluster_name"
  prometheus_enabled: true
  monitoring_agent_enabled: true
```

```
# Enabling splunk-kubernetes-logging will install the splunk-kubernetes-logging chart to a kubernetes
# cluster to collect logs generated in the cluster to a Splunk indexer/indexer cluster.
splunk-kubernetes-logging:
  enabled: true
  # logLevel is to set log level of the Splunk log collector. Available values are:
  # * trace
  # * debug
  # * info (default)
  # * warn
  # * error
  logLevel:
  # This can be used to exclude verbose logs including various system and Helm/Tiller related logs.
  fluentd:
    # path of logfiles, default /var/log/containers/*.log
    path: /var/log/containers/*.log
    # paths of logfiles to exclude. object type is array as per fluentd specification:
    # https://docs.fluentd.org/input/tail#exclude_path
    exclude_path:
    #  - /var/log/containers/kube-svc-redirect*.log
    #  - /var/log/containers/tiller*.log
    #  - /var/log/containers/*_kube-system_*.log (to exclude `kube-system` namespace)
  # Configurations for container logs
  containers:
    # Path to root directory of container logs
    path: /var/log
    # Final volume destination of container log symlinks
    pathDest: /var/lib/docker/containers
    # Log format type, "json" or "cri"
    logFormatType: json
    # Specify the logFormat for "cri" logFormatType - provide time format
    # For example "%Y-%m-%dT%H:%M:%S.%N%:z" for openshift, "%Y-%m-%dT%H:%M:%S.%NZ" for IBM IKS
    # Default for "cri": "%Y-%m-%dT%H:%M:%S.%N%:z"
    logFormat:
    # Specify the interval of refreshing the list of watch file.
    refreshInterval:
  # Enriches log record with kubernetes data
  k8sMetadata:
    # Pod labels to collect
    podLabels:
      - app
      - k8s-app
      - release
    watch: true
    cache_ttl: 3600
  sourcetypePrefix: "kube"
  rbac:
    # Specifies whether RBAC resources should be created.
    # This should be set to `false` if either:
    # a) RBAC is not enabled in the cluster, or
    # b) you want to create RBAC resources by yourself.
    create: true
    # If you are on OpenShift and you want to run a privileged pod
    # you need to have a ClusterRoleBinding for the system:openshift:scc:privileged
    # ClusterRole. Set to `true` to create the ClusterRoleBinding resource
    # for the ServiceAccount.
    openshiftPrivilegedSccBinding: false
  serviceAccount:
    # Specifies whether a ServiceAccount should be created
    create: true
    # The name of the ServiceAccount to use.
    # If not set and create is true, a name is generated using the fullname template
    name:
    # This flag specifies if the user wants to use a secret for creating the serviceAccount,
    # which will be used to get the images from a private registry
    usePullSecrets: false
  podSecurityPolicy:
    # Specifies whether Pod Security Policy resources should be created.
    # This should be set to `false` if either:
    # a) Pod Security Policies is not enabled in the cluster, or
    # b) you want to create Pod Security Policy resources by yourself.
    create: false
    # Specifies whether AppArmor profile should be applied.
    # if set to true, this will add two annotations to PodSecurityPolicy:
    # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
    # set to false if AppArmor is not available
    apparmor_security: true
    # apiGroup can be set to "extensions" for Kubernetes < 1.10.
    apiGroup: policy
```

```
  # Local splunk configurations
  splunk:
    # Configurations for HEC (HTTP Event Collector)
    hec:
      # host is required and should be provided by user
      host:
      # port to HEC, optional, default 8088
      port:
      # token is required and should be provided by user
      token:
      # protocol has two options: "http" and "https", default is "https"
      protocol:
      # indexName tells which index to use, this is optional. If it's not present, will use "main".
      indexName:
      # insecureSSL is a boolean, it indicates should it allow insecure SSL connection (when protocol is "https"). Default is false.
      insecureSSL:
      # The PEM-format CA certificate for this client.
      # NOTE: The content of the certificate itself should be used here, not the file path.
      #       The certificate will be stored as a secret in kubernetes.
      clientCert:
      # The private key for this client.
      # NOTE: The content of the key itself should be used here, not the file path.
      #       The key will be stored as a secret in kubernetes.
      clientKey:
      # The PEM-format CA certificate file.
      # NOTE: The content of the file itself should be used here, not the file path.
      #       The file will be stored as a secret in kubernetes.
      caFile:
    # Configurations for Ingest API
    ingest_api:
      # serviceClientIdentifier is a string, the client identifier is used to make requests to the ingest API with authorization.
      serviceClientIdentifier:
      # serviceClientSecretKey is a string, the client identifier is used to make requests to the ingest API with authorization.
      serviceClientSecretKey:
      # tokenEndpoint is a string, it indicates which endpoint should be used to get the authorization token used to make requests to the ingest API.
      tokenEndpoint:
      # ingestAuthHost is a string, it indicates which url/hostname should be used to make token auth requests to the ingest API.
      ingestAuthHost:
      # ingestAPIHost is a string, it indicates which url/hostname should be used to make requests to the ingest API.
      ingestAPIHost:
      # tenant is a string, it indicates which tenant should be used to make requests to the ingest API.
      tenant:
      # eventsEndpoint is a string, it indicates which endpoint should be used to make requests to the ingest API.
      eventsEndpoint:
      # debugIngestAPI is a boolean, it indicates whether user wants to debug requests and responses to ingest API. Default is false.
      debugIngestAPI:
  # Create or use existing secret if name is empty default name is used
  secret:
    create: true
    name:
  # Directory where to read journald logs.
  journalLogPath: /run/log/journal
  # Set to true, to change the encoding of all strings to utf-8.
  #
  # By default fluentd uses ASCII-8BIT encoding. If you have 2-byte chars in your logs
  # you need to set the encoding to UTF-8 instead.
  #
  # charEncodingUtf8: false
```

```
  # logs defines the source of logs, multiline support, and their sourcetypes.
  #
  # The scheme to define a log is:
  #
  # ```
  # :
  #   from:
  #   timestampExtraction:
  #     regexp: ""
  #     format: ""
  #   multiline:
  #     firstline: ""
  #     flushInterval 5
  #   sourcetype: ""
  # ```
  #
  # = =
  # It supports 3 kinds of sources: journald, file, and container.
  # For journald logs, unit is required for filtering using _SYSTEMD_UNIT, example:
  # ```
  # docker:
  #   from:
  #     journald:
  #       unit: docker.service
  # ```
  #
  # For file logs, path is required for specifying where the log files are. Log files are expected in /var/log, example:
  # ```
  # docker:
  #   from:
  #     file:
  #       path: /var/log/docker.log
  # ```
  #
  # For container logs, pod name is required. You can also provide the container name; if it's not provided, the name of this source will be used as the container name:
  # ```
  # kube-apiserver:
  #   from:
  #     pod: kube-apiserver
  #
  # etcd:
  #   from:
  #     pod: etcd-server
  #     container: etcd-container
  # ```
  #
  # = timestamp =
  # timestampExtraction defines how to extract timestamp from logs. This only works for file source.
  # To use timestampExtraction you need to define both:
  # - regexp: the Regular Expression used to find the timestamp from a log entry.
  #     The timestamp part must be in a time named group. E.g.
  #     (?
  # - format: a format string that defines how to parse the timestamp, e.g. "%Y-%m-%d %H:%M:%S".
  #     More details can be found at: http://ruby-doc.org/stdlib-2.5.0/libdoc/time/rdoc/Time.html#method-c-strptime
  #
  # = multiline =
  # multiline options provide basic multiline support. Two options:
  # - firstline: a Regular Expression used to detect the first line of a multiline log.
  # - flushInterval: The number of seconds after which the last received event log will be flushed, default value: 5s.
  #
  # = sourcetype =
  # sourcetype of each kind of log can be defined using the sourcetype field.
  # If sourcetype is not defined, name will be used.
  #
  # ---
  # Here we have some default timestampExtraction and multiline settings for kubernetes components.
  # So, usually you just need to redefine the source of those components if necessary.
  logs:
    docker:
      from:
        journald:
          unit: docker.service
      timestampExtraction:
        regexp: time="(?
```

Defines which version of image to use, and how it should be pulled.

image:

The domain of the registry to pull the image from

registry: docker.io
# The name of the image to pull
name: splunk/fluentd-hec
# The tag of the image to pull
tag: 1.2.5
# The policy that specifies when the user wants the images to be pulled
pullPolicy: IfNotPresent
# Indicates if the image should be pulled using authentication from a secret
usePullSecret: false
# The name of the pull secret to attach to the respective serviceaccount used to pull the image
pullsecretName:

Environment variable for daemonset

environmentVar:

Controls the resources used by the fluentd daemonset

```
resources:
  limits:
  #  cpu: 100m
  #  memory: 200Mi
  requests:
    cpu: 100m
    memory: 200Mi
```

Controls the output buffer for the fluentd daemonset

Note that, for the memory buffer, if resources.limits.memory is set, the total buffer size should not be bigger than the memory limit; it should also account for the basic memory usage of fluentd itself.

All buffer parameters (except Argument) defined in

https://docs.fluentd.org/v1.0/articles/buffer-section#parameters

can be configured here.

```
buffer:
  "@type": memory
  total_limit_size: 600m
  chunk_limit_size: 20m
  chunk_limit_records: 100000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

set to true to keep the structure created by docker or journald

sendAllMetadata: false

These default tolerations allow the daemonset to be deployed on master nodes, so that we can also collect logs from those nodes.

tolerations:

Enabling splunk-kubernetes-objects will install the splunk-kubernetes-objects chart in a kubernetes cluster to send kubernetes objects from the cluster to a Splunk indexer/indexer cluster.

```
splunk-kubernetes-objects:
  enabled: true
```

logLevel sets the log level of the object collector. Available values are:

* trace

* debug

* info (default)

* warn

* error

logLevel:

```
rbac:
  # Specifies whether RBAC resources should be created.
  # This should be set to `false` if either:
  # a) RBAC is not enabled in the cluster, or
  # b) you want to create RBAC resources by yourself.
  create: true
```

```
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # This flag specifies if the user wants to use a secret for creating the serviceAccount,
  # which will be used to get the images from a private registry
  usePullSecrets: false
```

```
podSecurityPolicy:
  # Specifies whether Pod Security Policy resources should be created.
  # This should be set to `false` if either:
  # a) Pod Security Policies are not enabled in the cluster, or
  # b) you want to create Pod Security Policy resources by yourself.
  create: false
  # Specifies whether an AppArmor profile should be applied.
  # If set to true, this will add two annotations to the PodSecurityPolicy:
  # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
  # Set to false if AppArmor is not available
  apparmor_security: true
  # apiGroup can be set to "extensions" for Kubernetes < 1.10.
  apiGroup: policy
```

= Kubernetes Connection Configs =

```
kubernetes:
  # The URL for calling the kubernetes API; by default it will be read from the environment variables
  url:
  # If insecureSSL is set to true, insecure HTTPS API calls are allowed, default false
  insecureSSL: false
  # Path to the certificate file for this client.
  clientCert:
  # Path to the private key file for this client.
  clientKey:
  # Path to the CA file.
  caFile:
  # Path to the file that contains the API token. By default it reads from the file "token" in the `secret_dir`.
  bearerTokenFile:
  # Path of the location where the pod's service account credentials are stored. Usually you don't need to change this; the default value should work in most cases.
  secretDir:
  # The cluster name used to tag cluster metrics from the aggregator. Default is cluster_name
  clusterName:
  # Add privileged access to containers for openshift compatibility
  openshift: false
```

= Object Lists =

NOTE: at least one object must be provided.


== Schema ==

```
objects:
  <apiGroup>:
    <apiVersion>:
      - <objectDefinition>
```


Each objectDefinition has the following fields:

* mode:

defines how this type of object is collected, either "poll" or "watch".

- "poll" mode reads all objects of this type using the list API at an interval.

- "watch" mode sets up a long-lived connection using the watch API to receive only updates.

* name: [REQUIRED]

name of the object, e.g. pods, namespaces.

Note that for resource names that contain multiple words, like daemonsets, the words need to be separated with _, so daemonsets becomes daemon_sets.

* namespace:

only collects objects from the specified namespace, by default it's all namespaces

* labelSelector:

select objects by label(s)

* fieldSelector:

select objects by field(s)

* interval:

the interval at which the objects are pulled; default 15 minutes.

Only useful for "poll" mode.


== Example ==

```
objects:
  core:
    v1:
      - name: pods
        namespace: default
        mode: poll
        interval: 60m
      - name: events
        mode: watch
  apps:
    v1:
      - name: daemon_sets
        labelSelector: environment=production
```

```
objects:
  core:
    v1:
```

Enabling splunk-kubernetes-metrics will install the splunk-kubernetes-metrics chart in a kubernetes cluster to send metrics from the cluster to a Splunk indexer/indexer cluster.

```
splunk-kubernetes-metrics:
  enabled: true
```

logLevel sets the log level of the Splunk kubernetes metrics collector. Available values are:

* debug

* info (default)

* warn

* error

logLevel:

```
rbac:
  # Specifies whether RBAC resources should be created.
  # This should be set to `false` if either:
  # a) RBAC is not enabled in the cluster, or
  # b) you want to create RBAC resources by yourself.
  create: true
```

```
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # This flag specifies if the user wants to use a secret for creating the serviceAccount,
  # which will be used to get the images from a private registry
  usePullSecrets: false
```

```
podSecurityPolicy:
  # Specifies whether Pod Security Policy resources should be created.
  # This should be set to `false` if either:
  # a) Pod Security Policies are not enabled in the cluster, or
  # b) you want to create Pod Security Policy resources by yourself.
  create: false
  # Specifies whether an AppArmor profile should be applied.
  # If set to true, this will add two annotations to the PodSecurityPolicy:
  # apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
  # apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
  # Set to false if AppArmor is not available
  apparmor_security: true
  # apiGroup can be set to "extensions" for Kubernetes < 1.10.
  apiGroup: policy
```

= Splunk HEC Connection =

```
splunk:
  # Configurations for HEC (HTTP Event Collector)
  hec:
    # hostname/ip of HEC, REQUIRED.
    host:
    # port of HEC, OPTIONAL. Default value: 8088
    port:
    # the HEC token, REQUIRED.
    token:
    # protocol has two options: "http" and "https". Default value: "https"
    protocol:
    # indexName tells which index to use, OPTIONAL. If it's not present, "main" will be used.
    indexName: em_metrics
    # insecureSSL is a boolean; it indicates whether to allow insecure SSL connections (when protocol is "https"). Default value: false
    insecureSSL:
    # The PEM-format CA certificate for this client.
    # NOTE: The content of the certificate itself should be used here, not the file path.
    #       The certificate will be stored as a secret in kubernetes.
    clientCert:
    # The private key for this client.
    # NOTE: The content of the key itself should be used here, not the file path.
    #       The key will be stored as a secret in kubernetes.
    clientKey:
    # The PEM-format CA certificate file.
    # NOTE: The content of the file itself should be used here, not the file path.
    #       The file will be stored as a secret in kubernetes.
    caFile:
```

Create or use an existing secret; if name is empty, the default name is used

```
secret:
  create: true
  name:
```

Defines which version of image to use, and how it should be pulled.

```
image:
  # The domain of the registry to pull the image from
  registry: docker.io
  # The name of the image to pull
  name: splunk/k8s-metrics
  # The tag of the image to pull
  tag: 1.1.5
  # The policy that specifies when the user wants the images to be pulled
  pullPolicy: IfNotPresent
  # Indicates if the image should be pulled using authentication from a secret
  usePullSecret: false
  # The name of the pull secret to attach to the respective serviceaccount used to pull the image
  pullsecretName:
```

Defines which version of image to use, and how it should be pulled.

```
imageAgg:
  # The domain of the registry to pull the image from
  registry: docker.io
  # The name of the image to pull
  name: splunk/k8s-metrics-aggr
  # The tag of the image to pull
  tag: 1.1.5
  # The policy that specifies when the user wants the images to be pulled
  pullPolicy: IfNotPresent
  # Indicates if the image should be pulled using authentication from a secret
  usePullSecret: false
  # The name of the pull secret to attach to the respective serviceaccount used to pull the image
  pullsecretName:
```

Environment variable for metrics daemonset

environmentVar:

Environment variable for metrics aggregator pod

environmentVarAgg:

Controls the resources used by the fluentd daemonset

```
resources:
  fluent:
    limits:
      cpu: 200m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 300Mi
```

Controls the output buffer for fluentd for the metrics pod

Note that, for the memory buffer, if resources.sidecar.limits.memory is set, the total buffer size should not be bigger than the memory limit; it should also account for the basic memory usage of fluentd itself.

All buffer parameters (except Argument) defined in

https://docs.fluentd.org/v1.0/articles/buffer-section#parameters

can be configured here.

```
buffer:
  "@type": memory
  total_limit_size: 400m
  chunk_limit_size: 10m
  chunk_limit_records: 10000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

Controls the output buffer for fluentd for the metrics aggregator pod

```
aggregatorBuffer:
  "@type": memory
  total_limit_size: 400m
  chunk_limit_size: 10m
  chunk_limit_records: 10000
  flush_interval: 5s
  flush_thread_count: 1
  overflow_action: block
  retry_max_times: 5
  retry_type: periodic
```

Configure how often SCK pulls metrics for its kubernetes sources. The default is 15s, where 's' means seconds.

metricsInterval: 15s

Defines which nodes should be selected to deploy the fluentd daemonset.

```
nodeSelector:
  beta.kubernetes.io/os: linux
```

These default tolerations allow the daemonset to be deployed on master nodes, so that we can also collect metrics from those nodes.

tolerations:

izark1 commented 3 years ago

Hi @luckyj5 ,

commands output attached: get_cm.txt describe_cm.txt

values.yaml: note: renamed the extension to .txt my_values.yaml.txt

izark1 commented 3 years ago

I want to add here that even a direct call to the endpoint results in a 404 Not Found error.

```
[root@docker1 k8s]# curl http://10.10.1.80:10248/stats/summary
404 page not found
[root@docker1 k8s]#
```

izark1 commented 3 years ago

```
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:30:03Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

```
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
```

rockb1017 commented 3 years ago

Hello, it seems you need to enable the metrics endpoint. Could you try the steps in this thread and let us know? https://github.com/splunk/splunk-connect-for-kubernetes/issues/505#issuecomment-754705688

izark1 commented 3 years ago

Hi @rockb1017, the problem is that the --enable-cadvisor-json-endpoints=true parameter can't be added to the /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf file nor to the /var/lib/kubelet/kubeadm-flags.env file.

Once I add this parameter, so that the complete ARG is Environment="KUBELET_KUBECONFIG_ARGS=--enable-cadvisor-json-endpoints=true --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf", and then attempt systemctl restart kubelet.service, it immediately fails.

Am I adding it correctly?

rockb1017 commented 3 years ago

what version of kubelet are you using?

rockb1017 commented 3 years ago

If you are using a version in which this option has been removed, it won't work for that version. Our metrics collector relies on this endpoint to collect metrics.

izark1 commented 3 years ago

```
[root@docker1 k8s]# kubelet --version
Kubernetes v1.21.0
```

izark1 commented 3 years ago

So if it's a version compatibility issue, what is the maximum supported version of Kubernetes for SCK?

rockb1017 commented 3 years ago

the option is only available up to 1.20 https://v1-20.docs.kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

izark1 commented 3 years ago

Thanks all for clarifying the reason.