Closed — rfrodrigues16 closed this issue 5 years ago.
This issue is similar to https://github.com/splunk/splunk-connect-for-kubernetes/issues/204
I think we can handle the timeout error in the HEC plugin to solve this issue. Apart from that, it might be a DNS issue or Splunk being unavailable for that pod. A pod restart might help. Let me know what you think.
It doesn't seem to be a DNS or Splunk issue. The same configuration works fine for splunk-kubernetes-logging and splunk-kubernetes-metrics. I also restarted the pod several times, and it doesn't help. I didn't follow the idea of handling the issue via the HEC plugin. Is there a configuration I could try?
Hi @rfrodrigues16 - Are you deploying with Helm, and are you setting any explicit timeout values for the bundled HEC plugin module in the configuration files? Also, what version of Splunk are you sending to?
Hi @dtregonning, I'm not setting any explicit timeout value. The plugin is sending data to an instance of Splunk Enterprise version 7.1.4. I'm pasting the Pod configuration below:
<match kube.**>
  @type splunk_hec
  protocol http
  hec_host "x.x.x.x"
  hec_port 8088
  hec_token "x-x-x-x-x"
  host ""
  source_key "source"
  sourcetype_key "sourcetype"
  index "vecc_csa_kubernetes_logs"
  insecure_ssl false
  <buffer>
    @type "memory"
    chunk_limit_records 10000
    chunk_limit_size 200m
    flush_interval 3s
    flush_thread_count 1
    overflow_action block
    retry_max_times 3
    total_limit_size 600m
  </buffer>
</match>
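Since this thread is about timeouts, one thing worth trying is giving the HEC output explicit connect/read timeouts instead of relying on the net-http-persistent defaults. A hedged sketch is below; the `open_timeout` and `read_timeout` parameter names are an assumption about your fluent-plugin-splunk-hec version, so check the plugin README before relying on them:

```
<match kube.**>
  @type splunk_hec
  # ... existing hec_host / hec_port / hec_token settings ...
  # Assumed parameters (verify against your plugin version's README):
  open_timeout 10   # seconds to wait for the TCP connection to open
  read_timeout 60   # seconds to wait for the HEC response
</match>
```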
Can you set the protocol to https and ensure you are running insecureSSL=true, unless you have valid certs being served by Splunk HEC? Your curl worked because it was:

curl -k https://x.x.x.x:8088/services/collector/event

-k means the same thing as insecureSSL=true, and you are using https.
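For context on why `-k` and insecureSSL=true are equivalent: both skip server-certificate verification while still doing TLS. A small illustrative Python sketch of the two client configurations (the plugin itself is Ruby and builds its context internally; this is just the same idea expressed with the stdlib `ssl` module):

```python
import ssl

# Equivalent to `curl -k` / insecureSSL=true: the TLS handshake still
# happens, but the server certificate and hostname are not verified.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False          # must be disabled first
insecure_ctx.verify_mode = ssl.CERT_NONE

# Equivalent to insecureSSL=false: only works when Splunk HEC serves
# a certificate the client can validate.
secure_ctx = ssl.create_default_context()

print(insecure_ctx.verify_mode == ssl.CERT_NONE)    # True
print(secure_ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```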
I already tried both (http and https). At first I was using https and insecureSSL=true with HEC security enabled in Splunk Enterprise. I then disabled security in the HEC configuration and changed the plugin configuration to http, to check whether the issue was somehow related to the security option. Nevertheless, I still get the same error using http.
Let's not guess; let's confirm your HEC configs, please. Once you have verified which protocol HEC is serving, we can validate your configs, which we need to see dumped from the running pod.
Currently, my HEC is configured to use HTTP. The global settings in Splunk Enterprise are:
When you say "from the node", did you drop a pod in the splunk namespace or try from the OS?
With those settings I would use http and insecureSSL=true, and I would validate that the HEC hostname is resolvable, or use the IP. I'm convinced it's a reachability thing, based on my experience.
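To test the reachability theory, a bare TCP probe from the node (or from a debug pod in the splunk namespace) is enough, since the failing step is the TCP connect itself. A minimal sketch; the host and port placeholders stand in for your actual HEC values:

```python
import socket

def hec_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    This is the same connect step that raises Net::OpenTimeout
    ("execution expired") in the fluentd HEC plugin when it fails.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connect timeouts, refusals, and DNS failures
        return False

# Example with the placeholder address used throughout this thread:
# print(hec_reachable("x.x.x.x", 8088))
```

If this returns False from inside a pod but the same probe succeeds from the node's OS, the problem is in-cluster networking (CNI, network policy, or egress rules) rather than Splunk.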
Hi @matthewmodestino, thank you for your hints. The problem turned out to be connectivity between one of the k8s nodes and the Splunk HEC endpoint.
I'm experiencing a problem when trying to use the SCK agent for collecting Kubernetes object data (splunk-kubernetes-objects). The agent throws an "execution expired" error whenever it tries to post data via HEC. See the log excerpts below:
2019-08-13 21:47:37 +0000 [debug]: #0 Sending 4547 bytes to Splunk. 2019-08-13 21:47:37 +0000 [trace]: #0 POST https:/x.x.x.x:8088/services/collector body={"host":"","time":"1565732853.4252117","index":"vecc_csa_kubernetes_logs","source":"namespace:splunk/pod:splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf","sourcetype":"kube:objects:pods","fields":{"cluster_name":"cluster_name"},"event":"{\"metadata\":{\"name\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf\",\"generateName\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855-\",\"namespace\":\"splunk\",\"selfLink\":\"/api/v1/namespaces/splunk/pods/splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf\",\"uid\":\"f3dc7102-be13-11e9-8dde-fa163e5b4bad\",\"resourceVersion\":\"5401185\",\"creationTimestamp\":\"2019-08-13T21:47:31Z\",\"labels\":{\"app\":\"splunk-kubernetes-objects\",\"engine\":\"fluentd\",\"pod-template-hash\":\"6cb5876855\",\"release\":\"splunk-connect\"},\"annotations\":{\"checksum/config\":\"b14e319dbcdaddd52e542e9fe9b529381b0f918591820bcdfc4a10fd2b572fff\"},\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"kind\":\"ReplicaSet\",\"name\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855\",\"uid\":\"f3da97df-be13-11e9-8dde-fa163e5b4bad\",\"controller\":true,\"blockOwnerDeletion\":true}],\"requestResourceVersion\":\"5401187\"},\"spec\":{\"volumes\":[{\"name\":\"conf-configmap\",\"configMap\":{\"name\":\"splunk-connect-splunk-kubernetes-objects\",\"defaultMode\":420}},{\"name\":\"secrets\",\"secret\":{\"secretName\":\"splunk-kubernetes-objects\",\"defaultMode\":420}},{\"name\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"secret\":{\"secretName\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"defaultMode\":420}}],\"containers\":[{\"name\":\"splunk-fluentd-k8s-objects\",\"image\":\"k8s-registry.eccd.local/splunk/kube-objects:1.1.0\",\"args\":[\"-c\",\"/fluentd/etc/fluent.conf\"],\"env\":[{\"name\":\"MY_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"
apiVersion\":\"v1\",\"fieldPath\":\"metadata.namespace\"}}},{\"name\":\"MY_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.name\"}}},{\"name\":\"NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"spec.nodeName\"}}},{\"name\":\"SPLUNK_HEC_TOKEN\",\"valueFrom\":{\"secretKeyRef\":{\"name\":\"splunk-kubernetes-objects\",\"key\":\"splunk_hec_token\"}}}],\"resources\":{\"requests\":{\"cpu\":\"100m\",\"memory\":\"200Mi\"}},\"volumeMounts\":[{\"name\":\"conf-configmap\",\"mountPath\":\"/fluentd/etc\"},{\"name\":\"secrets\",\"readOnly\":true,\"mountPath\":\"/fluentd/etc/splunk\"},{\"name\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"readOnly\":true,\"mountPath\":\"/var/run/secrets/kubernetes.io/serviceaccount\"}],\"terminationMessagePath\":\"/dev/termination-log\",\"terminationMessagePolicy\":\"File\",\"imagePullPolicy\":\"IfNotPresent\"}],\"restartPolicy\":\"Always\",\"terminationGracePeriodSeconds\":30,\"dnsPolicy\":\"ClusterFirst\",\"serviceAccountName\":\"splunk-connect-splunk-kubernetes-objects\",\"serviceAccount\":\"splunk-connect-splunk-kubernetes-objects\",\"nodeName\":\"worker-pool1-xmbq8dza-csa-eccd-v2\",\"securityContext\":{},\"schedulerName\":\"default-scheduler\",\"tolerations\":[{\"key\":\"node.kubernetes.io/not-ready\",\"operator\":\"Exists\",\"effect\":\"NoExecute\",\"tolerationSeconds\":300},{\"key\":\"node.kubernetes.io/unreachable\",\"operator\":\"Exists\",\"effect\":\"NoExecute\",\"tolerationSeconds\":300}],\"priority\":0,\"enableServiceLinks\":true},\"status\":{\"phase\":\"Running\",\"conditions\":[{\"type\":\"Initialized\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:31Z\"},{\"type\":\"Ready\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:32Z\"},{\"type\":\"ContainersReady\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:32Z\"},{\"type\":\"PodScheduled\",\"s
tatus\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:31Z\"}],\"hostIP\":\"10.0.10.2\",\"podIP\":\"192.168.20.23\",\"startTime\":\"2019-08-13T21:47:31Z\",\"containerStatuses\":[{\"name\":\"splunk-fluentd-k8s-objects\",\"state\":{\"running\":{\"startedAt\":\"2019-08-13T21:47:32Z\"}},\"lastState\":{},\"ready\":true,\"restartCount\":0,\"image\":\"k8s-registry.eccd.local/splunk/kube-objects:1.1.0\",\"imageID\":\"docker-pullable://k8s-registry.eccd.local/splunk/kube-objects@sha256:209ea3157615eec0bd43bcae5450c4a7e2c962ce5c8dd42e890ca81dfc551103\",\"containerID\":\"docker://4c59e1927a3d386d8707e86033c73e9a82a7e56823811572abefffdaceda5e79\"}],\"qosClass\":\"Burstable\"}}"}
…
2019-08-13 21:48:37 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2019-08-13 21:48:38 +0000 chunk="59006968aad29c0ccc8b3c2c2ee28461" error_class=Net::OpenTimeout error="execution expired"
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:937:in `initialize'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:937:in `open'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:937:in `block in connect'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:935:in `connect'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:920:in `do_start'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/lib/ruby/2.5.0/net/http.rb:915:in `start'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:692:in `start'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:622:in `connection_for'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:927:in `request'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:355:in `send_to_hec'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:167:in `write'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1125:in `try_flush'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2019-08-13 21:48:37 +0000 [warn]: #0 /usr/local/bundle/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-08-13 21:48:37 +0000 [trace]: #0 enqueueing all chunks in buffer instance=69875636192300
…
2019-08-13 22:02:44 +0000 [error]: #0 failed to flush the buffer, and hit limit for retries. dropping all chunks in the buffer queue. retry_times=3 records=2 error_class=Net::OpenTimeout error="execution expired"
2019-08-13 22:02:44 +0000 [error]: #0 suppressed same stacktrace
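The tail of that log also explains why data is lost: with `retry_max_times 3` in the buffer config, fluentd retries the failed flush and then drops everything in the buffer queue. The behavior can be sketched roughly as follows (illustrative Python, not the actual fluentd implementation):

```python
def flush_with_retries(flush, chunks, retry_max_times=3):
    """Mimic fluentd's buffer behavior: retry a failing flush, then
    drop all queued chunks once the retry limit is hit.

    `flush` is a callable that raises on failure, standing in for the
    HEC POST that raises Net::OpenTimeout in the log above.
    """
    for attempt in range(retry_max_times + 1):  # first try + retries
        try:
            flush(chunks)
            return []      # flushed successfully, nothing dropped
        except OSError:
            continue       # "failed to flush the buffer. retry_time=..."
    # "hit limit for retries. dropping all chunks in the buffer queue."
    return chunks

# Example: a flush that always times out drops every queued chunk.
def always_times_out(_chunks):
    raise TimeoutError("execution expired")

dropped = flush_with_retries(always_times_out, ["chunk-a", "chunk-b"])
print(dropped)  # ['chunk-a', 'chunk-b']
```

Raising `retry_max_times` (or fixing reachability) avoids the drop, at the cost of the buffer filling up while the endpoint is unreachable.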
Nevertheless, I can successfully send the same request from the node using curl:
curl -k https://x.x.x.x:8088/services/collector/event -H "Authorization: Splunk xxx-xxx-xxx-xxx" -d '{"host":"","time":"1565732853.4252117","index":"vecc_csa_kubernetes_logs","source":"namespace:splunk/pod:splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf","sourcetype":"kube:objects:pods","fields":{"cluster_name":"cluster_name"},"event":"{\"metadata\":{\"name\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf\",\"generateName\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855-\",\"namespace\":\"splunk\",\"selfLink\":\"/api/v1/namespaces/splunk/pods/splunk-connect-splunk-kubernetes-objects-6cb5876855-rmqpf\",\"uid\":\"f3dc7102-be13-11e9-8dde-fa163e5b4bad\",\"resourceVersion\":\"5401185\",\"creationTimestamp\":\"2019-08-13T21:47:31Z\",\"labels\":{\"app\":\"splunk-kubernetes-objects\",\"engine\":\"fluentd\",\"pod-template-hash\":\"6cb5876855\",\"release\":\"splunk-connect\"},\"annotations\":{\"checksum/config\":\"b14e319dbcdaddd52e542e9fe9b529381b0f918591820bcdfc4a10fd2b572fff\"},\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"kind\":\"ReplicaSet\",\"name\":\"splunk-connect-splunk-kubernetes-objects-6cb5876855\",\"uid\":\"f3da97df-be13-11e9-8dde-fa163e5b4bad\",\"controller\":true,\"blockOwnerDeletion\":true}],\"requestResourceVersion\":\"5401187\"},\"spec\":{\"volumes\":[{\"name\":\"conf-configmap\",\"configMap\":{\"name\":\"splunk-connect-splunk-kubernetes-objects\",\"defaultMode\":420}},{\"name\":\"secrets\",\"secret\":{\"secretName\":\"splunk-kubernetes-objects\",\"defaultMode\":420}},{\"name\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"secret\":{\"secretName\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"defaultMode\":420}}],\"containers\":[{\"name\":\"splunk-fluentd-k8s-objects\",\"image\":\"k8s-registry.eccd.local/splunk/kube-objects:1.1.0\",\"args\":[\"-c\",\"/fluentd/etc/fluent.conf\"],\"env\":[{\"name\":\"MY_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.namespace
\"}}},{\"name\":\"MY_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.name\"}}},{\"name\":\"NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"spec.nodeName\"}}},{\"name\":\"SPLUNK_HEC_TOKEN\",\"valueFrom\":{\"secretKeyRef\":{\"name\":\"splunk-kubernetes-objects\",\"key\":\"splunk_hec_token\"}}}],\"resources\":{\"requests\":{\"cpu\":\"100m\",\"memory\":\"200Mi\"}},\"volumeMounts\":[{\"name\":\"conf-configmap\",\"mountPath\":\"/fluentd/etc\"},{\"name\":\"secrets\",\"readOnly\":true,\"mountPath\":\"/fluentd/etc/splunk\"},{\"name\":\"splunk-connect-splunk-kubernetes-objects-token-4m9w6\",\"readOnly\":true,\"mountPath\":\"/var/run/secrets/kubernetes.io/serviceaccount\"}],\"terminationMessagePath\":\"/dev/termination-log\",\"terminationMessagePolicy\":\"File\",\"imagePullPolicy\":\"IfNotPresent\"}],\"restartPolicy\":\"Always\",\"terminationGracePeriodSeconds\":30,\"dnsPolicy\":\"ClusterFirst\",\"serviceAccountName\":\"splunk-connect-splunk-kubernetes-objects\",\"serviceAccount\":\"splunk-connect-splunk-kubernetes-objects\",\"nodeName\":\"worker-pool1-xmbq8dza-csa-eccd-v2\",\"securityContext\":{},\"schedulerName\":\"default-scheduler\",\"tolerations\":[{\"key\":\"node.kubernetes.io/not-ready\",\"operator\":\"Exists\",\"effect\":\"NoExecute\",\"tolerationSeconds\":300},{\"key\":\"node.kubernetes.io/unreachable\",\"operator\":\"Exists\",\"effect\":\"NoExecute\",\"tolerationSeconds\":300}],\"priority\":0,\"enableServiceLinks\":true},\"status\":{\"phase\":\"Running\",\"conditions\":[{\"type\":\"Initialized\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:31Z\"},{\"type\":\"Ready\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:32Z\"},{\"type\":\"ContainersReady\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransitionTime\":\"2019-08-13T21:47:32Z\"},{\"type\":\"PodScheduled\",\"status\":\"True\",\"lastProbeTime\":null,\"lastTransiti
onTime\":\"2019-08-13T21:47:31Z\"}],\"hostIP\":\"10.0.10.2\",\"podIP\":\"192.168.20.23\",\"startTime\":\"2019-08-13T21:47:31Z\",\"containerStatuses\":[{\"name\":\"splunk-fluentd-k8s-objects\",\"state\":{\"running\":{\"startedAt\":\"2019-08-13T21:47:32Z\"}},\"lastState\":{},\"ready\":true,\"restartCount\":0,\"image\":\"k8s-registry.eccd.local/splunk/kube-objects:1.1.0\",\"imageID\":\"docker-pullable://k8s-registry.eccd.local/splunk/kube-objects@sha256:209ea3157615eec0bd43bcae5450c4a7e2c962ce5c8dd42e890ca81dfc551103\",\"containerID\":\"docker://4c59e1927a3d386d8707e86033c73e9a82a7e56823811572abefffdaceda5e79\"}],\"qosClass\":\"Burstable\"}}"}'
{"text":"Success","code":0}
HEC is configured with insecureSSL=true. Also, the same HEC configuration is being used with SCK for sending logging and metrics to Splunk. The logging and metrics agents are working fine.
Do you have any idea what could be wrong?