Closed: @akandimalla closed this issue 1 year ago.
Hi @akandimalla, can you please share the pod logs and your values.yaml config file?
Hi @harshit-splunk - Here are the details:
global:
  logLevel: info
  splunk:
    hec:
      host: splunk.xxxx.com
      port: 443
      token: xxxxx
      protocol: https
      endpoint:
      fullUrl:
      indexName:
      insecureSSL: true
      clientCert:
      clientKey:
      caFile:
      indexRouting:
      consume_chunk_on_4xx_errors:
  kubernetes:
    clusterName: xxxxxx
  prometheus_enabled:
  monitoring_agent_enabled:
  monitoring_agent_index_name:
  metrics:
    service:
      enabled: true
      headless: true
  serviceMonitor:
    enabled: false
    metricsPort: 24231
    interval: ""
    scrapeTimeout: "10s"
    additionalLabels: { }

splunk-kubernetes-logging:
  enabled: true
  logLevel:
  namespace:
  fluentd:
    path: /var/log/containers/*.log
    exclude_path:
  containers:
    path: /var/log
    pathDest: /var/lib/docker/containers
    logFormatType: json
    logFormat:
    refreshInterval:
    removeBlankEvents: true
    localTime: false
  k8sMetadata:
    podLabels:
      - app
      - k8s-app
      - release
    watch: true
    cache_ttl: 3600
  sourcetypePrefix: "kube"
  rbac:
    create: true
    openshiftPrivilegedSccBinding: false
  serviceAccount:
    create: true
    name:
  podSecurityPolicy:
    create: false
    apparmor_security: true
    apiGroup: policy
  splunk:
    hec:
      host:
      port:
      token:
      protocol:
      endpoint:
      fullUrl:
      indexName:
      insecureSSL:
      clientCert:
      clientKey:
      caFile:
      consume_chunk_on_4xx_errors:
      gzip_compression:
    ingest_api:
      serviceClientIdentifier:
      serviceClientSecretKey:
      tokenEndpoint:
      ingestAuthHost:
      ingestAPIHost:
      tenant:
      eventsEndpoint:
      debugIngestAPI:
  secret:
    create: true
    name:
  journalLogPath: /run/log/journal
  charEncodingUtf8: false
  logs:
    docker:
      from:
        journald:
          unit: docker.service
      sourcetype: kube:docker
    kubelet: &glog
      from:
        journald:
          unit: kubelet.service
      multiline:
        firstline: /^\w[0-1]\d[0-3]\d/
      sourcetype: kube:kubelet
    etcd:
      from:
        pod: etcd-server
        container: etcd-container
    etcd-minikube:
      from:
        pod: etcd-minikube
        container: etcd
    etcd-events:
      from:
        pod: etcd-server-events
        container: etcd-container
    kube-apiserver:
      <<: *glog
      from:
        pod: kube-apiserver
      sourcetype: kube:kube-apiserver
    kube-scheduler:
      <<: *glog
      from:
        pod: kube-scheduler
      sourcetype: kube:kube-scheduler
    kube-controller-manager:
      <<: *glog
      from:
        pod: kube-controller-manager
      sourcetype: kube:kube-controller-manager
    kube-proxy:
      <<: *glog
      from:
        pod: kube-proxy
      sourcetype: kube:kube-proxy
    kubedns:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:kubedns
    dnsmasq:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:dnsmasq
    dns-sidecar:
      <<: *glog
      from:
        pod: kube-dns
        container: sidecar
      sourcetype: kube:kubedns-sidecar
    dns-controller:
      <<: *glog
      from:
        pod: dns-controller
      sourcetype: kube:dns-controller
    kube-dns-autoscaler:
      <<: *glog
      from:
        pod: kube-dns-autoscaler
        container: autoscaler
      sourcetype: kube:kube-dns-autoscaler
    kube-audit:
      from:
        file:
          path: /var/log/kube-apiserver-audit.log
      timestampExtraction:
        format: "%Y-%m-%dT%H:%M:%SZ"
      sourcetype: kube:apiserver-audit
  image:
    registry: docker.io
    name: splunk/fluentd-hec
    tag: 1.3.0
    pullPolicy: IfNotPresent
    usePullSecret: false
    pullSecretName:
  environmentVar:
  podAnnotations:
  extraLabels:
  resources:
    requests:
      cpu: 100m
      memory: 200Mi
  bufferChunkKeys:
    - index
  buffer:
    "@type": memory
    total_limit_size: 600m
    chunk_limit_size: 20m
    chunk_limit_records: 100000
    flush_interval: 5s
    flush_thread_count: 1
    overflow_action: block
    retry_max_times: 10
    retry_type: periodic
    retry_wait: 30
  sendAllMetadata: false
  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule
  nodeSelector:
    kubernetes.io/os: linux
  affinity: {}
  extraVolumes: []
  extraVolumeMounts: []
  priorityClassName:
  kubernetes:
    clusterName:
  securityContext: false
  customMetadata:
  customMetadataAnnotations:
  customFilters: {}
  indexFields: []
  rollingUpdate:
Logs: Even though they show errors, I checked and can confirm I gave the right annotation to the namespaces.
2022-10-06 12:47:06 +0000 [info]: #0 stats - namespace_cache_size: 5, pod_cache_size: 33, pod_cache_watch_updates: 253, pod_cache_host_updates: 105, pod_cache_watch_ignored: 58, pod_cache_watch_delete_ignored: 56, namespace_cache_api_updates: 89, pod_cache_api_updates: 89, id_cache_miss: 89, pod_watch_gone_errors: 5, pod_watch_gone_notices: 5
2022-10-06 12:47:07 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:07 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:31 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
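A quick way to isolate this is to test the HEC token directly, bypassing SCK entirely. A minimal sketch using the placeholder host, token, and index values from this thread (substitute real ones):

```sh
# Send a single test event to HEC, forcing the target index.
curl -k https://splunk.xxxx.com:443/services/collector/event \
  -H "Authorization: Splunk xxxxx" \
  -d '{"event": "index permission test", "index": "logs-nonprod"}'

# If the token is not allowed to write to that index, Splunk returns the
# same error the pod logs show:
#   {"text":"Incorrect index","code":7,"invalid-event-number":0}
```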
Can you check the error logs in the _internal index? They will show which index the collector is trying to send events to.
Also, have you modified any of the template files?
Can you clarify what is meant by the _internal index? Should I check with my Splunk admin about the index error logs at their end?
No changes were made to the template files; we are using the default ones.
Yes, Splunk logs internally to the _internal index when data is sent to an invalid index.
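For example, the admin could run a search along these lines; a sketch assuming a default Splunk install path and splunkd's usual field names for HEC errors (both can vary by version and install):

```sh
# HEC rejections are logged by splunkd; the raw events include the token
# name and the reply code (7 = Incorrect index). The same query also works
# in the Splunk search UI.
/opt/splunk/bin/splunk search \
  'index=_internal earliest=-24h sourcetype=splunkd component=HttpInputDataHandler log_level=ERROR'
```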
The error:
2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
I've encountered this in two situations:
- a misspelled index name in the splunk.com/index annotation (on the namespace or the pod)
- missing permissions on the index for the HEC token
Unfortunately, the error message doesn't contain the name of the index that fails; the first case can be checked with the kubectl sketch below.
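To rule out a misspelling, the annotation can be read back exactly as the logging pods see it; a minimal sketch using the namespace from this issue (<pod-name> is a placeholder):

```sh
# Dots in the annotation key must be escaped in jsonpath.
kubectl get ns app-qa \
  -o jsonpath='{.metadata.annotations.splunk\.com/index}'

# Pod-level annotations override the namespace one, so spot-check a pod too:
kubectl get pod <pod-name> -n app-qa \
  -o jsonpath='{.metadata.annotations.splunk\.com/index}'
```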
Checked with the Splunk admin and they don't see any issues on their end. I checked my cluster and can confirm the annotations are correct on the namespaces.
The error message logged by the Splunk logging pod is received from the Splunk server, so your SCK is connecting to the Splunk server successfully, but the server rejects your logs. If you are sure the annotations are correct, there's just one last option: the HEC token you use doesn't have permissions for the index. One way to check that is sketched below.
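The Splunk admin can verify the token's index permissions via the management REST API; a sketch assuming the default management port 8089, admin credentials, and jq installed (field names may differ slightly by Splunk version):

```sh
# List HEC tokens with their default index and allowed-index list.
curl -k -u admin \
  "https://splunk.xxxx.com:8089/services/data/inputs/http?output_mode=json" \
  | jq '.entry[] | {token: .name, default_index: .content.index, allowed_indexes: .content.indexes}'
```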
@akandimalla any updates on this?
Closing due to inactivity. Feel free to reopen the issue.
Sorry for the delayed response. This ticket is good to close.
What happened: We have an index called "logs-nonprod". We have an application named "app" deployed in three EKS clusters (dev, qa, stg), in three namespaces called app-dev, app-qa, and app-stg. On each cluster, the namespaces are annotated with the index logs-nonprod:
dev: k annotate --overwrite ns app-dev splunk.com/index=logs-nonprod
qa: k annotate --overwrite ns app-qa splunk.com/index=logs-nonprod
stg: k annotate --overwrite ns app-stg splunk.com/index=logs-nonprod
For the last 24 hours, ONLY the qa EKS cluster has failed to send logs to that index; its logs are landing in the "eks-default" index instead. dev and stg are sending fine without any issues.
What you expected to happen: Logs sent to the Splunk index
How to reproduce it (as minimally and precisely as possible): NA
Anything else we need to know?: NA
Environment:
- Kubernetes version (use kubectl version):
- Ruby version (use ruby --version):
- OS (e.g. cat /etc/os-release): NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/"