splunk / splunk-connect-for-kubernetes

Helm charts associated with kubernetes plug-ins
Apache License 2.0

unable to forward logs to index via namespace annotation #813

Closed akandimalla closed 1 year ago

akandimalla commented 2 years ago

What happened: We have a Splunk index called "logs-nonprod" and an application named "app" that is deployed in three EKS clusters (dev, qa, stg), in the namespaces app-dev, app-qa, and app-stg respectively. On each cluster the namespace is annotated to route to the index logs-nonprod:

dev: k annotate --overwrite ns app-dev splunk.com/index=logs-nonprod
qa:  k annotate --overwrite ns app-qa splunk.com/index=logs-nonprod
stg: k annotate --overwrite ns app-stg splunk.com/index=logs-nonprod
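
To read an annotation back from a cluster, a quick check like the following can be used (app-qa shown; grepping the namespace YAML is just one way to do it):

  kubectl get ns app-qa -o yaml | grep splunk.com/index

Each cluster should print splunk.com/index: logs-nonprod for its namespace.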

ONLY the qa EKS cluster has failed to send logs to that index for the last 24 hours; its logs are landing in the "eks-default" index instead. The dev and stg clusters are sending to logs-nonprod without any issues.

What you expected to happen: Logs are sent to the Splunk index logs-nonprod.
How to reproduce it (as minimally and precisely as possible): NA
Anything else we need to know?: NA
Environment:

hvaghani221 commented 2 years ago

Hi @akandimalla, can you please share pod logs and values.yaml config file?

akandimalla commented 2 years ago

Hi @harshit-splunk - Here are the details:

global:
  logLevel: info
  splunk:
    hec:
      host: splunk.xxxx.com
      port: 443
      token: xxxxx
      protocol: https
      endpoint:
      fullUrl:
      indexName:
      insecureSSL: true
      clientCert:
      clientKey:
      caFile:
      indexRouting:
      consume_chunk_on_4xx_errors:
  kubernetes:
    clusterName: xxxxxx
  prometheus_enabled:
  monitoring_agent_enabled:
  monitoring_agent_index_name:
  metrics:
    service:
      enabled: true
      headless: true
  serviceMonitor:
    enabled: false

    metricsPort: 24231
    interval: ""
    scrapeTimeout: "10s"

    additionalLabels: { }

splunk-kubernetes-logging:
  enabled: true
  logLevel:

  namespace:

  fluentd:
    path: /var/log/containers/*.log
    exclude_path:

  containers:
    path: /var/log
    pathDest: /var/lib/docker/containers
    logFormatType: json
    logFormat:
    refreshInterval:
    removeBlankEvents: true
    localTime: false

  k8sMetadata:
    podLabels:
      - app
      - k8s-app
      - release
    watch: true
    cache_ttl: 3600

  sourcetypePrefix: "kube"

  rbac:
    create: true
    openshiftPrivilegedSccBinding: false

  serviceAccount:
    create: true
    name:

  podSecurityPolicy:
    create: false
    apparmor_security: true
    apiGroup: policy

  splunk:
    hec:
      host:
      port:
      token:
      protocol:
      endpoint:
      fullUrl:
      indexName:
      insecureSSL:
      clientCert:
      clientKey:
      caFile:
      consume_chunk_on_4xx_errors:
      gzip_compression:
    ingest_api:
      serviceClientIdentifier:
      serviceClientSecretKey:
      tokenEndpoint:
      ingestAuthHost:
      ingestAPIHost:
      tenant:
      eventsEndpoint:
      debugIngestAPI:

  secret:
    create: true
    name:

  journalLogPath: /run/log/journal

  charEncodingUtf8: false

  logs:
    docker:
      from:
        journald:
          unit: docker.service
      sourcetype: kube:docker
    kubelet: &glog
      from:
        journald:
          unit: kubelet.service
      multiline:
        firstline: /^\w[0-1]\d[0-3]\d/
      sourcetype: kube:kubelet
    etcd:
      from:
        pod: etcd-server
        container: etcd-container
    etcd-minikube:
      from:
        pod: etcd-minikube
        container: etcd
    etcd-events:
      from:
        pod: etcd-server-events
        container: etcd-container
    kube-apiserver:
      <<: *glog
      from:
        pod: kube-apiserver
      sourcetype: kube:kube-apiserver
    kube-scheduler:
      <<: *glog
      from:
        pod: kube-scheduler
      sourcetype: kube:kube-scheduler
    kube-controller-manager:
      <<: *glog
      from:
        pod: kube-controller-manager
      sourcetype: kube:kube-controller-manager
    kube-proxy:
      <<: *glog
      from:
        pod: kube-proxy
      sourcetype: kube:kube-proxy
    kubedns:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:kubedns
    dnsmasq:
      <<: *glog
      from:
        pod: kube-dns
      sourcetype: kube:dnsmasq
    dns-sidecar:
      <<: *glog
      from:
        pod: kube-dns
        container: sidecar
      sourcetype: kube:kubedns-sidecar
    dns-controller:
      <<: *glog
      from:
        pod: dns-controller
      sourcetype: kube:dns-controller
    kube-dns-autoscaler:
      <<: *glog
      from:
        pod: kube-dns-autoscaler
        container: autoscaler
      sourcetype: kube:kube-dns-autoscaler
    kube-audit:
      from:
        file:
          path: /var/log/kube-apiserver-audit.log
      timestampExtraction:
        format: "%Y-%m-%dT%H:%M:%SZ"
      sourcetype: kube:apiserver-audit

  image:
    registry: docker.io
    name: splunk/fluentd-hec
    tag: 1.3.0
    pullPolicy: IfNotPresent
    usePullSecret: false
    pullSecretName:

  environmentVar:

  podAnnotations:

  extraLabels:

  resources:
    requests:
      cpu: 100m
      memory: 200Mi

  bufferChunkKeys:
  - index
  buffer:
    "@type": memory
    total_limit_size: 600m
    chunk_limit_size: 20m
    chunk_limit_records: 100000
    flush_interval: 5s
    flush_thread_count: 1
    overflow_action: block
    retry_max_times: 10
    retry_type: periodic
    retry_wait: 30

  sendAllMetadata: false

  tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

  nodeSelector:
    kubernetes.io/os: linux

  affinity: {}

  extraVolumes: []
  extraVolumeMounts: []

  priorityClassName:

  kubernetes:
    clusterName:
    securityContext: false

  customMetadata:

  customMetadataAnnotations:

  customFilters: {}

  indexFields: []

  rollingUpdate:

Logs: Even though they show errors, I checked and can confirm the namespaces have the right annotation.

2022-10-06 12:47:06 +0000 [info]: #0 stats - namespace_cache_size: 5, pod_cache_size: 33, pod_cache_watch_updates: 253, pod_cache_host_updates: 105, pod_cache_watch_ignored: 58, pod_cache_watch_delete_ignored: 56, namespace_cache_api_updates: 89, pod_cache_api_updates: 89, id_cache_miss: 89, pod_watch_gone_errors: 5, pod_watch_gone_notices: 5
2022-10-06 12:47:07 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:07 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:13 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:19 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}
2022-10-06 12:47:31 +0000 [error]: #0 Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}

hvaghani221 commented 2 years ago

Can you check the error logs in the _internal index? They will show which index the collector is trying to send events to.

hvaghani221 commented 2 years ago

Also, have you modified any of the template files?

akandimalla commented 2 years ago

Can you clarify what is meant by the _internal index? Should I check with my Splunk admin about the index error logs on their end?
No changes were made to the template files. We are using the default ones.

hvaghani221 commented 2 years ago

Should I check with my Splunk admin about the index error logs at their end?

Yes, Splunk logs to its own _internal index when data is sent to an invalid index.
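
As a rough sketch, the Splunk admin could pull those errors over the REST search API along these lines (the management port, the credentials, and the HttpInputDataHandler component name are assumptions here, not details confirmed in this thread):

  curl -k -u admin:changeme \
    "https://splunk.xxxx.com:8089/services/search/jobs/export" \
    -d output_mode=json \
    -d search='search index=_internal sourcetype=splunkd component=HttpInputDataHandler earliest=-24h'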

vinzent commented 2 years ago

The error:

2022-10-06 12:47:25 +0000 [error]: #0 Fluent::Plugin::SplunkHecOutput: Failed POST to https://xxxxxxxx.com/services/collector, response: {"text":"Incorrect index","code":7,"invalid-event-number":1}

I've encountered this error in two situations:

  1. misspelled index name in the splunk.com/index annotation (namespace or pod)
  2. missing permissions on the index for the HEC token

Unfortunately, the error message doesn't contain the name of the index that fails.

akandimalla commented 2 years ago

Checked with the Splunk admin and they don't see any issues on their end. I checked my cluster and confirm the annotations on the namespaces are correct.

vinzent commented 2 years ago

The error message logged by the Splunk logging pod is received from the Splunk server, so your SCK deployment is connecting to the Splunk server successfully but the server is rejecting your logs. If you are sure the annotations are correct, there is just one option left: the HEC token you use doesn't have permission for the index.
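
A quick way to test that last option is to post a single event to HEC with the index set explicitly; a minimal sketch using the host and index from this thread (the token placeholder is whatever is configured in values.yaml):

  curl -k https://splunk.xxxx.com/services/collector/event \
    -H "Authorization: Splunk <hec-token>" \
    -d '{"event": "index permission test", "index": "logs-nonprod"}'

If the token isn't allowed to write to logs-nonprod, this returns the same {"text":"Incorrect index","code":7} response seen in the pod logs; a {"text":"Success","code":0} response would point back at the annotations or the events themselves.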

hvaghani221 commented 2 years ago

@akandimalla any updates on this?

hvaghani221 commented 1 year ago

Closing due to inactivity. Feel free to reopen the issue.

akandimalla commented 1 year ago

Sorry for the delayed response. This ticket is good to close.