splunk / splunk-connect-for-kubernetes

Helm charts associated with kubernetes plug-ins
Apache License 2.0

The pod_uid field contains not only the pod_uid but also the namespace and pod_name #830

Closed Kiyoshi-Miyake closed 1 year ago

Kiyoshi-Miyake commented 1 year ago

What happened: I found that the pod_uid field contains not only the pod_uid but also the namespace and pod_name after I upgraded SCK to the latest version.

What you expected to happen: The pod_uid field should contain only the pod_uid, without any other strings.

How to reproduce it (as minimally and precisely as possible): Use the latest SCK version on OpenShift based on Kubernetes 1.14 or later.

Anything else we need to know?:

I found the pod_uid extraction in configMap.yaml:

jq '.record | . + (.source | capture("/var/log/pods/(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")) ...

I also found an article about the log path layout.

It said that, for Kubernetes 1.14 and later, the pod log directory changed to the following: /var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<num>.log

Therefore, I think configMap.yaml should be changed as below. Is that correct?

jq '.record | . + (.source | capture("/var/log/pods/(?<namespace>[^/_]+)_(?<pod_name>[^/_]+)_(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")) ...
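
For illustration, here is how the current and proposed captures behave against a made-up path in the new layout (the namespace, pod name, and UID below are hypothetical), checked with plain jq outside of fluentd:

    # hypothetical pod log path in the 1.14+ layout
    src='/var/log/pods/mynamespace_mypod-abc_0f1e2d3c-4b5a-6789-0abc-def012345678/mycontainer/0.log'

    # current regex: the whole <namespace>_<pod_name>_<pod_uid> directory lands in pod_uid
    jq -n --arg source "$src" \
      '$source | capture("/var/log/pods/(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")'

    # proposed regex: namespace, pod_name, and pod_uid come out as separate fields
    jq -n --arg source "$src" \
      '$source | capture("/var/log/pods/(?<namespace>[^/_]+)_(?<pod_name>[^/_]+)_(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")'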

Thanks,

Environment:

Kiyoshi-Miyake commented 1 year ago

I found a better regex.

jq '.record | . + (.source | capture("/var/log/pods/((?<namespace>[^/_]+)_(?<pod_name>[^/_]+)_)?(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")) ...

Does this work for both the pre-1.14 and the 1.14-and-later layouts?
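
As a quick sanity check (made-up paths, just testing the capture with plain jq outside of fluentd):

    re='/var/log/pods/((?<namespace>[^/_]+)_(?<pod_name>[^/_]+)_)?(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log'

    # 1.14+ layout: namespace, pod_name, and pod_uid are all captured separately
    jq -n --arg re "$re" \
      '"/var/log/pods/mynamespace_mypod-abc_0f1e2d3c-4b5a-6789-0abc-def012345678/mycontainer/0.log" | capture($re)'

    # pre-1.14 layout: only pod_uid is captured (namespace/pod_name may come back
    # null, which might need handling downstream)
    jq -n --arg re "$re" \
      '"/var/log/pods/0f1e2d3c-4b5a-6789-0abc-def012345678/mycontainer/0.log" | capture($re)'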

hvaghani221 commented 1 year ago

Hey @Kiyoshi-Miyake, I am not able to reproduce the issue. Can you specify how you have configured your cluster and which k8s version you are using?

Kiyoshi-Miyake commented 1 year ago

Hi @harshit-splunk, k8s version: 1.24.6

Part of the configuration:

      # extract pod_uid and container_name for CRIO runtime
      <filter tail.containers.var.log.pods.**>
        @type jq_transformer
        jq '.record | . + (.source | capture("/var/log/pods/(?<pod_uid>[^/]+)/(?<container_name>[^/]+)/(?<container_retry>[0-9]+).log")) | .sourcetype = ("xxxxx:container:" + .container_name) | .splunk_index = "openshift-kub-logs"'
      </filter>

      <source>
        @id containers.log
        @type tail
        @label @CONCAT
        tag tail.containers.*
        path /var/log/pods/*/*/*.log
        pos_file /var/log/splunk-fluentd-containers.log.pos
        path_key source
        read_from_head true
        refresh_interval 10m
        <parse>
          @type regexp
          expression /^(?<time>[^\s]+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
          time_key time
          time_type string
          localtime false
        </parse>
      </source>

I use SCK 1.5.1 now; I used 1.4.7 before. I upgraded SCK and changed the configuration to fix another issue.

I changed the path from "/var/log/containers/*.log" to "/var/log/pods/*/*/*.log", and then I encountered this issue because the filter for tail.containers.var.log.pods.** now matches my pods. So I got a good pod_uid before the change, but I get a bad pod_uid now, I think.

Thanks,

hvaghani221 commented 1 year ago

Hi @Kiyoshi-Miyake, you should not update the path unless you are absolutely sure about your needs. Fluentd doesn't have any issues reading from symlinked files as long as the actual file path is accessible; that is why we have added the pathDest config. When using the docker runtime, both /var/log/containers and /var/log/pods resolve to /var/lib/docker/containers, so it is mounted along with the /var/log directory.
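
For reference, the relevant values in the splunk-kubernetes-logging chart look roughly like this (a sketch; check the exact keys and defaults against the values.yaml of the chart version you are running):

    fluentd:
      # keep the default tail path; the symlinks under /var/log/containers are fine
      path: /var/log/containers/*.log
    containers:
      # host directory holding the log files / symlinks
      path: /var/log
      # where the symlink targets actually live, so it also gets mounted into the
      # daemonset pods (docker runtime default shown; adjust for your runtime)
      pathDest: /var/lib/docker/containers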

Closing this issue, as it is not a bug, and closing the associated PR as well.