josephmcasey opened this issue 2 years ago
We were able to resolve this by adding the following lines to the in_tail_container_logs source:

  follow_inodes true
  open_on_every_update true

We did this by projecting our own kubernetes.conf over the template in the reloader image.
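For reference, a sketch of what that override might look like. Everything outside the two added lines (path, pos_file, tag, and the parser) is an assumption about a typical containerd setup, not the exact contents of the image's kubernetes.conf template:

```
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  # The two added lines: track files by inode across rotations, and
  # open/close the file on every update instead of holding it open,
  # so a rotated-away file is released rather than pinned on disk.
  follow_inodes true
  open_on_every_update true
  <parse>
    # placeholder; the real template uses a container-runtime log parser
    @type none
  </parse>
</source>
```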
Thanks for opening this issue @josephmcasey. We have actually seen this one ourselves; it's an interesting issue, and changing how it operates today has some compliance and security implications. I'm not sure what the previous behavior was, but "open_on_every_update" was not added until 0.14.12, and prior to 1.15 the fluentd version was 1.12, so that may explain that part.
It would be good to add an "opt-in" flag, but not make it the default. The default log rotation in Kubernetes for containerd is 10Mi per file and 5 files (https://github.com/kubernetes/kubernetes/pull/59898/files#diff-aa85fa10ff2032cc1aeeb608a71cc25ecbc85c357cc1e1d44b07ce2d46ab8555).
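Those defaults come from the kubelet's container log rotation settings; operators who need more headroom can raise them in the KubeletConfiguration (the values below are the upstream defaults):

```
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Per-container rotation: at most 5 files of 10Mi each (~50Mi total),
# which is why a little over 50Mi of rapid logging can outrun a tailer.
containerLogMaxSize: 10Mi
containerLogMaxFiles: 5
```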
If this is enabled by default and you are running a default containerd config, then I only need to generate a little over 50Mi of logs in a pod to cause fluentd to start dropping logs rather than sending them off. An attacker could easily use this to hide their tracks if it defaulted to enabled, so it should not be turned on without caution and an understanding that it will drop logs rather quickly under load.
The ideal solution would be to drop logs only when the volume is almost out of space, with configurable thresholds for when to start leveraging this feature, instead of having it on all the time. If there are short bursts of logs, the local disk should have space to buffer them until they can be sent off, and then this isn't a problem. The issue really only appears when there is an extreme amount of logs for an extended period, which is arguably a different issue entirely (and rather an issue for the application), but KFO should have some safeguards to try to prevent it from causing issues on the host.
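As a rough illustration of the kind of safeguard being suggested (not current KFO behavior), fluentd's file buffer already exposes thresholds that bound on-disk usage; the match pattern, output type, and sizes here are hypothetical:

```
<match kubernetes.**>
  @type forward
  # ... output configuration ...
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.buffer
    # Bound how much local disk the buffer may consume; short bursts
    # are absorbed up to this cap and flushed once downstream catches up.
    total_limit_size 2GB
    chunk_limit_size 8MB
    flush_interval 5s
    # Only when the cap is reached, drop the oldest chunk instead of
    # filling the node's disk.
    overflow_action drop_oldest_chunk
  </buffer>
</match>
```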
Describe the bug
When running versions of the operator >v1.15.1 under heavy logging load, deleted files are held open. Previous versions do not have this issue. This results in rising disk consumption until the node is full. Cycling the log-router pods releases all the deleted files and the space is reclaimed.
Reproduction steps
Using a workload that simply outputs a large text statement repeatedly (a stand-in manifest is sketched below) will cause escalating disk pressure.
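The manifest itself did not survive in this report; a minimal stand-in that matches the description (name and image are placeholders) would be something like:

```
apiVersion: v1
kind: Pod
metadata:
  name: log-flood   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox
      command: ["/bin/sh", "-c"]
      # Emit ~4KiB lines in a tight loop so the container log file
      # rotates quickly and sustains heavy logging load.
      args:
        - |
          while true; do
            head -c 4096 /dev/zero | tr '\0' 'x'
            echo
          done
```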
Running lsof +L1 on a node shows the open files. The logs of the fluentd container:

Expected behavior
The deleted log files should be released upon rotation.
Additional context
No response