replicatedhq / troubleshoot

Preflight Checks and Support Bundles Framework for Kubernetes Applications
https://troubleshoot.sh
Apache License 2.0

feat: cgroups host collector #1581

Status: Closed (banjoh closed this 1 month ago)

banjoh commented 1 month ago

Description, Motivation and Context

Adds a Linux control groups (cgroups) host collector that detects whether the specified mountPoint is a cgroup filesystem and, if so, which version it is. The collector also gathers information about the configured cgroup controllers. Because the output is a JSON object, the JSON compare analyzer can be used on it. (https://github.com/replicatedhq/troubleshoot/pull/1582 is a WIP PR that allows host collectors to use this analyzer.)
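
As a rough illustration of how a mount point can be classified, the sketch below maps a filesystem magic number (the `f_type` value that `statfs(2)` reports on Linux) to a cgroup version. This is an assumption about one possible approach, not necessarily what this PR implements; `classifyMagic` and the returned labels are hypothetical names. The constants come from the kernel's `include/uapi/linux/magic.h`.

```go
package main

import "fmt"

// Filesystem magic numbers from the Linux kernel's include/uapi/linux/magic.h.
const (
	cgroupSuperMagic  int64 = 0x27e0eb   // cgroup v1 controller mount
	cgroup2SuperMagic int64 = 0x63677270 // cgroup v2 unified hierarchy
	tmpfsMagic        int64 = 0x01021994 // tmpfs, the usual root of a v1 hierarchy
)

// classifyMagic (hypothetical helper) maps a statfs f_type value to a label.
// tmpfs is only treated as a *probable* v1 root; a real check would also
// confirm that controller mounts exist beneath it.
func classifyMagic(fsType int64) string {
	switch fsType {
	case cgroup2SuperMagic:
		return "v2"
	case cgroupSuperMagic, tmpfsMagic:
		return "v1"
	default:
		return "none"
	}
}

func main() {
	// 0xef53 is the ext2/3/4 magic number, used here as a non-cgroup example.
	for _, m := range []int64{cgroup2SuperMagic, tmpfsMagic, 0xef53} {
		fmt.Printf("%#x -> %s\n", m, classifyMagic(m))
	}
}
```

On a Linux host the `fsType` argument would typically come from a `statfs` call on the mount point; that call is left out here to keep the sketch platform independent.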

Collector spec

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: cgroups
spec:
  hostCollectors:
    - cgroups:
        mountPoint: /cgroup/mount # defaults to /sys/fs/cgroup if not set

cgroup v1 results

{
  "cgroup-enabled": true,
  "cgroup-v1": {
    "enabled": true,
    "mountPoint": "/sys/fs/cgroup",
    "controllers": [
      "cpuset",
      "cpu",
      "cpuacct",
      "blkio",
      "memory",
      "devices",
      "freezer",
      "net_cls",
      "perf_event",
      "net_prio",
      "hugetlb",
      "pids",
      "rdma",
      "misc"
    ]
  },
  "cgroup-v2": {
    "enabled": false,
    "mounts": null,
    "controllers": null
  },
  "allControllers": [
    "cpuset",
    "cpu",
    "cpuacct",
    "blkio",
    "memory",
    "devices",
    "freezer",
    "net_cls",
    "perf_event",
    "net_prio",
    "hugetlb",
    "pids",
    "rdma",
    "misc"
  ]
}
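
One way a consumer might use the `allControllers` list is to check that the controllers an application depends on are present. The following is a minimal sketch under that assumption; `missingControllers` and the `required` list are hypothetical and are not part of the collector or of any troubleshoot analyzer.

```go
package main

import (
	"fmt"
	"sort"
)

// missingControllers (hypothetical helper) returns the entries of required
// that do not appear in available, sorted alphabetically.
func missingControllers(available, required []string) []string {
	have := make(map[string]struct{}, len(available))
	for _, c := range available {
		have[c] = struct{}{}
	}
	var missing []string
	for _, c := range required {
		if _, ok := have[c]; !ok {
			missing = append(missing, c)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// A subset of the "allControllers" values from the v1 sample output.
	available := []string{"cpuset", "cpu", "cpuacct", "blkio", "memory", "pids"}
	// Illustrative requirement list only; "io" is the v2 name and is absent in v1.
	required := []string{"cpu", "memory", "pids", "io"}
	fmt.Println("missing:", missingControllers(available, required))
}
```

Note that controller names differ between versions (for example `blkio` in v1 and `io` in v2), so a check like this would need a per-version requirement list.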

cgroup v2 results

{
  "cgroup-enabled": true,
  "cgroup-v1": {
    "enabled": false,
    "mountPoint": "",
    "controllers": []
  },
  "cgroup-v2": {
    "enabled": true,
    "mountPoint": "/sys/fs/cgroup",
    "controllers": [
      "cpuset",
      "cpu",
      "io",
      "memory",
      "hugetlb",
      "pids",
      "rdma",
      "misc",
      "freezer",
      "devices"
    ]
  },
  "allControllers": [
    "cpu",
    "cpuset",
    "devices",
    "freezer",
    "hugetlb",
    "io",
    "memory",
    "misc",
    "pids",
    "rdma"
  ]
}
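
Because the collector emits plain JSON, the result can be decoded with standard tooling. The sketch below shows one way to read the output and report which cgroup version is active. The JSON keys are taken from the sample results in this description; the Go type and function names (`cgroupsResult`, `activeVersion`) are hypothetical and are not the types defined in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cgroupVersion mirrors the "cgroup-v1" / "cgroup-v2" objects in the samples.
// Keys not listed here are ignored by encoding/json.
type cgroupVersion struct {
	Enabled     bool     `json:"enabled"`
	MountPoint  string   `json:"mountPoint"`
	Controllers []string `json:"controllers"`
}

// cgroupsResult mirrors the top-level collector output shown in the samples.
type cgroupsResult struct {
	Enabled        bool          `json:"cgroup-enabled"`
	V1             cgroupVersion `json:"cgroup-v1"`
	V2             cgroupVersion `json:"cgroup-v2"`
	AllControllers []string      `json:"allControllers"`
}

// activeVersion (hypothetical helper) decodes a collector result and returns
// "v2", "v1", "none", or "unknown".
func activeVersion(raw []byte) (string, error) {
	var r cgroupsResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return "", fmt.Errorf("decode cgroups result: %w", err)
	}
	switch {
	case !r.Enabled:
		return "none", nil
	case r.V2.Enabled:
		return "v2", nil
	case r.V1.Enabled:
		return "v1", nil
	default:
		return "unknown", nil
	}
}

func main() {
	// Abbreviated form of the cgroup v2 sample output.
	sample := []byte(`{
	  "cgroup-enabled": true,
	  "cgroup-v1": {"enabled": false, "mountPoint": "", "controllers": []},
	  "cgroup-v2": {"enabled": true, "mountPoint": "/sys/fs/cgroup", "controllers": ["cpu", "io", "memory"]},
	  "allControllers": ["cpu", "io", "memory"]
	}`)
	version, err := activeVersion(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("active cgroup version:", version)
}
```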

cgroup v1 configuration (asciinema recording): https://asciinema.org/a/v93LLPLKxgLjUXiLWRNi0SEpp

cgroup v2 configuration (asciinema recording): https://asciinema.org/a/WGxafFNfPUL4my1sQoyxJO1uy

Fixes: https://github.com/replicatedhq/troubleshoot/issues/1579

banjoh commented 1 month ago

https://github.com/replicatedhq/troubleshoot.sh/pull/566 is the docs PR