replicatedhq / troubleshoot

Preflight Checks and Support Bundles Framework for Kubernetes Applications
https://troubleshoot.sh
Apache License 2.0
539 stars 92 forks source link

feat: New host collector and analyzer for Kernel Configs #1546

Closed nvanthao closed 1 month ago

nvanthao commented 2 months ago

Description, Motivation and Context

This PR introduces new host collector+analyzer Kernel Configs The collector will collect kernel configurations and the analyzer can check existence of kernel config key and value.

This will help with pre-flight checks for software like k0s and Embedded Cluster https://docs.k0sproject.io/stable/external-runtime-deps/#linux-kernel-configuration

One thing to highlight in this change is that for this collector, all pass outcomes are evaluated, and the outcome will be set to fail if when condition does not match.

Sample spec

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: kernel-configs
spec:
  hostCollectors:
    - kernelConfigs: {}
  hostAnalyzers:
    - kernelConfigs:
        collectorName: "Kernel Configs Check"
        strict: true
        outcomes:
          - pass:
              when: "CONFIG_CGROUP_FREEZER=y"
              message: "The cgroup freezer is enabled"
          - pass:
              when: "CONFIG_NETFILTER_XTABLES=m"
              message: "There is netfilter xtable module"

Checklist

Does this PR introduce a breaking change?

banjoh commented 2 months ago

The collector implementation looks good

nvanthao commented 2 months ago

One thing to highlight in this change is that for this collector, all pass outcomes are evaluated, and the outcome will be set to fail if when condition does not match.

This approach will confuse people. It differs from how outcomes have been designed where when one of the outcomes evaluates to true, evaluation of outcomes for that analyser stop.

Other analysers follow an approach where inputs to evaluate against are passed in as separate parameters. Here is an example of how a kernel config analyser might look. Note that the list of kernel config values are passed in as selectedKernelConfigs input value, not defining each in an outcome.

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
spec:
  hostCollectors:
    - kernelConfigs: {}
  hostAnalyzers:
    - kernelConfigs:
        collectorName: "Kernel Configs Check"
        strict: true
        selectedKernelConfigs:
          - CONFIG_CGROUP_FREEZER=y
          - CONFIG_NETFILTER_XTABLES=m
        outcomes:
          - pass:
              when: "true"
              message: "Selected kernel configs are present"
          - fail:
              message: "{{ .ConfigsNotFound }} kernel configs are missing"

There are exceptions such as velero and ceph analysers where the outcomes are hardcoded in the analyser itself. This is acceptable because users do not define outcomes.

That's a great idea on selectedKernelConfigs, let me update the code + doc and update us

nvanthao commented 1 month ago

Hi @banjoh ,

I have made the requested changes, please let me know how it goes

Sample spec

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: kernel-configs
spec:
  hostCollectors:
    - kernelConfigs: {}
  hostAnalyzers:
    - kernelConfigs:
        collectorName: "Gerard Kernel Configs Test"
        strict: true
        selectedConfigs:
        - CONFIG_CGROUP_FREEZER=y
        - CONFIG_NETFILTER_XTABLES=m
        outcomes:
          - pass:
              message: "all kernel configs are available"
          - fail:
              message: "missing kernel config(s): {{ .ConfigsNotFound }}"