replicatedhq / troubleshoot

Preflight Checks and Support Bundles Framework for Kubernetes Applications
https://troubleshoot.sh
Apache License 2.0

feat: node metrics analyser #1520

Status: Closed (banjoh closed this 4 months ago)

banjoh commented 5 months ago

Description, Motivation and Context

Adds a node metrics analyser that can be used to analyse PVC usage stats, as in the example below. Analysis of other stats in the collected node metrics can be added as the need arises.

apiVersion: troubleshoot.sh/v1beta2
kind: Analyzer
spec:
  analyzers:
    - nodeMetrics:
        checkName: Check minio pvc space usage is less than 80%
        filters:
          pvc:
            nameRegex: "minio-data-ha-minio.*"
            namespace: "minio"
        outcomes:
          - fail:
              when: "pvcUsedPercentage >= 80"
              message: "There are PVCs using more than 80% of storage: {{ .ConcatenatedPVCNames}}"
          - pass:
              message: "No PVCs are using more than 80% of storage"
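To make the `pvcUsedPercentage >= 80` condition and the `pvc` filters concrete, here is a minimal Go sketch. It assumes each PVC's usage is computed as used bytes over capacity bytes, as reported per volume (with a PVC reference) by the kubelet stats summary API. The `pvcStats` type and the `usedPercentage` and `pvcsAtOrAbove` functions are hypothetical illustrations, not troubleshoot's implementation. The regex match is unanchored, following Go's `regexp.MatchString`.

```go
package main

import (
	"fmt"
	"regexp"
)

// pvcStats is a hypothetical, trimmed-down view of one PVC-backed volume
// (PVC name/namespace plus used and capacity bytes). Not a troubleshoot type.
type pvcStats struct {
	Name          string
	Namespace     string
	UsedBytes     uint64
	CapacityBytes uint64
}

// usedPercentage returns used/capacity as a percentage. It returns 0 when the
// capacity is unknown (0) to avoid dividing by zero.
func usedPercentage(s pvcStats) float64 {
	if s.CapacityBytes == 0 {
		return 0
	}
	return float64(s.UsedBytes) * 100 / float64(s.CapacityBytes)
}

// pvcsAtOrAbove applies nameRegex/namespace filters (an empty namespace
// matches all) and returns "namespace/name" for every PVC whose usage is
// greater than or equal to threshold.
func pvcsAtOrAbove(stats []pvcStats, nameRegex, namespace string, threshold float64) ([]string, error) {
	re, err := regexp.Compile(nameRegex)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, s := range stats {
		if namespace != "" && s.Namespace != namespace {
			continue
		}
		if !re.MatchString(s.Name) {
			continue
		}
		if usedPercentage(s) >= threshold {
			matched = append(matched, s.Namespace+"/"+s.Name)
		}
	}
	return matched, nil
}

func main() {
	// Illustrative numbers only; not taken from the support bundle in this PR.
	stats := []pvcStats{
		{"minio-data-ha-minio-0", "minio", 40, 100},
		{"minio-data-ha-minio-1", "minio", 85, 100},
		{"postgres-data", "minio", 95, 100},          // excluded by nameRegex
		{"minio-data-ha-minio-2", "default", 99, 100}, // excluded by namespace
	}
	names, err := pvcsAtOrAbove(stats, "minio-data-ha-minio.*", "minio", 80)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [minio/minio-data-ha-minio-1]
}
```

Because the condition uses `>=`, a PVC at exactly 80% would also be reported under the fail outcome.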

Running example

$ analyze support-bundle-2024-04-01T11_02_48.tar.gz --analyzers spec.yaml -v=1
Fail: Check minio pvc space usage is less than 80%
 There are PVCs using more than 80% of storage: minio/minio-data-ha-minio-5, minio/minio-data-ha-minio-1
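The `{{ .ConcatenatedPVCNames}}` placeholder in the fail message uses Go template syntax. The sketch below shows how such a message could be rendered with the standard `text/template` package to produce the output above. It assumes the names are formatted as `namespace/name` and joined with `", "`, as the sample output suggests. The `failData` type and `renderMessage` function are hypothetical; only the field name `ConcatenatedPVCNames` comes from the spec.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// failData is a hypothetical template context. Only the field name
// ConcatenatedPVCNames is taken from the analyser spec.
type failData struct {
	ConcatenatedPVCNames string
}

// renderMessage parses the outcome message as a Go text/template and fills
// it with the matched PVC names joined by ", ".
func renderMessage(msg string, pvcNames []string) (string, error) {
	tmpl, err := template.New("message").Parse(msg)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := failData{ConcatenatedPVCNames: strings.Join(pvcNames, ", ")}
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderMessage(
		"There are PVCs using more than 80% of storage: {{ .ConcatenatedPVCNames}}",
		[]string{"minio/minio-data-ha-minio-5", "minio/minio-data-ha-minio-1"},
	)
	if err != nil {
		panic(err)
	}
	// prints: There are PVCs using more than 80% of storage: minio/minio-data-ha-minio-5, minio/minio-data-ha-minio-1
	fmt.Println(out)
}
```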

Fixes: https://github.com/replicatedhq/troubleshoot/issues/1496

Checklist

Does this PR introduce a breaking change?