replicatedhq / troubleshoot

Preflight Checks and Support Bundles Framework for Kubernetes Applications
https://troubleshoot.sh
Apache License 2.0

Run additional collectors based on collector result #691

Open xavpaice opened 2 years ago

xavpaice commented 2 years ago

Describe the rationale for the suggested feature.

Sometimes when troubleshooting an issue, the output of one collector indicates that another collector needs to be run. As an example, if `ceph health detail` returns output for a number of PGs, it would be helpful to have the result of `ceph pg $pg query` for each `$pg` in the list of problem PGs.

e.g. output in health.json:

{
    "checks": {
        "PG_AVAILABILITY": {
            "severity": "HEALTH_WARN",
            "summary": {
                "message": "Reduced data availability: 213 pgs inactive"
            },
            "detail": [
                {
                    "message": "pg 2.14 is stuck inactive for 1658.491904, current state unknown, last acting []"
                },

This output indicates that I'd like to collect info for pg 2.14, and likewise for every other PG listed in the detail messages.

Describe the feature

Adding a collector that is fed info from another collector would be helpful. For example: take the output saved to ceph/health.json, parse it, and run an additional collector based on the result.

This request is to create the framework to allow collectors that follow this pattern, rather than the specific Ceph collector (though that would be a good first example).
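
To make the pattern concrete, here is a minimal sketch of the "parse the first result, derive follow-up collectors" step. It is written in Python for brevity (troubleshoot itself is Go) and is not a proposed implementation. The function names, the regular expression, and the second PG in the sample are assumptions made up for illustration; only the shape of health.json and the `pg 2.14` message come from the output above.

```python
import json
import re

# Matches detail messages such as "pg 2.14 is stuck inactive ...".
# The pattern is an assumption based on the single sample line in this issue.
PG_ID = re.compile(r"^pg (\d+\.[0-9a-f]+) ")


def problem_pgs(health):
    """Return the PG ids named in the detail messages of every health check."""
    pgs = []
    for check in health.get("checks", {}).values():
        for entry in check.get("detail", []):
            match = PG_ID.match(entry.get("message", ""))
            if match and match.group(1) not in pgs:
                pgs.append(match.group(1))
    return pgs


def follow_up_commands(health):
    """Build one `ceph pg <id> query` argv per problem PG (nothing is executed)."""
    return [["ceph", "pg", pg, "query"] for pg in problem_pgs(health)]


# Illustrative input shaped like the health.json fragment above.
# The "2.1a" entry is hypothetical and only shows that several PGs are handled.
SAMPLE = json.loads("""
{
  "checks": {
    "PG_AVAILABILITY": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "Reduced data availability: 213 pgs inactive"},
      "detail": [
        {"message": "pg 2.14 is stuck inactive for 1658.491904, current state unknown, last acting []"},
        {"message": "pg 2.1a is stuck inactive for 1658.491904, current state unknown, last acting []"}
      ]
    }
  }
}
""")

if __name__ == "__main__":
    for argv in follow_up_commands(SAMPLE):
        print(" ".join(argv))
```

A framework version of this would presumably let the spec declare the source file, the extraction rule, and a command template, so that the Ceph case is just one instance of it.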

Describe alternatives you've considered

Additional context

spron-in commented 1 year ago

Same applies to analyzers.

A quick example here would be checking k8s versions.

At Percona we test Operators on various k8s flavors and versions.

So if the cluster is GKE, I want to check for versions 1.20 to 1.23. If it is EKS, 1.20 to 1.22 only (no 1.23 there).
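
A rough sketch of that check, to show what "an analyzer that depends on another result" means here. It is Python for illustration only, not troubleshoot's analyzer API; the table and function names are hypothetical, and it assumes the distribution has already been detected by an earlier step and is passed in as a string.

```python
import re

# Supported (major, minor) ranges per distribution, taken from the comment above.
SUPPORTED = {
    "gke": ((1, 20), (1, 23)),
    "eks": ((1, 20), (1, 22)),
}


def parse_minor(version):
    """Turn a version string such as 'v1.22.9-gke.1300' or '1.22' into (1, 22)."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(distribution, version):
    """Check the version against the range for the detected distribution."""
    low, high = SUPPORTED[distribution.lower()]
    return low <= parse_minor(version) <= high
```

The point is that the version check cannot be written as a single static rule: the acceptable range is chosen by the outcome of the distribution check.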