replicatedhq / troubleshoot

Preflight Checks and Support Bundles Framework for Kubernetes Applications
https://troubleshoot.sh
Apache License 2.0
545 stars 93 forks source link

Collector and analyser for kurl specs in a cluster #788

Closed banjoh closed 2 months ago

banjoh commented 2 years ago

Describe the rationale for the suggested feature.

Often we might want to know the spec used to create a kURL cluster. For our enterprise customers, this can be deduced from the vendor's information but there are times a cluster being setup is from some other spec.

Describe the feature

Collect the spec from the cluster using the k8s client. From kubectl this can be read using kubectl get installer -oyaml

Describe alternatives you've considered

None

Additional context

We need to handle cases where the cluster being queried is not a kURL cluster and record the same. Noting that a cluster was not provisioned with kURL, albeit rare, is important information when supporting enterprise customers.

xavpaice commented 2 years ago

See https://app.shortcut.com/replicated/story/58400/create-troubleshoot-collector-and-analyzer-for-kurl-crd.

Description from that Shortcut ticket:

Problem to solve (Required)

When writing preflight checks and/or ascertaining complex add-on compatibility, the existing Troubleshoot collectors and analyzers do not allow us to accomplish this in a clean way.

Proposal

Instead of solely relying on the `exclude` property of a check and/or [hacking](https://github.com/replicatedhq/kURL/blob/41fb21c603f17fdee1930db7b4beb6d2bee3c3c4/pkg/preflight/assets/host-preflights.yaml#L404-L410) the `hostOS` analyzer to assess what is installed we could instead develop a kURL collector and analyzer that will provide the correct abstraction for analyzing the kURL spec. E.g. ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: HostPreflight metadata: name: kurl-builtin spec: collectors: - kurlSpec: {} analyzers: - kurlSpec: outcomes: - pass: when: "kubernetes >= 1.22.12" message: "Kubernetes 1.22.12+ is supported" - pass: when: "kubernetes 1.23 -1.25" message: "kubernetes version 1.23 through 1.25 is supported" - fail: when: "kubernetes < 1.22" message: "Kubernetes version {{ .Installer.Spec.Kubernetes.Version }} not supported" - fail: when: "{{ kurl and Installer.Spec.Weave.Version .Installer.Spec.Containerd.Version (semverCompare "1.6.0 - 1.6.4" .Installer.Spec.Containerd.Version) }}" message: "Weave is not compatible with containerd versions 1.6.0 - 1.6.4" ```
xavpaice commented 2 years ago

kURL spec is indeed collected, and captured in cluster-resources/custom-resources/installers.cluster.kurl.sh/default.yaml.

Next step is to write an analyzer that can know which file to look at, and which semver comparisons to do.

Need to decide if this is a specific kURL spec analyzer, or a more generic one that can capture semvers from yaml and/or json, and make comparisons.

banjoh commented 1 year ago

I wonder whether the cluster resources analyser would be sufficient here

banjoh commented 1 year ago

Building on @xavpaice's proposal, I wonder if we can introduce filters similar to Jinja2 or jq. How this would be valuable is addressing a case like querying the version of a resource where a spec would look like below. I'm using the cluster resources analyser here.

apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: kurl-builtin
spec:
  analyzers:
    - clusterResource:
        checkName: ecko image version
        kind: Deployment
        namespace: kurl
        name: ekc-operator
        yamlPath: "spec.template.spec.containers.[0].image"
        outcomes:
          - fail:
              when: "regex('replicated/ekco:(.*?)') | semverCompare('>= v1.0.0')"
              message: The ekco deployment is not the latest version
          - pass:
              when: "true"
              message: the ekco deployment is up to date
    - clusterResource:
        checkName: ecko has containers
        kind: Deployment
        yamlPath: "spec.template.spec.containers"
        outcomes:
          - pass:
              when: "jsonpath($.length()) > 0"
              message: The ekco deployment is not the latest version

This is perhaps an overkill but what this allows us to do is reduce the need of developing & maintaining an analyser for every new use case we get.