linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0
10.57k stars 1.27k forks source link

Improve `linkerd viz check` to have skip prom clusterrole checks when prom controller is disabled #12889

Open deusxanima opened 1 month ago

deusxanima commented 1 month ago

What problem are you trying to solve?

When installing linkerd-viz via Helm and setting the prometheus.enabled: false flag, we should disable the linkerd viz check for the associated resources (cluster roles, bindings...etc) since there is no longer an expectation that they will be installed in the cluster. At the moment, linkerd viz check still shows a warning like so when prometheus.enabled: false is set:

inkerd viz check
linkerd-viz
-----------
...
‼ prometheus is installed and configured correctly
missing ClusterRoles: linkerd-linkerd-viz-prometheus
see https://linkerd.io/2/checks/#l5d-viz-prometheus for hints
...
‼ data plane proxy metrics are present in Prometheus
Data plane metrics not found for linkerd/linkerd-destination-xxx, default/scheduler-xxx linkerd/linkerd-identity-xxx, ingress-nginx/ingress-nginx-controller-xxx, linkerd/linkerd-identity-xxx, default/giftcard-xxx, linkerd/linkerd-destination-xxx, default/reporting-xxx, linkerd/linkerd-identity-xxx, linkerd/linkerd-proxy-injector-xxx, default/authz-xxx, linkerd/linkerd-destination-xxx, linkerd/linkerd-proxy-injector-xxx linkerd/linkerd-proxy-injector-xxx...

How should the problem be solved?

update linkerd-viz check

Any alternatives you've considered?

n/a

How would users interact with this feature?

linkerd viz check

Would you like to work on this feature?

None

deusxanima commented 2 weeks ago

Confirmed that when users encountered the issue documented above, they were setting the promURL via Helm (not linkerd CLI)

adleong commented 2 weeks ago

I was not able to reproduce this issue. Here are the steps I used:

Install linkerd-viz with helm, disabling prometheus and supplying a placeholder prometheusURL:

helm install linkerd-viz linkerd/linkerd-viz -n linkerd-viz --create-namespace --set prometheus.enabled=false --set prometheusUrl=linkerd.io

Run linkerd viz check with the stable-2.14.10 CLI:

linkerd viz check
linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-6f469f588-fz2rx (edge-24.8.1)
        * tap-6c6d6b97c8-wpvd8 (edge-24.8.1)
        * web-5c86d6c4c7-49g7x (edge-24.8.1)
        * tap-injector-59df9b6b4c-gm4pm (edge-24.8.1)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    metrics-api-6f469f588-fz2rx running edge-24.8.1 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints
× viz extension self-check
    Error calling Prometheus from the control plane: Query failed: "max(process_start_time_seconds{}) by (pod, namespace)": Post "linkerd.io/api/v1/query": unsupported protocol scheme ""
    see https://linkerd.io/2.14/checks/#l5d-viz-metrics-api for hints

Status check results are ×

The "prometheus is installed and configured correctly" check, which normally is after "viz extension proxies and cli versions match" and before "viz extension self-check" is skipped as intended because prometheus is disabled. The "viz extension self-check" fails as expected because linkerd.io is not a valid prometheus instance.

adleong commented 2 weeks ago

The --set prometheusUrl=linkerd.io Helm value will set an annotation on the linkerd-viz namespace:

kubectl get ns/linkerd-viz -ojsonpath='{.metadata.annotations}'
{"viz.linkerd.io/external-prometheus":"linkerd.io"}% 

It is this annotation which controls if the "prometheus is installed and configured correctly" check is skipped. I suggest verifying that the namespace annotation is set.