upbound / up

The @upbound CLI

Observability: `up spaces init` fails to apply the telemetry collector #548

Closed lsviben closed 3 weeks ago

lsviben commented 1 month ago

What happened?

`up space init` with observability enabled checks whether the opentelemetry-operator is installed as a prerequisite and, if it is not, installs it before applying the Spaces Helm chart. To determine whether the operator is installed, it checks for the OpenTelemetryCollector CRD.

The Spaces chart contains the Space-level OpenTelemetryCollector resource.

Between the moment the CRD is ready and the moment the Spaces chart is applied, the OpenTelemetry operator is not actually Running yet, so the call to its mutating webhook for OpenTelemetryCollector fails. As a result, the OpenTelemetryCollector is not applied and `up space init` errors out:

 INFO  Setting defaults for vanilla Kubernetes (type kind)
 WARNING  One or more required prerequisites are not installed:

❌ cert-manager
❌ universal-crossplane
❌ ingress-nginx
❌ provider-kubernetes
❌ provider-helm
❌ opentelemetry-operator

Would you like to install them now? [y/N]: Yes

  ✓   [1/6]: Installing cert-manager
  ✓   [2/6]: Installing universal-crossplane
  ✓   [3/6]: Installing ingress-nginx
  ✓   [4/6]: Installing provider-kubernetes
  ✓   [5/6]: Installing provider-helm
  ✓   [6/6]: Installing opentelemetry-operator
 INFO  Required prerequisites met!
 INFO  Proceeding with Upbound Spaces installation...
  ✓   [1/3]: Creating pull secret upbound-pull-secret
 ▄ [2/3]: Initializing Space components (6s)

up: error: space.initCmd.Run(): 1 error occurred:
            * Internal error occurred: failed calling webhook "mopentelemetrycollector.kb.io": failed to call webhook: Post "https://opentelemetry-operator-webhook.opentelemetry-operator.svc:443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=10s": dial tcp 10.96.142.160:443: connect: connection refused
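The error above points to a race between CRD registration and webhook readiness. As a possible workaround until the CLI waits for the operator itself, the following is a minimal sketch that polls until a readiness command succeeds before `up space init` is rerun. The `retry` helper is hypothetical and not part of `up`. The `opentelemetry-operator` namespace is inferred from the webhook service URL in the error message, and treating an Available Deployment as "webhook is serving" is an assumption.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...
# Run CMD until it exits 0, at most ATTEMPTS times, sleeping DELAY seconds
# between tries. Hypothetical helper for illustration; not part of the up CLI.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1   # gave up: CMD never succeeded
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Assumed usage against the cluster from this report (left commented so the
# snippet can be sourced without a cluster):
#
#   retry 12 5 kubectl wait deployment --all \
#     --namespace opentelemetry-operator \
#     --for=condition=Available --timeout=10s
#
# Once this returns 0, rerun the same `up space init` command.
```

Because the prerequisites are already installed on the second run, the rerun should go straight to applying the Spaces chart, at which point the webhook endpoint is more likely to be reachable.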

Epic: https://github.com/upbound/spaces/issues/934

How can we reproduce it?

In a local environment, create a kind cluster and run:

 up space init --token-file="../key.json" "v1.4.0-rc.0.331.g417a41db" \
  --set "account=bob" \
  --set "features.alpha.observability.enabled=true"

What environment did it happen in?

Running locally, but I think it will happen on all fresh Spaces installations, and on existing ones that are just enabling Observability.