stackabletech / stackable-cockpit

Home of stackable-cockpit, stackablectl and stackable-cockpitd
https://docs.stackable.tech/management/stable/

GVK resolution fails if metrics-server is unavailable #335

Open nightkr opened 6 days ago

nightkr commented 6 days ago

Affected version

stackablectl 24.7.1

Current and expected behavior

  1. Run kubectl -n kube-system delete pods -l k8s-app=metrics-server && stackablectl release install dev
  2. Observe that stackablectl crashes with the following error message:
  ERROR  failed with status 503 Service Unavailable
    at src/client/builder.rs:199

   WARN  Unsuccessful data error parse: service unavailable

    at src/client/mod.rs:467

An unrecoverable error occured: failed to execute release (sub)command

Caused by these errors (recent errors listed first):
 1: failed to create Kubernetes client
 2: failed to run GVK discovery
 3: ApiError: "service unavailable\n": Failed to parse error data (ErrorResponse { status: "503 Service Unavailable", message: "\"service unavailable\\n\"", reason: "Failed to parse error data", code: 503 })
 4: "service unavailable\n": Failed to parse error data
  3. Observe that kubectl and k9s are able to manage the cluster just fine

I suspect that this comes down to metrics-server using the K8s API aggregation layer, which lets it serve a "resource" that it stores itself rather than in etcd. As a result, that resource can be unavailable even when the apiserver and etcd are both doing fine, and a single unavailable aggregated API is enough to make the whole GVK discovery fail.

Possible solution

We could either:

  1. Limit GVK resolution to the apigroups we care about
  2. Defer apigroup-specific resolution errors until accessing the relevant apigroup
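
A rough sketch of how the two options could fit together, using only the standard library. This is not the stackablectl code and does not use the real Kubernetes client: `DeferredDiscovery`, `DiscoveryError`, and `fake_discover` are hypothetical names invented for illustration, and `fake_discover` just simulates a cluster where `metrics.k8s.io` answers with a 503. The caller passes in the apigroups to resolve (option 1), and each group's result is stored as a `Result` so that a failure only surfaces when that group is actually looked up (option 2).

```rust
use std::collections::HashMap;

/// Hypothetical error type for a failed discovery of a single apigroup.
#[derive(Debug, Clone, PartialEq)]
struct DiscoveryError {
    group: String,
    message: String,
}

/// Stores the discovery outcome per apigroup instead of aborting on the first failure.
struct DeferredDiscovery {
    groups: HashMap<String, Result<Vec<String>, DiscoveryError>>,
}

impl DeferredDiscovery {
    /// Runs `discover_group` for each requested group and records every result,
    /// including failures, without returning early.
    fn run<F>(group_names: &[&str], mut discover_group: F) -> Self
    where
        F: FnMut(&str) -> Result<Vec<String>, String>,
    {
        let groups = group_names
            .iter()
            .map(|&name| {
                let result = discover_group(name).map_err(|message| DiscoveryError {
                    group: name.to_string(),
                    message,
                });
                (name.to_string(), result)
            })
            .collect();
        Self { groups }
    }

    /// Returns the kinds of one apigroup. A discovery failure for that group
    /// is reported here, and only here.
    fn kinds(&self, group: &str) -> Result<&[String], DiscoveryError> {
        match self.groups.get(group) {
            Some(Ok(kinds)) => Ok(kinds.as_slice()),
            Some(Err(err)) => Err(err.clone()),
            None => Err(DiscoveryError {
                group: group.to_string(),
                message: "group was not part of discovery".to_string(),
            }),
        }
    }
}

/// Simulated cluster (assumption for this sketch): `apps` works,
/// `metrics.k8s.io` is unavailable.
fn fake_discover(group: &str) -> Result<Vec<String>, String> {
    match group {
        "apps" => Ok(vec!["Deployment".to_string(), "StatefulSet".to_string()]),
        "metrics.k8s.io" => Err("503 Service Unavailable".to_string()),
        other => Err(format!("unknown group {other}")),
    }
}
```

With this shape, `DeferredDiscovery::run(&["apps", "metrics.k8s.io"], fake_discover)` succeeds as a whole, `kinds("apps")` returns the discovered kinds, and only `kinds("metrics.k8s.io")` returns the 503 error. Dropping `metrics.k8s.io` from the requested list would avoid the failing request entirely.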

Additional context

No response

Environment

Using k3s v1.31.0+k3s1 via k3d

Would you like to work on fixing this bug?

None