operator-framework / operator-sdk

SDK for building Kubernetes applications. Provides high level APIs, useful abstractions, and project scaffolding.
https://sdk.operatorframework.io
Apache License 2.0

I... Have a corpse. redis-operator keeps returning. #6854

Open IngwiePhoenix opened 3 weeks ago

IngwiePhoenix commented 3 weeks ago

Type of question

General operator-related help

Question

What did you do?

At some point, I installed - well, "subscribed to" - the redis-operator. But I removed it; I found using OLM just to install a few operators too cumbersome for what I do, and switched to Helm charts instead.

But uhm... how do I put it...

operators   116s (x3444 over 17h)   Warning   BackOff                           Pod/redis-operator-7d7777ccdb-6b6fb            Back-off restarting failed container manager in pod redis-operator-7d7777ccdb-6b6fb_operators(037b090f-2f91-402e-ac3a-36a294ea82fc)
operators   3m4s (x191 over 17h)    Warning   ComponentUnhealthy                ClusterServiceVersion/redis-operator.v0.15.1   installing: waiting for deployment redis-operator to become ready: deployment "redis-operator" not available: Deployment does not have minimum availability.
operators   3m3s (x369 over 17h)    Normal    NeedsReinstall                    ClusterServiceVersion/redis-operator.v0.15.1   installing: waiting for deployment redis-operator to become ready: deployment "redis-operator" not available: Deployment does not have minimum availability.
operators   3m2s (x315 over 17h)    Normal    AllRequirementsMet                ClusterServiceVersion/redis-operator.v0.15.1   all requirements found, attempting install
operators   5m35s (x378 over 17h)   Normal    InstallSucceeded                  ClusterServiceVersion/redis-operator.v0.15.1   waiting for install components to report healthy
operators   3m (x405 over 17h)      Normal    InstallWaiting                    ClusterServiceVersion/redis-operator.v0.15.1   installing: waiting for deployment redis-operator to become ready: deployment "redis-operator" not available: Deployment does not have minimum availability.
operators   5m36s (x197 over 16h)   Warning   InstallCheckFailed                ClusterServiceVersion/redis-operator.v0.15.1   install timeout

So yeah, corpse. o.o

What did you expect to see?

Once I removed the subscription - and that has worked with other operators I tried - I expected the Redis operator to disappear from my cluster.

What did you see instead? Under which circumstances?

The log output from above, for weeks now. I had to reboot my cluster a few times while shrinking it down (hardware damage on the other nodes, loss of quorum, reset of etcd, etc.), but this has been going on for a long, long time, since even before my other nodes went away.

Environment

Operator type:

/language go (I think?)

Kubernetes cluster type:

k3s 1.29.5, which, as far as I am aware, is vanilla.

$ operator-sdk version

operator-sdk version: "unknown", commit: "unknown", kubernetes version: "unknown", go version: "go1.21.1", GOOS: "windows", GOARCH: "amd64"

(When checking with operator-sdk.exe olm status, it requests version v0.28.0.)

$ go version (if language is Go)

# Windows client I do most of my management from
go version go1.23.2 windows/amd64
# Linux node k3s runs on; it runs Armbian, this package is from the repos.
go version go1.19.8 linux/arm64

$ kubectl version

Client Version: v1.29.5+k3s1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.5+k3s1

Additional context

I stumbled over the Operator SDK and will probably learn it at some point. Right now, I mainly keep it in my cluster for TektonCI and CNPG. So, I haven't gone deep-dive on it to be completely honest... I'm just a dude that hosts things at home. =)

joelanford commented 6 days ago

This is a common problem with OLM (v0!). If you want to uninstall the operator, you need to delete more than the subscription. There is an operator API that can tell you all of the objects associated with an installed operator. I can't remember the exact naming scheme of the object, but iirc, it is something like <namespace>.<packageName>.

If you look in the status of that object it should tell you everything you need to delete.
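
As a rough illustration of that last step, here is a minimal Python sketch that turns the status of such an object into a list of `kubectl delete` commands to review. It assumes the object is the cluster-scoped `Operator` resource (`operators.operators.coreos.com`) and that its associated objects are listed under `status.components.refs` with `apiVersion`, `kind`, `name`, and an optional `namespace`; check your own cluster's output before relying on that layout. The function name `delete_commands` and the file name `operator.json` are made up for this example. The script only builds the commands; it does not run anything against the cluster.

```python
import json
import sys


def delete_commands(operator: dict) -> list[str]:
    """Build one 'kubectl delete' command per object referenced by an Operator.

    Assumes the references live under status.components.refs (an assumption
    about the Operator object's layout, not something stated in this thread).
    """
    refs = operator.get("status", {}).get("components", {}).get("refs") or []
    commands = []
    for ref in refs:
        kind = ref["kind"].lower()
        # "apps/v1" -> group "apps"; core objects ("v1") have no group.
        group = ref.get("apiVersion", "").rpartition("/")[0]
        resource = f"{kind}.{group}" if group else kind
        cmd = f"kubectl delete {resource} {ref['name']}"
        # Cluster-scoped objects (e.g. CRDs) carry no namespace.
        if ref.get("namespace"):
            cmd += f" -n {ref['namespace']}"
        commands.append(cmd)
    return commands


# Only runs when a JSON file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for line in delete_commands(json.load(fh)):
            print(line)
```

One possible way to use it: run `kubectl get operators` to find the exact object name, save it with `kubectl get operator <name> -o json > operator.json`, then run the script on `operator.json` and read through the printed commands before executing any of them. Deleting a CustomResourceDefinition also deletes every custom resource of that type, so that line in particular deserves a second look.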

Shameless plug: the Operator Framework maintainers have been hard at work on OLMv1, which is a complete rewrite of OLM that solves this exact problem (and many others!)

We talked about what's new last week at KubeCon! I encourage you to check this out and keep up with what we're doing in the Kubernetes #olm-dev Slack channel.

https://www.youtube.com/watch?v=V0NYHt2yjcM&list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&index=286