operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0

JUnit Extension: Namespaces controlled by an operator get stuck in Terminating state on Test Error #893

Closed. andreaTP closed this issue 2 years ago.

andreaTP commented 2 years ago

Bug Report

Often, when a Controller fails (e.g. during tests), it leaves orphan finalizers behind and the relevant Namespace cannot be removed.

What did you do?

Running operators and having failing test runs.

What did you expect to see?

Resources are correctly Garbage Collected by Kubernetes

What did you see instead? Under which circumstances?

Namespaces are in Terminating state forever.

Environment

Kubernetes cluster type: minikube

java-operator-sdk version (from pom.xml): 2.0.3-SNAPSHOT

$ java -version

openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

I believe the best solution would be to default to NO_FINALIZERS. Since that would require additional discussion, we could initially document the current default behaviour and the reasons for it, along with a workaround to actually remove such namespaces, such as:

export NAMESPACE_STUCK=<your stuck namespace name>
kubectl get namespace $NAMESPACE_STUCK -o json \
  | jq -r '.spec.finalizers = null' \
  | kubectl replace --raw "/api/v1/namespaces/${NAMESPACE_STUCK}/finalize" -f -

(command line adapted from: https://stackoverflow.com/a/59667608 )

Unfortunately, the finalizers in the namespace spec can't be removed with a simple kubectl patch or similar; the only way is to force-replace the resource through the raw /finalize subresource API.
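For anyone who wants to script the same workaround from Java (e.g. in a test cleanup hook), the data manipulation itself is trivial. A minimal sketch, assuming the Namespace JSON has already been parsed into a plain Map; the class and method names are hypothetical, and the HTTP PUT to /api/v1/namespaces/<name>/finalize (what kubectl replace --raw does) is not shown:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch, not part of the SDK: the transformation that the jq
// expression '.spec.finalizers = null' performs, written as a pure
// function over a Namespace object modelled as a plain Map.
final class ForceFinalize {
    @SuppressWarnings("unchecked")
    static Map<String, Object> stripSpecFinalizers(Map<String, Object> ns) {
        Map<String, Object> copy = new HashMap<>(ns);
        Map<String, Object> spec = new HashMap<>(
                (Map<String, Object>) copy.getOrDefault("spec", new HashMap<String, Object>()));
        spec.put("finalizers", null); // equivalent of the jq assignment
        copy.put("spec", spec);
        return copy;
    }
}
```

The cleaned object can then be PUT to the /finalize subresource, exactly as the kubectl pipeline above does.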

scrocquesel commented 2 years ago

That can be an issue, and I tend to use a k8s mock server implementation like https://github.com/fabric8io/kubernetes-client#mocking-kubernetes. I'm not in favor of having to deal with a real server in a CI/CD pipeline: your test run can be killed at any moment, and you then have to deal with the mess left over. If one really wants to, they should use some sort of virtualization (a minikube, or a virtual cluster on top of the real one like https://github.com/loft-sh/vcluster), and the creation/cleanup of this temporary environment should be done outside the test runner. That way, cleaning can be triggered independently if needed, i.e. you can delay the deletion if a test failed, in order to diagnose its state.

It is very rare that I have to raw-patch a CR on a real cluster, but yes, it's a pity it doesn't work out of the box. You can use a kubectl plugin like https://github.com/xcoulon/kubectl-terminate

IMHO, turning off finalizers (NO_FINALIZERS) will be of no help once you want to test the cleanup part.

Regarding the default value: maybe it should be NO_FINALIZERS when the default implementation of cleanup is not overridden (not sure if it's possible to detect this). Or put the cleanup method in its own interface, to make cleanup support more explicit and to make it easier to build the controller configuration by reflection.
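The "own interface" idea can be sketched in a few lines. All names below are hypothetical, not the actual java-operator-sdk API; the point is that a dedicated interface lets the framework use a simple instanceof check instead of trying to detect whether a default method was overridden:

```java
// Hedged sketch of the proposal; names are hypothetical, not SDK API.
interface Cleaner<R> {
    void cleanup(R resource);
}

class NoCleanupReconciler {
    // does not implement Cleaner: no finalizer needed
}

class WithCleanupReconciler implements Cleaner<String> {
    @Override
    public void cleanup(String resource) {
        // remove external state tied to the resource
    }
}

final class FinalizerPolicy {
    // Register a finalizer only when the reconciler opts in to cleanup,
    // i.e. effectively default to NO_FINALIZERS otherwise.
    static boolean needsFinalizer(Object reconciler) {
        return reconciler instanceof Cleaner;
    }
}
```

With this shape, the safe default (no finalizer) falls out of the type system rather than out of configuration.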

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 14 days with no activity.