Thank you for the output of `helm ls`! That's very helpful.
We will investigate this. As you and I discussed, we are not sure this issue has anything to do with Workflow itself, so there may or may not be any change required in the controller and workflow-cli repos. One possible resolution would be to add a flag you can pass to workflow-cli to tell it that it needs to clean up a finalizer, for example `deis apps:destroy -a <appname> -f kubernetes` to clean up the `kubernetes` finalizer as you've shown here.
Perhaps better would be to find out why this seems to be a new issue, and get it fixed upstream. From our research, the finalizer appears to be doing exactly what it's designed for:
> Finalizers allow controllers to implement asynchronous pre-delete hooks. Custom objects support finalizers just like built-in objects.
You can add a finalizer to a custom object like this:
```yaml
apiVersion: "stable.example.com/v1"
kind: CronTab
metadata:
  finalizers:
  - finalizer.stable.example.com
```
> Finalizers are arbitrary string values that, when present, ensure that a hard delete of a resource is not possible while they exist.
So presumably some process has left this behind when it should have been cleaned up automatically, or maybe there is something else going on. I would hesitate to advise you to simply delete the finalizer, whether with kubectl or with the proposed `deis apps:destroy -a <appname> -f kubernetes`.
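For reference, if someone does decide to remove it by hand anyway, the usual kubectl workaround looks roughly like the sketch below. Treat it as an illustration, not a recommendation: it force-completes the deletion without addressing whatever left the finalizer behind. Here `<appname>` stands for the stuck namespace, and jq plus a reasonably recent kubectl (for `replace --raw`) are assumed:

```sh
# Show which finalizers are still set on the stuck namespace.
kubectl get namespace <appname> -o jsonpath='{.spec.finalizers}'

# Clear the finalizer list and submit the result to the namespace's
# finalize subresource, which lets the deletion complete.
kubectl get namespace <appname> -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/<appname>/finalize" -f -
```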
Thanks again for the report and documentation!
Finally figured out what's going on (sort of). I just noticed there was an error in my metrics-server pod (it showed CrashLoopBackOff) due to an incomplete deploy YAML (the same issue described here: https://github.com/kubernetes-incubator/metrics-server/issues/105), and somehow this caused the finalizer problem. After I fixed it, all the stuck namespaces were just gone, and creating and destroying new apps went smoothly.
So it looks like metrics-server is installed on most 1.8+ clusters by default:
https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/#metrics-server
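If it helps with reproduction: assuming a crash-looping metrics-server really is the trigger here (my reading of the report above), a quick way to check is to look at the aggregated API services, since namespace deletion waits on API discovery and a dead metrics-server leaves one unavailable:

```sh
# List aggregated API services; an unavailable one blocks namespace
# deletion during API discovery.
kubectl get apiservice

# With metrics-server crash-looping, this entry typically reports
# AVAILABLE=False (e.g. FailedDiscoveryCheck or MissingEndpoints).
kubectl describe apiservice v1beta1.metrics.k8s.io
```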
I am just going to keep this open until it's clearer to me how this is reproduced (and what needs to change upstream so that new users won't hit this issue anymore).
After reading through some more of the reports, it looks like this was the bad diff: https://github.com/kubernetes-incubator/metrics-server/commit/a823af80d438d642c29e038ca5336004b2a8b97e#diff-241930222cfd9fbea1ea3654fbabff5b
and it appears to be tracked in kubernetes-incubator/metrics-server#97 (although it does not look like it has been fixed yet).
This turns out to be an issue with metrics-server rather than Hephy.
I've been having this issue since the very beginning of using deis, and found a related issue here: https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-408599873
It seems to be caused by this snippet in the namespace configuration, but I'm not sure how it ended up there.
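For context, I believe the snippet in question is the `spec.finalizers` entry on the namespace, something like the following (reproducing it from memory, so treat it as illustrative; an entry like this exists on every namespace, so its mere presence may be normal):

```yaml
spec:
  finalizers:
  - kubernetes   # the finalizer the namespace deletion gets stuck on
```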
The cluster has Rook.io and the following Helm chart installed: