Closed yashoza19 closed 1 year ago
/lgtm
/lgtm
Although this patch appears to work most of the time, I saw the following error..which after searching around appears to be a race condition...trying to see if i can replicate it again.
$ opcap check --audit-plan=OperatorInstall,OperandInstall --catalogsource=certified-operators --catalogsourcenamespace=openshift-marketplace --packages confluent-for-kubernetes
{"level":"error","ts":1674523424.1474535,"caller":"logger/logger.go:62","message":"CleanerName: github.com/opdev/opcap/internal/capability.operandCleanup.func1 ---- cleanup failed: Operation cannot be fulfilled on kafkas.platform.confluent.io \"kafka\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/platform.confluent.io/kafkas/opcap-confluent-for-kubernetes-ownnamespace/kafka, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 5d814f39-0030-4413-be60-2d6747b0fc11, UID in object meta: ","stacktrace":"github.com/opdev/opcap/internal/logger.Errorf\n\t/vagrant_data/opcap/internal/logger/logger.go:62\ngithub.com/opdev/opcap/internal/capability.cleanup\n\t/vagrant_data/opcap/internal/capability/auditor.go:165\ngithub.com/opdev/opcap/internal/capability.RunAudits\n\t/vagrant_data/opcap/internal/capability/auditor.go:230\ngithub.com/opdev/opcap/cmd.runAudits\n\t/vagrant_data/opcap/cmd/check.go:78\ngithub.com/opdev/opcap/cmd.checkRunE\n\t/vagrant_data/opcap/cmd/check.go:73\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/spf13/cobra.(*Command).ExecuteContext\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:961\ngithub.com/opdev/opcap/cmd.Execute\n\t/vagrant_data/opcap/cmd/root.go:44\nmain.main\n\t/vagrant_data/opcap/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
I saw it again on hazelcast-platform-operator
(contains multiple CRs):
{"level":"error","ts":1674525568.5580864,"caller":"logger/logger.go:62","message":"CleanerName: github.com/opdev/opcap/internal/capability.operandCleanup.func1 ---- cleanup failed: Operation cannot be fulfilled on cronhotbackups.hazelcast.com \"cronhotbackup-sample\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/hazelcast.com/cronhotbackups/opcap-hazelcast-platform-operator-ownnamespace/cronhotbackup-sample, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 19cab4ea-62e1-4c6c-8884-1bfa50f47988, UID in object meta: ","stacktrace":"github.com/opdev/opcap/internal/logger.Errorf\n\t/vagrant_data/opcap/internal/logger/logger.go:62\ngithub.com/opdev/opcap/internal/capability.cleanup\n\t/vagrant_data/opcap/internal/capability/auditor.go:165\ngithub.com/opdev/opcap/internal/capability.RunAudits\n\t/vagrant_data/opcap/internal/capability/auditor.go:230\ngithub.com/opdev/opcap/cmd.runAudits\n\t/vagrant_data/opcap/cmd/check.go:78\ngithub.com/opdev/opcap/cmd.checkRunE\n\t/vagrant_data/opcap/cmd/check.go:73\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/spf13/cobra.(*Command).ExecuteContext\n\t/home/vagrant/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:961\ngithub.com/opdev/opcap/cmd.Execute\n\t/vagrant_data/opcap/cmd/root.go:44\nmain.main\n\t/vagrant_data/opcap/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
I saw it again on
hazelcast-platform-operator
(contains multiple CRs):Thanks for checking @madorn. I did a complete run - and found more issues that I'm trying to resolve. Will update this PR with what I find.
I tested the hazelcast-platform-operator with this PR, looks like it is deleting all other resource except this CR. Somehow even after deleting the finalizer, it keeps adding the finalizer unless the operator itself is not deleted. Here are the controller logs:
{"level":"info","ts":1674533308.0610213,"logger":"controllers.WanReplication","msg":"Adding finalizer","name":"wanreplication-sample","namespace":"opcap-hazelcast-platform-operator-ownnamespace/wanreplication-sample"}
{"level":"info","ts":1674533308.0675478,"logger":"wanreplication-resource","msg":"validate update","name":"wanreplication-sample"}
{"level":"error","ts":1674533308.0800261,"msg":"Reconciler error","controller":"wanreplication","controllerGroup":"hazelcast.com","controllerKind":"WanReplication","WanReplication":{"name":"wanreplication-sample","namespace":"opcap-hazelcast-platform-operator-ownnamespace"},"namespace":"opcap-hazelcast-platform-operator-ownnamespace","name":"wanreplication-sample","reconcileID":"60399d53-3e82-4194-8b45-01357591cb17","error":"Map.hazelcast.com \"map-sample\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.1/pkg/internal/controller/cont...
Need to investigate further into this.
@madorn , it looks it was just a logic error. We're skipping part of the deletion because some objects were being deleted as a consequence of others. It looked like secondary resources disappearing. The code wasn't expecting that. I think now Yash has it fixed.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: acmenezes, acornett21, mrhillsman
Associated issue: #346
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Signed-off-by: yoza yoza@redhat.com
Description of PR
Fixes #346
Changes (required)
Operand Cleanup not cleaning up due to options being recreated and not shared between operator install and operand install
Error in the get operation for api not found error returning instead of continuing to the other operand to be deleted in operand cleanup function
Cleanup function doesn't specify its name when logging errors since it is a generic function. Report the actual function name
Checklist (required)