oracle / cluster-api-provider-oci

Kubernetes Cluster API Provider for Oracle Cloud Infrastructure
https://oracle.github.io/cluster-api-provider-oci/
Apache License 2.0

Handle nil pointer in instanceProvision failure to continue deletion #339

Closed sindhusri16 closed 1 year ago

sindhusri16 commented 1 year ago

What happened: We upgraded CAPOCI to v0.11.2 and created some nodepools on existing clusters. We suspect some of them failed to provision, though we are not sure. We then tried to delete the whole cluster, and the deletion got stuck in the deleting phase because of these nodepools. On the backend, the instances were running when we issued the delete command: although the machines are reported here as InstanceProvisionFailed, the console showed them in the 'running' state. Some internal issue may have caused the provision failure, but while trying to delete the cluster we came across this log with a nil pointer error:

`{"stream":"stderr","message":"{\"ts\":1697616351953.0674,\"caller\":\"controller/controller.go:329\",\"msg\":\"Reconciler error\",\"controller\":\"ocimachine\",\"controllerGroup\":\"infrastructure.cluster.x-k8s.io\",\"controllerKind\":\"OCIMachine\",\"OCIMachine\":{\"name\":\"5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf\",\"namespace\":\"oke\"},\"namespace\":\"oke\",\"name\":\"5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf\",\"reconcileID\":\"7ac776ee-e8e6-473a-b731-b6dc625d7858\",\"err\":\"error deleting instance 5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf: can not marshal to path in request for field InstanceId. Due to can not marshal a nil pointer\",\"errVerbose\":\"can not marshal to path in request for field InstanceId. Due to can not marshal a nil pointer nerror deleting instance 5e22de10fc6a4da6b24f5d1a5e5c11c7-hmljf\ngithub.com/oracle/cluster-api-provider-oci/controllers.(OCIMachineReconciler).reconcileDelete\n\t/workspace/controllers/ocimachine_controller.go:391\ngithub.com/oracle/cluster-api-provider-oci/controllers.(OCIMachineReconciler).Reconcile\n\t/workspace/controllers/ocimachine_controller.go:152\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\"}","pod":"capoci-controller-manager-9659bd598-hpcp9","container":"manager","image":"253.255.0.31:5000/pca/cluster-api-oci-controller:v0.11.2"}`

What you expected to happen: Cluster deletion should succeed

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

clusterctl describe cluster 4727b18bc0884a88bbdef686e405176d -n oke --show-conditions Machine
NAME                                                                                 READY  SEVERITY  REASON                   SINCE  MESSAGE
!! DELETED !! Cluster/4727b18bc0884a88bbdef686e405176d                               True                                      10d
├─ClusterInfrastructure - OCICluster/4727b18bc0884a88bbdef686e405176d                True                                      10d
├─ControlPlane - KubeadmControlPlane/4727b18bc0884a88bbdef686e405176d-control-plane  True                                      10d
│ └─3 Machines...                                                                    True                                      10d    See 4727b18bc0884a88bbdef686e405176d-control-plane-7tn2f, 4727b18bc0884a88bbdef686e405176d-control-plane-mctrr, ...
└─Workers
  └─Other
    ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-d9dkb   False  Error     InstanceProvisionFailed  4d15h
    │             ├─BootstrapReady                                                   True                                      4d15h
    │             ├─HealthCheckSucceeded                                             False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
    │             ├─InfrastructureReady                                              False  Error     InstanceProvisionFailed  4d15h
    │             ├─NodeHealthy                                                      False  Info      Deleting                 4d15h
    │             ├─OwnerRemediated                                                  False  Warning   WaitingForRemediation    4d15h
    │             └─PreTerminateDeleteHookSucceeded                                  True                                      4d15h
    ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-kf9t7   False  Error     InstanceProvisionFailed  4d16h
    │             ├─BootstrapReady                                                   True                                      4d16h
    │             ├─HealthCheckSucceeded                                             False  Warning   NodeStartupTimeout       4d16h  Node failed to report startup in 10m0s
    │             ├─InfrastructureReady                                              False  Error     InstanceProvisionFailed  4d16h
    │             ├─NodeHealthy                                                      False  Info      Deleting                 4d16h
    │             ├─OwnerRemediated                                                  False  Warning   WaitingForRemediation    4d16h
    │             └─PreTerminateDeleteHookSucceeded                                  True                                      4d16h
    ├─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-kjn2w   False  Error     InstanceProvisionFailed  4d15h
    │             ├─BootstrapReady                                                   True                                      4d15h
    │             ├─HealthCheckSucceeded                                             False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
    │             ├─InfrastructureReady                                              False  Error     InstanceProvisionFailed  4d15h
    │             ├─NodeHealthy                                                      False  Info      Deleting                 4d15h
    │             ├─OwnerRemediated                                                  False  Warning   WaitingForRemediation    4d15h
    │             └─PreTerminateDeleteHookSucceeded                                  True                                      4d15h
    └─!! DELETED !! Machine/5e22de10fc6a4da6b24f5d1a5e5c11c7-67787c94fx4qbkk-nfxwk   False  Error     InstanceProvisionFailed  4d15h
                  ├─BootstrapReady                                                   True                                      4d15h
                  ├─HealthCheckSucceeded                                             False  Warning   NodeStartupTimeout       4d15h  Node failed to report startup in 10m0s
                  ├─InfrastructureReady                                              False  Error     InstanceProvisionFailed  4d15h
                  ├─NodeHealthy                                                      False  Info      Deleting                 4d15h
                  ├─OwnerRemediated                                                  False  Warning   WaitingForRemediation    4d15h
                  └─PreTerminateDeleteHookSucceeded                                  True                                      4d15h

Environment: