oracle / oci-cloud-controller-manager

Kubernetes Cloud Controller Manager implementation for Oracle Cloud Infrastructure
Apache License 2.0

OCI load balancer doesn't receive WorkRequest to be terminated #321

Open eaglejack85 opened 4 years ago

eaglejack85 commented 4 years ago

We create two OCI load balancers with the following values in one of our Helm charts:

service:
  enabled: true
  type: LoadBalancer
  port: 443
  annotations:
    service.beta.kubernetes.io/oci-load-balancer-shape: "${wtss_shape}"
    service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"
    service.beta.kubernetes.io/oci-load-balancer-backend-protocol: "HTTP"
  sslEnabled: true

and one internal load balancer with these values:

service:
  enabled: true
  type: LoadBalancer
  port: 80
  annotations:
    service.beta.kubernetes.io/oci-load-balancer-internal: "true"
    service.beta.kubernetes.io/oci-load-balancer-shape: "${wtss_shape}"
    service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"

Environments are provisioned with Terraform 0.12.19 and the following Terraform providers: oci 3.74.0, helm 0.10.2, kubernetes 1.9.0, tls 2.1.1.

Creation works without issues in both the dev and production OCI tenancies. The problem arises randomly when destroying the environment in the production OCI tenancy: one of the three load balancers is not destroyed, which prevents the load balancer subnet from being destroyed:

Error: Service error:Conflict. The Subnet ocid1.subnet.oc1.iad.aaaaaaaa6ksuldcwn7i52prronlm7hc2mmt2lwxeskjgpyxkq26zxwmiuexq references the VNIC ocid1.vnic.oc1.iad.abuwcljse6jxsl3fbx5vvgtjrud2xty2bmjmu6uwpt4cd7mkklgbpp77fqwq. You must remove the reference to proceed with this operation.. http status code: 409. Opc request id: 54b539a5d9b2a538a89cb3dccd9ce78b/EFF51FBB2571BC628C5383C03DD97940/57E2CAFEFE75FAD2DBE403D52079245C

This happens randomly, and only in the production OCI tenancy.

BUG REPORT

Versions

CCM Version:

Environment:

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.6
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.6

What happened?

One of the OCI load balancers in the production tenancy, created by oci-cloud-controller-manager, randomly doesn't receive a work request to be terminated when terraform destroy removes the Helm release.

What you expected to happen?

All OCI load balancers should be terminated consistently.

How to reproduce it (as minimally and precisely as possible)?

Create a Terraform module that deploys a Helm release containing a k8s Service of type LoadBalancer with service.beta.kubernetes.io/oci-load-balancer-* annotations, so that an OCI load balancer is created, then try to destroy the module with Terraform.

Anything else we need to know?

mrunalpagnis commented 2 years ago

This looks more like a Terraform issue, where Terraform tries to delete the subnet before the LB has been completely deleted. Can you verify this with a newer version of oci-cloud-controller-manager and confirm whether the issue still persists?
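
One way to check this ordering hypothesis is to wait until no load balancer references the subnet before the subnet is destroyed. Below is a minimal Python sketch of such a wait loop, under stated assumptions: `wait_for_lbs_gone` and `list_lbs` are hypothetical names, `list_lbs` stands in for whatever call lists the load balancers in the compartment, and each load balancer is assumed to be a dict with a `subnet_ids` list. It is not part of oci-cloud-controller-manager or Terraform.

```python
import time
from typing import Callable, Iterable, Mapping


def wait_for_lbs_gone(
    list_lbs: Callable[[], Iterable[Mapping]],
    subnet_id: str,
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no load balancer references subnet_id.

    list_lbs is a hypothetical callable returning the current load
    balancers as dicts with a "subnet_ids" list (an assumed shape).
    Returns True once none reference the subnet, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        # Keep only the load balancers still attached to the subnet.
        remaining = [
            lb for lb in list_lbs()
            if subnet_id in lb.get("subnet_ids", [])
        ]
        if not remaining:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If this returns False after a generous timeout, the load balancer was probably never scheduled for deletion, which points away from a pure ordering problem; if it returns True, a later subnet delete should no longer hit the 409 Conflict for that reason.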