opentelekomcloud / terraform-provider-opentelekomcloud

Terraform OpenTelekomCloud provider
https://registry.terraform.io/providers/opentelekomcloud/opentelekomcloud/latest
Mozilla Public License 2.0

CCE Cluster created unable to attach load balancer address to a LoadBalancer service type #2498

Closed levshvarts closed 4 months ago

levshvarts commented 4 months ago

Terraform provider version

$ terraform -v
Terraform v1.5.7
on linux_amd64
+ provider registry.terraform.io/opentelekomcloud/opentelekomcloud v1.36.7

Affected Resource(s)

opentelekomcloud_cce_cluster_v3

Terraform Configuration Files

resource "opentelekomcloud_cce_cluster_v3" "cluster_1" {
  name = "test-cluster"

  cluster_type            = "VirtualMachine"
  flavor_id               = "cce.s2.small"
  vpc_id                  = var.vpc.id
  subnet_id               = var.cluster_subnet_id
  container_network_type  = "eni"
  kubernetes_svc_ip_range = "10.246.0.0/16"
  eni_subnet_id           = var.pod_subnet_id
  eni_subnet_cidr         = var.pod_subnet_cidr
  authentication_mode     = "rbac"
  kube_proxy_mode         = "iptables"

  eip = opentelekomcloud_vpc_eip_v1.cluster_access_eip.publicip[0].ip_address
  annotations = {
    "cluster.install.addons.external/install" = "[{\"addonTemplateName\":\"icagent\",\"extendParam\":{\"logSwitch\":\"false\",\"tDSEnable\":\"true\"}}]"
    "cluster.install.addons/install"          = "[{\"addonTemplateName\":\"coredns\",\"values\":{\"flavor\":{\"is_default\":true,\"name\":2500,\"recommend_cluster_flavor_types\":[\"small\"],\"replicas\":2,\"resources\":[{\"limitsCpu\":\"500m\",\"limitsMem\":\"512Mi\",\"name\":\"coredns\",\"requestsCpu\":\"500m\",\"requestsMem\":\"512Mi\"}],\"category\":[\"CCE\",\"Turbo\"]}}},{\"addonTemplateName\":\"everest\"},{\"addonTemplateName\":\"npd\"}]"
  }
}

# EIP for Service Load Balancer
resource "opentelekomcloud_vpc_eip_v1" "service_load_balancer_ip" {
  publicip {
    type = "5_bgp"
    name = "test-cluster-load-balancer"
  }
  bandwidth {
    name       = "service-load-balancer"
    size       = 100
    share_type = "PER"
  }
}

# Service Load Balancer definition - used subsequently in kubectl apply for LoadBalancer service
resource "opentelekomcloud_lb_loadbalancer_v3" "service_load_balancer" {
  name        = "test-service-lb"
  router_id   = var.vpc.id
  network_ids = [var.cluster_subnet_id]

  availability_zones = [var.availability_zone]

  public_ip {
    id = opentelekomcloud_vpc_eip_v1.service_load_balancer_ip.id
  }
}
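
The configuration above references several input variables whose declarations are not shown. As a rough sketch only, they might be declared as follows; the variable names come from the configuration, while the types and descriptions are assumptions inferred from how each variable is used (for example, var.vpc is assumed to be an object exposing at least an id attribute).

```hcl
# Hypothetical declarations: names are taken from the configuration above,
# types are assumptions inferred from usage.
variable "vpc" {
  description = "VPC that hosts the cluster and the load balancer; only id is used here."
  type = object({
    id = string
  })
}

variable "cluster_subnet_id" {
  description = "Subnet used by the CCE cluster nodes and by the load balancer."
  type        = string
}

variable "pod_subnet_id" {
  description = "Subnet used for pod ENIs (eni_subnet_id)."
  type        = string
}

variable "pod_subnet_cidr" {
  description = "CIDR of the pod ENI subnet (eni_subnet_cidr)."
  type        = string
}

variable "availability_zone" {
  description = "Availability zone for the load balancer."
  type        = string
}
```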

Steps to Reproduce

  1. terraform apply
  2. Download kubeconfig
  3. Create nginx deployment by following OTC documentation
  4. Resulting service is stuck in a degraded state with the following error: Error syncing load balancer: failed to ensure load balancer: Failed to ListEips : request failed: {"error_msg":"Incorrect IAM authentication information: verify aksk signature fail, canonical_request:GET|\/v1\/d13fe6ad9bad43e29904a38b8731b121\/publicips\/|port_id=|host:vpc.eu-de.otc.t-systems.com|x-project-id:d13fe6ad9bad43e29904a38b8731b121|x-sdk-date:20240502T222345Z||host;x-project-id;x-sdk-date|e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","error_code":"APIGW.0301","request_id":"90f7714c6ee0f4a31ae34cd4a930138e"} , status code: 401

Expected Behavior

Service is up and running and has an assigned load balancer IP:

    $ kubectl get services
    NAME         TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)         AGE
    nginx        LoadBalancer   10.246.40.99   X.X.X.X        443:30335/TCP   7s

Actual Behavior

Service is stuck in pending state:

    $ kubectl get services
    NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
    ngnix-service   LoadBalancer   10.246.125.92   <pending>     443:32490/TCP   5m24s
    $ kubectl describe service ngnix-service
    Name:                     ngnix-service
    Namespace:                default
    Labels:                   <none>
    Annotations:              kubernetes.io/elb.class: performance
                              kubernetes.io/elb.health-check-flag: off
                              kubernetes.io/elb.id: xxxxx-xxxx-xxxx-xxxxxxxx
                              kubernetes.io/elb.lb-algorithm: ROUND_ROBIN
                              kubernetes.io/elb.session-affinity-mode: SOURCE_IP
                              kubernetes.io/elb.session-affinity-option: {"persistence_timeout": "30"}
    Selector:                 app=nginx
    Type:                     LoadBalancer
    IP Family Policy:         SingleStack
    IP Families:              IPv4
    IP:                       10.246.125.92
    IPs:                      10.246.125.92
    Port:                     service0  443/TCP
    TargetPort:               80/TCP
    NodePort:                 service0  32490/TCP
    Endpoints:                10.51.1.84:80
    Session Affinity:         None
    External Traffic Policy:  Cluster
    Events:
    Type     Reason                  Age    From                Message
    ----     ------                  ----   ----                -------
    Warning  SyncLoadBalancerFailed  6m11s  service-controller  Error syncing load balancer: failed to ensure load balancer: Failed to ListEips : request failed: {"error_msg":"Incorrect IAM authentication information: verify aksk signature fail, canonical_request:GET|\/v1\/d13fe6ad9bad43e29904a38b8731b121\/publicips\/|port_id=|host:vpc.eu-de.otc.t-systems.com|x-project-id:d13fe6ad9bad43e29904a38b8731b121|x-sdk-date:20240502T222330Z||host;x-project-id;x-sdk-date|e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","error_code":"APIGW.0301","request_id":"f66c1c01f96aab10690083534842f2b2"}
    , status code: 401

Important Factoids

The strangest part is that I only hit this error when applying the manifest with the kubeconfig from a CCE cluster created with the Terraform provider. When I add a service through the console (on the cluster created with Terraform), the service gets the load balancer associated just fine. In addition, when I create a CCE cluster through the console, the issue disappears: I can use the kubeconfig generated by the platform, apply the same nginx service YAML, and the IP gets assigned.

With this problem, however, the Terraform provider is of little use, as we would have to create everything through the console anyway.

I tried both AK/SK and User ID + Password + TOTP authentication for the provider.

levshvarts commented 4 months ago

Alright, I figured out what the issue is. It may be worth keeping this closed issue around in case anybody else hits the same problem, because the error returned by the platform gives no hint of the actual cause.

My issue was that the ELB I created didn't have subnet_id specified, so it couldn't bind to an IP on the subnet. The following change to the Service Load Balancer definition resolved the issue:

data "opentelekomcloud_lb_flavor_v3" "network_lb_small" {
  name = var.service_lb_flavor_name
}

resource "opentelekomcloud_lb_loadbalancer_v3" "service_load_balancer" {
  name        = "test-service-lb"
  router_id   = var.vpc.id
  subnet_id   = var.cluster_subnet_id
  network_ids = [var.cluster_subnet_id]
  l4_flavor   = data.opentelekomcloud_lb_flavor_v3.network_lb_small.id

  availability_zones = [var.availability_zone]

  public_ip {
    id = opentelekomcloud_vpc_eip_v1.service_load_balancer_ip.id
  }
}

Arguably the provider should not have allowed me to create the load balancer this way: it seems subnet_id should be required, as it is in the OTC UI.

Also note that if you want to assign this ELB to a LoadBalancer-type service in CCE, you have to specify l4_flavor, which is also not obvious.

With the above changes I can simply apply my k8s manifest to the cluster and see it run:

$ kubectl apply -f nginx.yml
deployment.apps/nginx created
service/ngnix-service created
$ kubectl get services
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                  PORT(S)         AGE
ngnix-service   LoadBalancer   10.246.230.99   10.50.143.169,xx.xx.xx.x7   443:31092/TCP   8m51s
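
For anyone who would rather keep the service in Terraform than in a separate YAML manifest, here is a rough sketch of how the same wiring might look. It assumes the hashicorp/kubernetes provider is configured against this cluster's kubeconfig, which is not part of the setup above. The annotation keys, selector, and ports are copied from the kubectl describe output earlier in this issue; the resource label "nginx" is just an illustrative name.

```hcl
# Sketch only: assumes the hashicorp/kubernetes provider is already configured
# for this CCE cluster. Values mirror the kubectl describe output in this issue.
resource "kubernetes_service_v1" "nginx" {
  metadata {
    name = "ngnix-service"

    annotations = {
      "kubernetes.io/elb.class" = "performance"
      # Bind the service to the ELB created above instead of pasting its ID by hand.
      "kubernetes.io/elb.id" = opentelekomcloud_lb_loadbalancer_v3.service_load_balancer.id
    }
  }

  spec {
    type = "LoadBalancer"

    selector = {
      app = "nginx"
    }

    port {
      name        = "service0"
      port        = 443
      target_port = 80
      protocol    = "TCP"
    }
  }
}
```

Referencing the load balancer resource directly also gives Terraform an implicit dependency, so the ELB (with subnet_id and l4_flavor set) would be created before the service that points at it.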

anton-sidelnikov commented 4 months ago

Hi @levshvarts, thanks for sharing the solution. I will close this issue.