kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

error instantiating load balancer: Unable to DescribeInstanceStatus on : InvalidInstanceID.Malformed: Invalid id: "" #534

Closed crsantini closed 6 years ago

crsantini commented 6 years ago

hi there

I'm trying to use ALB Ingress version 1.0-beta.5, but I'm having an issue when deploying my ingress:

[error]

```
E0808 16:54:21.903747 1 albingress.go:166] value-ad-1yfo6a/testalb10-cui-ingress: error instantiating load balancer: Unable to DescribeInstanceStatus on : InvalidInstanceID.Malformed: Invalid id: ""
E0808 16:54:21.903768 1 albingress.go:166] value-ad-1yfo6a/testalb10-cui-ingress: status code: 400, request id: bcad797e-038e-4620-b945-9d78a26f1424
E0808 16:54:21.903778 1 albingress.go:167] value-ad-1yfo6a/testalb10-cui-ingress: Will retry in 1.986555483s
I0808 16:54:21.903874 1 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"value-ad-1yfo6a", Name:"testalb10-cui-ingress", UID:"acdff1b9-9b2b-11e8-a681-02f406066856", APIVersion:"extensions/v1beta1", ResourceVersion:"512475", FieldPath:""}): type: 'Warning' reason: 'ERROR' error instantiating load balancer: Unable to DescribeInstanceStatus on : InvalidInstanceID.Malformed: Invalid id: "" status code: 400, request id: bcad797e-038e-4620-b945-9d78a26f1424
```

[ingress annotations]

```yaml
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/subnets: "subnet-0509300ce1d3804eb,subnet-0aabddf9bd0a52388"
alb.ingress.kubernetes.io/healthcheck-path: /docs/logo.png
```
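For context, the full Ingress those annotations sit on looks roughly like this (the name and namespace match the log output above; the backend serviceName/servicePort are placeholders, not my actual values):

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: testalb10-cui-ingress
  namespace: value-ad-1yfo6a
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/subnets: "subnet-0509300ce1d3804eb,subnet-0aabddf9bd0a52388"
    alb.ingress.kubernetes.io/healthcheck-path: /docs/logo.png
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: testalb10-cui   # placeholder service name
              servicePort: 80              # placeholder port
```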

[tags] Both of my public subnets (subnet-0509300ce1d3804eb and subnet-0aabddf9bd0a52388) are tagged as follows:

```
kubernetes.io/cluster/c-d2jmz | shared
kubernetes.io/role/alb-ingress
kubernetes.io/role/elb
```

Any thoughts?

Thanks in advance!

bigkraig commented 6 years ago

What version of Kubernetes are you running?

crsantini commented 6 years ago

hi @bigkraig thanks for your quick reply. I'm using v1.10.5 with Rancher 2.0.6 RKE cluster.

bigkraig commented 6 years ago

I think we may have made a mistake when switching to the new field. Try out the 534-externalid image once #535 finishes building and let me know if that resolves this for you.

crsantini commented 6 years ago

I tried to update from quay.io/coreos/alb-ingress-controller:1.0-beta.5 to quay.io/coreos/alb-ingress-controller:534-externalid and got an error that the image can't be pulled.

crsantini commented 6 years ago

My bad, it seems #535 is still building.

crsantini commented 6 years ago

Deployed it, and got a different error now:

```
E0808 17:53:52.717911 1 albingress.go:166] value-ad-ul8wr/testalb11-cui-ingress: error instantiating load balancer: Unable to DescribeInstanceStatus on va-etcd-prod3: InvalidInstanceID.Malformed: Invalid id: "va-etcd-prod3"
E0808 17:53:52.717930 1 albingress.go:166] value-ad-ul8wr/testalb11-cui-ingress: status code: 400, request id: cafbbe99-9c7a-4818-b778-2516ba2e9133
E0808 17:53:52.717940 1 albingress.go:167] value-ad-ul8wr/testalb11-cui-ingress: Will retry in 2m7.239751097s
I0808 17:53:52.718282 1 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"value-ad-ul8wr", Name:"testalb11-cui-ingress", UID:"d91d5aae-9b2f-11e8-a681-02f406066856", APIVersion:"extensions/v1beta1", ResourceVersion:"551814", FieldPath:""}): type: 'Warning' reason: 'ERROR' error instantiating load balancer: Unable to DescribeInstanceStatus on va-etcd-prod3: InvalidInstanceID.Malformed: Invalid id: "va-etcd-prod3" status code: 400, request id: cafbbe99-9c7a-4818-b778-2516ba2e9133
```

*Note that "va-etcd-prod3" is the ID of my node in Rancher and also the "Name" tag on EC2, and this is an etcd node, not a worker. Not sure why it's trying to describe these instances.

bigkraig commented 6 years ago

It will add any node to the cluster that doesn't have the node-role.kubernetes.io/master label or has alpha.service-controller.kubernetes.io/exclude-balancer: true.
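As a sketch, an excluded node's metadata would carry something like this (assuming the exclusion key is applied as a label, which is how upstream Kubernetes treats it; the node name here is just taken from the logs above for illustration):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: va-etcd-prod3
  labels:
    alpha.service-controller.kubernetes.io/exclude-balancer: "true"
```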

It's still strange that the API is providing the node name and not the external id.

Can you give me the output of kubectl describe nodes | grep ProviderID and kubectl describe nodes | grep Name:? You can obscure the values if you like, I just need to know what is returned.

crsantini commented 6 years ago

"It will add any node to the cluster that doesn't have the node-role.kubernetes.io/master label or has alpha.service-controller.kubernetes.io/exclude-balancer: true." -> Shouldn't it just add the nodes where the service/workload was deployed? In my case, I'm deploying to all workers using a ReplicaSet.

1. kubectl describe nodes | grep ProviderID returns nothing.

2. kubectl describe nodes | grep Name: returns:

```
Name: ip-10-0-0-207
Name: va-controlplane-prod1
Name: va-controlplane-prod2
Name: va-controlplane-prod3
Name: va-etcd-prod1
Name: va-etcd-prod2
Name: va-etcd-prod3
Name: va-worker-spot-prod1
Name: va-worker-spot-prod2
Name: va-worker-spot-prod3
```

bigkraig commented 6 years ago

Maybe kubectl describe nodes | grep i- then? I don't have a 1.10 cluster, and for some reason it's having a hard time finding the instance IDs. For example, ProviderID on my test cluster gives me aws:///us-east-1a/i-02b7f8960fb28e864.
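For reference, the instance ID the controller needs is the last path segment of that providerID string. A minimal sketch of that parsing, illustrative only and not the controller's actual code:

```python
def instance_id_from_provider_id(provider_id: str) -> str:
    """Pull the EC2 instance ID out of a node's providerID.

    AWS providerIDs have the shape "aws:///<availability-zone>/<instance-id>",
    e.g. "aws:///us-east-1a/i-02b7f8960fb28e864".
    """
    if not provider_id.startswith("aws://"):
        raise ValueError("not an AWS providerID: %r" % provider_id)
    instance_id = provider_id.rsplit("/", 1)[-1]  # last path segment
    if not instance_id.startswith("i-"):
        raise ValueError("no instance ID in providerID: %r" % provider_id)
    return instance_id

print(instance_id_from_provider_id("aws:///us-east-1a/i-02b7f8960fb28e864"))
# i-02b7f8960fb28e864
```

A node without a providerID (or with a non-AWS one, as RKE seems to produce here) gives the controller nothing to hand to DescribeInstanceStatus, which matches the empty/hostname IDs in the errors above.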

It will add all nodes because the AWS API has some limitations and I didn't want to access the API every time a pod is rescheduled. The nodes that don't have the pod either fail the healthcheck or reroute the request depending on the service annotations.

crsantini commented 6 years ago
1. "It will add all nodes because the AWS API has some limitations and I didn't want to access the API every time a pod is rescheduled. The nodes that don't have the pod either fail the healthcheck or reroute the request depending on the service annotations." -> I see. Thanks for clarifying.

2. "maybe kubectl describe nodes | grep i- then?" -> kubectl describe nodes | grep i- doesn't give much relevant info, so I pasted the full description of one node below. It seems the provider ID is not present; maybe I missed defining AWS as the provider when I first created the cluster. Could that be the issue?

```
Name:               va-worker-spot-prod3
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=va-worker-spot-prod3
                    node-role.kubernetes.io/worker=true
Annotations:        field.cattle.io/publicEndpoints=[{"nodeName":"c-d2jmz:m-5crdb","addresses":["13.229.218.25","52.77.247.25"],"port":30173,"protocol":"TCP","serviceName":"value-ad-ul8wr:testalb11-cui","allNodes":true},...
                    flannel.alpha.coreos.com/backend-data={"VtepMAC":"0e:11:c2:14:cb:83"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=10.0.0.25
                    node.alpha.kubernetes.io/ttl=0
                    rke.cattle.io/external-ip=52.77.247.25
                    rke.cattle.io/internal-ip=10.0.0.25
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:
CreationTimestamp:  Wed, 08 Aug 2018 05:05:43 +0000
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  OutOfDisk       False   Wed, 08 Aug 2018 18:17:51 +0000   Wed, 08 Aug 2018 05:05:43 +0000   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure  False   Wed, 08 Aug 2018 18:17:51 +0000   Wed, 08 Aug 2018 05:05:43 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure    False   Wed, 08 Aug 2018 18:17:51 +0000   Wed, 08 Aug 2018 05:05:43 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure     False   Wed, 08 Aug 2018 18:17:51 +0000   Wed, 08 Aug 2018 05:05:43 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready           True    Wed, 08 Aug 2018 18:17:51 +0000   Wed, 08 Aug 2018 05:06:23 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.0.25
  Hostname:    va-worker-spot-prod3
Capacity:
  cpu:                4
  ephemeral-storage:  98229932Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15953248Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  90528705182
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15850848Ki
  pods:               110
System Info:
  Machine ID:                 8c859f2bde9e41155bca40d2d85aebb5
  System UUID:                EC28893A-21B3-6BA5-0C0E-85196A15A338
  Boot ID:                    c35a93df-c3bd-44fb-a59b-6ed48a3bb93f
  Kernel Version:             4.14.32-rancher2
  OS Image:                   Debian GNU/Linux 9 (stretch)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.2
  Kubelet Version:            v1.10.5
  Kube-Proxy Version:         v1.10.5
PodCIDR:     10.42.10.0/24
ExternalID:  va-worker-spot-prod3
Non-terminated Pods:  (14 in total)
  Namespace        Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  cattle-system    cattle-node-agent-twsg6               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  ingress-nginx    nginx-ingress-controller-qzbpr        0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system      canal-4b7f5                           250m (6%)     0 (0%)      0 (0%)           0 (0%)
  kube-system      kube-dns-5ccb66df65-nv95g             260m (6%)     0 (0%)      110Mi (0%)       170Mi (1%)
  kube-system      kube-dns-autoscaler-6c4b786f5-l426v   20m (0%)      0 (0%)      10Mi (0%)        0 (0%)
  value-ad-84rza8  vaclient04-cui-59b54f6fcf-l4fph       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-84rza8  vaclient04-lms-7c57f4775d-fnjcs       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-f0hj6o  vaclient01-cui-d6b9d6ffd-8lgp4        0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-f0hj6o  vaclient01-lms-68c99875df-f9699       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-nnso4y  vaclient02-cui-db57fcd9f-84t8t        0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-nnso4y  vaclient02-lms-54b7b64ccd-pdfww       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-ul8wr   testalb11-cui-58dfcffb6f-bpwj6        0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-xre369  vaclient03-cui-764965d745-kqpwj       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  value-ad-xre369  vaclient03-lms-768f8f55b9-rc7qv       0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  530m (13%)
```

crsantini commented 6 years ago

Also, would you mind sending me what the ProviderID annotation on your node looks like in its entirety?

bigkraig commented 6 years ago

I don't know much about RKE, but without the instance ID in there, there isn't much we can do on the controller side.

I'd check with RKE whether a configuration option is missing for getting the provider ID in there. I have to assume other software will have problems integrating with AWS without it. Even if we switched the CNI plugin to one that makes the pod IPs routable, we still couldn't make SG changes on the nodes hosting the pods to allow the ALB to connect to them without the instance IDs.

It's in the spec, not an annotation:

```yaml
apiVersion: v1
kind: Node
spec:
  externalID: i-0de56a54fcedc893b
  podCIDR: 10.2.28.0/24
  providerID: aws:///us-east-1a/i-0de56a54fcedc893b
```

crsantini commented 6 years ago

Thanks a lot for your help, I will do some debugging and reply tomorrow with my findings! Carlos.

crsantini commented 6 years ago

I just quickly set up a new cluster with AWS as the provider option, and I can now see "ExternalID: i-08585ee85ac2656f8", although I still don't have a providerID. Even so, the LB and target groups were successfully created. Thanks!

In addition, I have a quick question which I couldn't find in the docs: does the service support adding an annotation to activate the "stickiness" option?

[screenshot: 2018-08-09 at 5:07 am]

bigkraig commented 6 years ago

Excellent. And that is with the branch I made for you, correct?

See the load-balancer-attributes annotation for setting stickiness up.

crsantini commented 6 years ago

Yes, on the branch you provided. As for the LB attributes, I tried to set the following on my ingress:

```
alb.ingress.kubernetes.io/attributes: stickiness.enabled=true,stickiness.type=lb_cookie,stickiness.lb_cookie.duration_seconds=604800
```

ref: https://docs.aws.amazon.com/elasticloadbalancing/latest/APIReference/API_TargetGroupAttribute.html

but got error

```
E0808 19:26:11.427102 1 loadbalancer.go:454] value-ad-zkmjes/test21-cui-ingress: Failed to add ELBV2 attributes: ValidationError: Load balancer attribute key 'stickiness.enabled' is not recognized
E0808 19:26:11.427117 1 loadbalancer.go:454] value-ad-zkmjes/test21-cui-ingress: status code: 400, request id: e8647514-9b40-11e8-9cda-7f63a7050c97
E0808 19:26:11.427123 1 albingress.go:296] value-ad-zkmjes/test21-cui-ingress: Failed to reconcile state on this ingress
E0808 19:26:11.427128 1 albingress.go:298] value-ad-zkmjes/test21-cui-ingress: - ValidationError: Load balancer attribute key 'stickiness.enabled' is not recognized
E0808 19:26:11.427131 1 albingress.go:298] value-ad-zkmjes/test21-cui-ingress: status code: 400, request id: e8647514-9b40-11e8-9cda-7f63a7050c97
```

bigkraig commented 6 years ago

Looks like I gave you FUD; the stickiness attributes should go in target-group-attributes. Sorry about that!
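In other words, keep the same attribute string you already have and move it under the target-group annotation key, which should look something like this (sketch based on this controller version's annotation scheme):

```yaml
alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.type=lb_cookie,stickiness.lb_cookie.duration_seconds=604800
```

The stickiness.* keys are target-group attributes in the ELBv2 API, which is why the load-balancer-attributes path rejected them.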

crsantini commented 6 years ago

No worries, it works perfectly now. Thanks again! If you'll allow me one more question: after I delete my ingress/workload, the LBs are not being deleted from AWS. Is this expected, or am I missing an annotation somewhere?

bigkraig commented 6 years ago

If the controller is still running, it will see that the ingress resource has been removed and delete the resources. If you're deleting the entire cluster at the same time, it won't have a chance to do that.

crsantini commented 6 years ago

I'm just deleting the ingress entries and workloads, not the cluster, and then the ALB controller shows the following in the logs:

```
E0808 19:41:25.931565 1 albingress.go:166] value-ad-vwowal/valueadvwowal-cui-ingress: error instantiating load balancer: Unable to find the value-ad-vwowal/valueadvwowal-cui service: no object matching key "value-ad-vwowal/valueadvwowal-cui" in local store
E0808 19:41:25.931615 1 albingress.go:167] value-ad-vwowal/valueadvwowal-cui-ingress: Will retry in 1m0.178083172s
I0808 19:41:25.931933 1 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"value-ad-vwowal", Name:"valueadvwowal-cui-ingress", UID:"6c16f206-9b42-11e8-bb65-02f78957f2aa", APIVersion:"extensions/v1beta1", ResourceVersion:"5807", FieldPath:""}): type: 'Warning' reason: 'ERROR' error instantiating load balancer: Unable to find the value-ad-vwowal/valueadvwowal-cui service: no object matching key "value-ad-vwowal/valueadvwowal-cui" in local store
```

bigkraig commented 6 years ago

Ah, that's a bug then. Can you open another issue for that? I don't think I have ever tried deleting the service before the ingress.

crsantini commented 6 years ago

Sure, will do.