kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0
4.08k stars 1.52k forks

Changing to IP mode breaks #4103

Open olespagnon-vdm opened 2 months ago

olespagnon-vdm commented 2 months ago

Bug Description

When changing from alb.ingress.kubernetes.io/target-type: instance to alb.ingress.kubernetes.io/target-type: ip, the load balancer fails to change the target group and I get this error:

{"level":"error","ts":"2025-03-20T13:04:05Z","msg":"Reconciler error","controller":"targetGroupBinding","controllerGroup":"elbv2.k8s.aws","controllerKind":"TargetGroupBinding","TargetGroupBinding":{"name":"k8s-mediaspo-servicea-80a47aec1e","namespace":"mynamespace"},"namespace":"mynamespace","name":"k8s-mediaspo-servicea-80a47aec1e","reconcileID":"f931fc53-3c5d-4f1d-a62d-44f589c0d7f2","error":"service type must be either 'NodePort' or 'LoadBalancer': mynamespace/service-api-accessservice"}

This is weird, because some of the APIs were properly updated and work as intended, while others were not. In ArgoCD, which I use to deploy, I get this error on the Ingress resource: Failed deploy model due to timed out waiting for the condition

Steps to Reproduce

Expected Behavior

Ingresses are valid and TargetGroups are updated properly

Actual Behavior

Regression

Was the functionality working correctly in a previous version? [Yes / No] If yes, specify the last version where it worked as expected

Current Workarounds

Environment

olespagnon-vdm commented 2 months ago

Okay, so upon further investigation I found something weird. The new target groups are working as intended, but the old target groups remain. Shouldn't they be removed once the ingress is updated? Why is the ALB controller still trying to validate the ingress with the old targets?

zac-nixon commented 2 months ago

Hello, please fill out this part of the template so we can properly help you.

AWS Load Balancer controller version:
Kubernetes version:
Using EKS (yes/no), if so version?:
Using Service or Ingress:
AWS region:

Specifically, the AWS Load Balancer controller version and Kubernetes version.

olespagnon-vdm commented 2 months ago

My bad, I forgot about this. I have edited my original post.

zac-nixon commented 1 month ago

The error is coming from here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/pkg/backend/endpoint_resolver.go#L87

I wonder if it's because all you did was change the target type while using the same service. Instance and IP targets use different service types: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.12/guide/ingress/annotations/#target-type
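As a rough illustration of that pairing: per the linked annotation docs, `instance` targets need a Service of type NodePort or LoadBalancer, while `ip` targets route directly to pod IPs, so a ClusterIP Service is enough. The sketch below reuses the Service name and namespace from the error log; the Ingress name, ingress class, selector label, and port numbers are hypothetical placeholders, not values from this issue.

```yaml
# Sketch only: everything except the Service name and namespace is an assumption.
apiVersion: v1
kind: Service
metadata:
  name: service-api-accessservice
  namespace: mynamespace
spec:
  type: ClusterIP          # enough for target-type: ip; instance mode needs NodePort or LoadBalancer
  selector:
    app: service-api       # hypothetical label
  ports:
    - port: 80             # hypothetical service port
      targetPort: 8080     # hypothetical container port
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-api        # hypothetical name
  namespace: mynamespace
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb    # assumed ingress class
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-api-accessservice
                port:
                  number: 80
```

If the annotation says `instance` while the Service is ClusterIP, the controller rejects the binding with the "service type must be either 'NodePort' or 'LoadBalancer'" error shown earlier in the thread.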

olespagnon-vdm commented 1 month ago

Hi, thanks for the response. You might be right. I'm using a Helm template, inflated and deployed by ArgoCD. Upon further inspection, it seems that the modification of the Service type did not create new services but changed the existing ones. What do you suggest I do? Should I delete the TargetGroupBinding in the cluster, and would the AWS resources then be deleted? Should I delete both? Or should I wait for a 'fix' for this case?

zac-nixon commented 1 month ago

I think this scenario falls under unsupported behavior: moving from instance -> ip requires a service update and a corresponding target group update. If you want to clean up the target group, I would suggest deleting the unused binding and deleting the target group.
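To help spot which bindings are unused, a leftover instance-mode TargetGroupBinding would look roughly like the sketch below. The name, namespace, and Service name are taken from the error log above, and `targetType: instance` is inferred from the error message; the ARN and port are placeholders, since neither appears in this thread.

```yaml
# Sketch of a stale instance-mode binding; ARN and port are placeholders.
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: k8s-mediaspo-servicea-80a47aec1e
  namespace: mynamespace
spec:
  targetType: instance                 # the old mode; bindings created for ip mode show "ip" here
  targetGroupARN: <old-target-group-arn>   # placeholder, identifies the AWS target group to delete
  serviceRef:
    name: service-api-accessservice
    port: 80                           # hypothetical port
```

The `spec.targetGroupARN` field is what ties the in-cluster binding to the AWS target group, so it is worth noting before deleting the binding.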

olespagnon-vdm commented 1 month ago

I've now deleted the remaining Instance type TargetGroupBindings and their associated TargetGroups. But now I see an error on the ingresses that the ALB Controller monitors for infra:

Warning FailedDeployModel 4m14s (x621 over 7d1h) ingress Failed deploy model due to timed out waiting for the condition

Is there a way to get more detailed info on why this is failing? I have a log saying:

"level": "info", "ts": "2025-03-27T12:43:13Z", "logger": "controllers.ingress", "msg": "successfully built model", "model": "very long json string of my model"

olespagnon-vdm commented 1 month ago

I think that maybe this is due to my CRDs. But since I'm using Helm via Terraform, I cannot reinstall them. Is there a way to install CRDs at a specified version instead of the master ref?