kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0

Service controller doesn't populate TargetGroups #915

Open jacekn opened 1 month ago

jacekn commented 1 month ago

What happened:

I deployed the controller and configured an NLB-type Service. The service was created in AWS with an associated target group, but the target group is empty.

What you expected to happen:

I expected the service-lb-controller to populate the target group.

How to reproduce it (as minimally and precisely as possible):

I deployed the controller using a manifest generated like this:

cat << EOF | helm template --values=- aws-cloud-controller-manager aws-cloud-controller-manager/aws-cloud-controller-manager > aws-cloud-controller.yaml
args:
  - --v=2
  - --cloud-provider=aws
  - --cluster-name=mycluster
  - --controllers=service-lb-controller,cloud-node
  - --allocate-node-cidrs=false
  - --configure-cloud-routes=false

image:
    repository: registry.k8s.io/provider-aws/cloud-controller-manager
    tag: v1.30.0
EOF

I used the IAM policy from the docs, then created the Service object like this:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: test-service
  namespace: mynamespace
spec:
  allocateLoadBalancerNodePorts: true
  externalTrafficPolicy: Local
  healthCheckNodePort: 30309
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 31199
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: myapp
  sessionAffinity: None
  type: LoadBalancer

Once this was applied, the ELB was created together with health checks and target groups. However, the target groups are empty. I also noticed that no security group entries were added.

Anything else we need to know?:

This used to work with the in-tree controller. We disabled the in-tree controller and moved to the external one, and the Service controller no longer works in the same cluster.

Logs show successful calls to retrieve node details from AWS, for example:

I0516 09:28:28.565710       1 log_handler.go:37] AWS API ValidateResponse: ec2 DescribeInstances &{DescribeInstances POST / 0xc0004fee60 <nil>} {
  InstanceIds: ["i-xyz"]
} 200 OK
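
For anyone trying to reproduce this: the DescribeInstances call above is keyed by instance ID, so one thing worth checking is whether every node maps cleanly to an instance ID. The sketch below is only an illustration, not the controller's code. It assumes nodes carry a `spec.providerID` of the usual AWS form `aws:///<zone>/<instance-id>`; the function name and the sample IDs are hypothetical.

```python
import re

# Assumed providerID shape: aws:///<availability-zone>/<instance-id>.
# This pattern is an illustration, not the parser the controller uses.
_PROVIDER_ID = re.compile(r"^aws://[^/]*/[^/]*/(i-[0-9a-f]+)$")


def instance_id_from_provider_id(provider_id):
    """Return the EC2 instance ID embedded in a node providerID, or None."""
    match = _PROVIDER_ID.match(provider_id or "")
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical values, e.g. collected with:
    #   kubectl get nodes -o jsonpath='{.items[*].spec.providerID}'
    for pid in ["aws:///eu-west-1a/i-0123456789abcdef0", ""]:
        print(repr(pid), "->", instance_id_from_provider_id(pid))
```

A node for which this returns None (empty or differently shaped providerID) would be a candidate to look at more closely.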

I also confirmed with CloudTrail that there are no permission errors for the API calls.

If I manually add nodes to the target group, they are subsequently removed from it.
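
To make the "empty target group" observation easier to verify, here is a minimal sketch that compares the instance IDs expected behind the Service with what a target group actually reports. It assumes the input has the shape of an ELBv2 DescribeTargetHealth response (a `TargetHealthDescriptions` list whose entries contain `Target.Id`), for example the JSON printed by `aws elbv2 describe-target-health`. The function name and the sample IDs are hypothetical.

```python
def diff_targets(expected_instance_ids, target_health_response):
    """Compare expected instance IDs against a DescribeTargetHealth-shaped dict.

    Returns the IDs that should be registered but are not ("missing") and
    the IDs that are registered but were not expected ("unexpected").
    """
    registered = {
        description["Target"]["Id"]
        for description in target_health_response.get("TargetHealthDescriptions", [])
    }
    expected = set(expected_instance_ids)
    return {
        "missing": sorted(expected - registered),
        "unexpected": sorted(registered - expected),
    }


if __name__ == "__main__":
    # Hypothetical worker-node instance IDs.
    nodes = ["i-0aaa", "i-0bbb"]
    # An empty target group, as described in this report.
    empty_group = {"TargetHealthDescriptions": []}
    print(diff_targets(nodes, empty_group))
```

With an empty target group, every expected node shows up under "missing", which matches the behaviour described in this report.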

Environment:

/kind bug

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.
