kubernetes / cloud-provider-aws

Cloud provider for AWS
https://cloud-provider-aws.sigs.k8s.io/
Apache License 2.0

Autoscaled K8s nodes are not added to the target group automatically with externalTrafficPolicy: Local #575

Closed · subhankarc closed this 1 month ago

subhankarc commented 1 year ago

We are trying to understand the behaviour of the externalTrafficPolicy attribute when the Kubernetes cluster can autoscale at runtime.

We are using a Kubernetes cluster on AWS with an "nlb" load balancer and have set the externalTrafficPolicy attribute of the Service to Local.

The experiment is meant to verify Istio's "preserve the original client source IP address" behaviour according to its documentation, but the problem appears to be in how a Kubernetes Service with externalTrafficPolicy: Local works together with NLB load balancers.

Azure does not seem to have this problem, and it also does not occur on earlier Kubernetes versions such as 1.23.x and 1.24.x.
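
For context, a minimal sketch of the kind of Service described above; the name, namespace, selector, and ports are illustrative and not taken from the reporter's cluster:

```yaml
# Illustrative only: a LoadBalancer Service that asks the AWS cloud provider
# for an NLB and preserves the client source IP via externalTrafficPolicy: Local.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway          # hypothetical name
  namespace: istio-system             # hypothetical namespace
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local        # traffic is delivered only to nodes running a backing pod
  selector:
    app: istio-ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8080
```

With externalTrafficPolicy: Local, the API server also allocates a healthCheckNodePort for the Service, and the NLB health-checks each registered node on that port; nodes without a local endpoint are expected to stay registered but report unhealthy, which is the behaviour the report below describes as missing for newly autoscaled nodes.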

What happened:

When the cluster autoscales and adds a new node, we observed that the node does not get added to the load balancer target group. We waited for more than an hour, but the target group still did not reflect the new node. However, with externalTrafficPolicy: Cluster, the node is added to the load balancer's target group within a few minutes.

Even if a new pod targeted by the Service is scheduled onto the new node, the node is still not added to the target group.

We believe this could be a bug that needs to be fixed.

What you expected to happen:

In the first case, even though the node's health check should fail (it has no local endpoint), the node should still be added to the target group.

In the second case, once the autoscaled node has a pod of the Service running on it, the health check should also pass and the node should be added to the target group as a healthy target.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

Reference: https://github.com/istio/istio/issues/43684

/kind bug

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
kmala commented 1 year ago

This is because of an upstream bug, https://github.com/kubernetes/kubernetes/issues/117375, which is resolved by https://github.com/kubernetes/kubernetes/pull/117388.

subhankarc commented 1 year ago

@kmala https://github.com/kubernetes/kubernetes/issues/117375 seems to be closed? Is this issue solved on both 1.25 and 1.26, and can it be closed?

kmala commented 1 year ago

Yes, it is solved in both 1.26 and 1.27. From what I found, the issue is not present in 1.25.

Venkat-pulagam commented 6 months ago

Hello everyone, we are also facing a similar issue on our kops Kubernetes cluster on AWS after upgrading from v1.25.x to v1.27.8. Newly provisioned nodes are not registered with the ingress controller's NLB target group until another new worker node joins the cluster or an old worker node is terminated. On the AWS CCM (aws-cloud-controller-manager) pods we see a warning that the newly joined node could not be registered with the NLB target group because its provider ID was not available: `node "i-0d6a5fa94ff4xxx" did not have ProviderID set`.

Can someone please suggest a workaround for this issue until a new CCM patch release is available?
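
For anyone checking whether they are hitting the same thing: the warning above points at the node's provider ID, which the AWS CCM is supposed to set on the Node object. A minimal sketch of the field to look for (node name and instance ID are made up for illustration):

```yaml
# Illustrative excerpt of a Node object, not taken from the reporter's cluster.
# A node the AWS CCM has fully initialized carries spec.providerID; the
# "did not have ProviderID set" warning above means this field was still empty
# when the controller tried to register the node with the NLB target group.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal                     # hypothetical node name
spec:
  providerID: aws:///us-east-1a/i-0123456789abcdef0   # aws:///<availability-zone>/<instance-id>
```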

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 month ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/cloud-provider-aws/issues/575#issuecomment-2154257443):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.