kubernetes-sigs / cloud-provider-azure

Cloud provider for Azure
https://cloud-provider-azure.sigs.k8s.io/
Apache License 2.0

Azure Cloud Controller Manager deletes existing NSG rules between ASGs #6342

Closed ajaysundark closed 1 month ago

ajaysundark commented 1 month ago

What happened:

We created a K8s cluster with Azure Cloud-Controller-Manager (v1.29.3) deployed in the control plane. When a K8s LoadBalancer service was deployed in the cluster, the CCM created a new NSG rule (k8s-azure-lb_allow_IPv4_xxxx) and removed all the existing NSG rules in the cluster security group (the NSG configured as securityGroupName in the cloud-config for CCM).

We noticed that, while adding the new NSG rule for the LB, CCM also updated the existing rules in the NSG by merging CIDR prefixes, but it did not handle rules set up between ASGs. This seems to be a bug in how CCM consolidates the existing NSG rules as part of this feature: #4713.
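To illustrate the behavior we expected, here is a minimal sketch in Go. It is not the CCM code: the Rule type and the consolidate, usesASG, and dedupePrefixes helpers are hypothetical names made up for this example, and "merging" is reduced to de-duplicating and sorting prefixes. The point is only that a consolidation pass over prefix-based rules should pass ASG-based rules through untouched rather than drop them.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a hypothetical, simplified stand-in for an NSG security rule.
// It is NOT the Azure SDK type and NOT the type used by CCM.
type Rule struct {
	Name           string
	Priority       int
	SourcePrefixes []string // CIDR-based sources
	SourceASGs     []string // ASG IDs used as sources
	DestASGs       []string // ASG IDs used as destinations
}

// usesASG reports whether the rule references any ASG.
func usesASG(r Rule) bool {
	return len(r.SourceASGs) > 0 || len(r.DestASGs) > 0
}

// dedupePrefixes returns the prefixes de-duplicated and sorted.
// This is a simplification of real CIDR merging.
func dedupePrefixes(in []string) []string {
	seen := make(map[string]bool, len(in))
	out := make([]string, 0, len(in))
	for _, p := range in {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

// consolidate normalizes prefix-based rules and keeps ASG-based
// rules unchanged, so no existing rule is lost.
func consolidate(rules []Rule) []Rule {
	out := make([]Rule, 0, len(rules))
	for _, r := range rules {
		if usesASG(r) {
			// ASG-based rule: keep as-is instead of dropping it.
			out = append(out, r)
			continue
		}
		// r is a copy, so the caller's slice element is not modified.
		r.SourcePrefixes = dedupePrefixes(r.SourcePrefixes)
		out = append(out, r)
	}
	return out
}

func main() {
	rules := []Rule{
		{Name: "asg-to-asg", Priority: 600, SourceASGs: []string{"asg-a"}, DestASGs: []string{"asg-b"}},
		{Name: "cidr-rule", Priority: 610, SourcePrefixes: []string{"10.0.1.0/24", "10.0.0.0/24", "10.0.1.0/24"}},
	}
	for _, r := range consolidate(rules) {
		fmt.Printf("%s prefixes=%v srcASGs=%v dstASGs=%v\n", r.Name, r.SourcePrefixes, r.SourceASGs, r.DestASGs)
	}
}
```

In this sketch the ASG rule is returned exactly as it went in, which is the property that appears to be missing in v1.29.x.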

What you expected to happen:

CCM should not affect existing traffic configured at the NSG for the cluster subnet.

How to reproduce it (as minimally and precisely as possible):

  1. Create an NSG with security-group rules restricting traffic between two ASGs (use rule priority > 500).
  2. Set securityGroupName in the cloud-config used by the azure-cloud-controller-manager deployment to the above NSG.
  3. Create a simple LoadBalancer service in the k8s cluster. CCM adds a new rule for the k8s service, but deletes the existing rules.
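One way to confirm the result of step 3 is to snapshot the NSG rule names before and after creating the service and diff the two lists. The Go sketch below does only the comparison. The missingRules helper and the rule names other than the k8s-azure-lb one are hypothetical, and fetching the rule names from Azure is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRules returns, sorted, the rule names that are present in
// before but absent from after (i.e. rules that were deleted).
func missingRules(before, after []string) []string {
	present := make(map[string]bool, len(after))
	for _, n := range after {
		present[n] = true
	}
	var missing []string
	for _, n := range before {
		if !present[n] {
			missing = append(missing, n)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical rule names for the ASG-to-ASG rules created in step 1.
	before := []string{"allow-asg-web-to-db", "deny-asg-db-to-web"}
	// Rule list after step 3, as described in this issue.
	after := []string{"k8s-azure-lb_allow_IPv4_xxxx"}
	fmt.Println("deleted rules:", missingRules(before, after))
}
```

With a fixed CCM, the list of deleted rules should be empty for any rule that CCM did not create itself.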

Anything else we need to know?:

This behavior is not observed with v1.28.x CCM versions, but it occurs in v1.29 and later. It seems to be an inadvertent bug from this PR, where reconcileSecurityGroup handles only the IP prefixes and misses the ASG-based rules.

Environment:

cc: @zarvd @feiskyer

zarvd commented 1 month ago

@ajaysundark We are working on the fix in #6331.

feiskyer commented 1 month ago

PR #6331 merged.