kubeslice / worker-operator

KubeSlice Worker Operator open-source repository. The KubeSlice Worker Operator is a Kubernetes operator that manages the lifecycle of KubeSlice worker clusters.
Apache License 2.0

Bug: The external LB gets stuck in the "pending" state on AWS and a few other clouds when the protocol is TCP #330

Open narmidm opened 5 months ago

narmidm commented 5 months ago

📜 Description

While testing the Slice Gateway Edge (SGE) on private clusters, we found some issues with the AWS-specific annotations that the worker-operator adds to the external LB Service:

```go
// Note: Special treatment for AWS EKS clusters. The LB is not provisioned unless we add AWS specific annotations
// to the service. This is needed only for EKS.
if clusterProvider, _ := getClusterProviderID(ctx, r.Client); clusterProvider == "aws" {
	if svc.ObjectMeta.Annotations == nil {
		svc.ObjectMeta.Annotations = make(map[string]string)
	}
	svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-type"] = "external"
	svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-nlb-target-type"] = "ip"
	svc.ObjectMeta.Annotations["service.beta.kubernetes.io/aws-load-balancer-scheme"] = "internet-facing"
}
```

Source: https://github.com/kubeslice/worker-operator/blob/master/controllers/slice/slice_gw_edge.go#L128C1-L137C3

With these annotations in place, the LB gets stuck in the "pending" state on AWS when the protocol is TCP. The issue was resolved simply by removing the annotations.

👟 Reproduction steps

While testing the SGE on private clusters, we found issues caused by the AWS-specific annotations that the worker-operator adds.

👍 Expected behavior

The external LB is provisioned and starts running without any issues.

👎 Actual Behavior

The external LB gets stuck in the "pending" state.

🐚 Relevant log output

No response

Version

1.2.0

🖥️ What operating system are you seeing the problem on?

Windows

✅ Proposed Solution

We can introduce a new field in the Cluster Custom Resource Definition (CRD) that holds a set of annotations for the external load balancer (Ext LB). If a cloud provider needs specific annotations for its Network Load Balancer (Network LB), they can be passed as input when the Cluster resource is created. The Slice Gateway Edge (SGE) will then read these annotations and apply them to the external load balancer Service when it creates it, instead of the operator hard-coding provider-specific annotations.
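A minimal sketch of the proposed approach, assuming a new map field on the Cluster spec. All names here (`ClusterSpec`, `ExternalLBAnnotations`, `ServiceMeta`, `applyExternalLBAnnotations`) are hypothetical stand-ins, not the actual KubeSlice types; real code would use the Cluster CRD types in the repository and the `ObjectMeta` of a `corev1.Service`. The sketch only shows the data flow: annotations declared on the Cluster resource are copied onto the external LB Service, and nothing is added when the field is empty.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterSpec is a hypothetical, trimmed-down stand-in for the Cluster CRD spec.
// ExternalLBAnnotations is the proposed field (it does not exist yet) carrying
// provider-specific annotations for the SGE's external load balancer.
type ClusterSpec struct {
	ExternalLBAnnotations map[string]string `json:"externalLBAnnotations,omitempty"`
}

// ServiceMeta is a hypothetical stand-in for the external LB Service's ObjectMeta.
type ServiceMeta struct {
	Annotations map[string]string
}

// applyExternalLBAnnotations copies user-supplied annotations onto the Service
// instead of hard-coding AWS-specific keys in the operator. User-supplied values
// overwrite any existing annotation with the same key.
func applyExternalLBAnnotations(spec ClusterSpec, meta *ServiceMeta) {
	if len(spec.ExternalLBAnnotations) == 0 {
		return // nothing configured: leave the Service annotations untouched
	}
	if meta.Annotations == nil {
		meta.Annotations = make(map[string]string)
	}
	for k, v := range spec.ExternalLBAnnotations {
		meta.Annotations[k] = v
	}
}

func main() {
	// Example: an operator of an EKS cluster opts in to one AWS annotation.
	spec := ClusterSpec{ExternalLBAnnotations: map[string]string{
		"service.beta.kubernetes.io/aws-load-balancer-scheme": "internet-facing",
	}}
	var meta ServiceMeta
	applyExternalLBAnnotations(spec, &meta)

	// Print annotations in a stable (sorted) order.
	keys := make([]string, 0, len(meta.Annotations))
	for k := range meta.Annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, meta.Annotations[k])
	}
}
```

Making the annotations opt-in means that clusters whose provider needs no special annotations, or AWS setups where the current defaults leave a TCP LB pending, get a plain `LoadBalancer` Service, while clusters that do need annotations can still declare them per cluster.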

Bhargav-InfraCloud commented 3 months ago

Hi team! Is this being worked on? If not, I want to give it a try. /assign

narmidm commented 3 months ago

@Bhargav-InfraCloud, thank you for taking on this issue. You will likely need more details, so please join this channel if you have not already: https://kubernetes.slack.com/archives/C03Q64HNEJF, and contact Mridul Gain or Bharath for a detailed explanation.

Bhargav-InfraCloud commented 3 months ago

Sure @narmidm. Will check with them. Thanks!

narmidm commented 2 months ago

@Bhargav-InfraCloud, any update? Did you get clarification from @mridulgain?

Bhargav-InfraCloud commented 2 months ago

@narmidm Yes, he has shared a few details on Slack. Unfortunately, I got caught up with some other work around the same time, and it is still ongoing. If anyone is up for working on this, please feel free to pick it up. Otherwise, I'll try again later.