asluborski closed this issue 6 months ago
Hi! Thanks for reporting the issue. Would you be able to share the controller logs? This can help us get to the cause of the issue. Thanks!
Hello, the issue was unrelated to the ALB. I was using GPU nodes with Amazon Linux 2 and tried to install a driver tagged for AL2 that does not exist. I have since moved the node OS to Bottlerocket NVIDIA and everything is working.
Describe the bug
I have an EKS cluster with public/private endpoint access in a VPC with public and private subnets. I've set up my ALB in the public subnets on port 80, internet-facing, with target type ip, and installed the AWS Load Balancer Controller following the AWS docs and the 2048 deployment example. I am using GPU nodes and have also set up the Kubernetes GPU operator. I have a deployment and a service for a Flask REST API.
After getting everything set up, I expected the running EKS node instances to register in my target group, but it's empty and the pods have no instances to join.
Here is a screenshot of the ALB and the empty target group from the AWS console:
I'm struggling to find out why this is happening. I've been adjusting my ingress and deployment YAML files and thought it might be a selector/label issue, but that doesn't seem to be the case. My deployment runs a Flask API on port 5000, and I set the health check path to /health so that it hits the Flask server's /health endpoint and gets a response.
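For readers following along, here is a minimal sketch of an Ingress matching the setup described above (internet-facing ALB, port 80, target type ip, health check on /health). It is not the poster's actual ingress.yaml. The resource name flask-api-ingress, the Service name flask-api-service, and its port 80 are assumptions made for illustration; only the namespace flask-api-app appears in this thread.

```yaml
# Illustrative sketch only; names marked "hypothetical" are not from the thread.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-api-ingress            # hypothetical name
  namespace: flask-api-app           # namespace used in the kubectl command in this thread
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip          # register pod IPs, not node instances
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/healthcheck-path: /health
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flask-api-service   # hypothetical Service that targets container port 5000
                port:
                  number: 80              # assumed Service port
```

With target type ip, the controller registers pod IPs rather than node instances, so the target group can be expected to stay empty until the Service has ready pods behind it.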
Deployment.yaml:
ingress.yaml:
service-account.yaml:
This is the dockerfile that I built for the deployment:
I also ran the command
kubectl describe targetgroupbindings -n flask-api-app
and this was the result:
namespaces:
Environment
Amazon Linux 2, Ubuntu
this is the output of