nginxinc / nginx-asg-sync

NGINX Plus Integration with Cloud Autoscaling
BSD 2-Clause "Simplified" License

AWS EC2 description filter with tag not working for EKS NodeGroup #314

Open bsmerja opened 1 year ago

bsmerja commented 1 year ago

Describe the bug

When multiple pods run in the EKS environment, each EKS node has multiple private IP addresses, depending on the number of pods running on that node. nginx-asg-sync picks any one of these private IP addresses and writes it to the NGINX configuration, which causes 502 Bad Gateway errors.

To Reproduce

  1. Deploy NGINX Plus with nginx-asg-sync in front of EKS as a reverse proxy / load balancer
  2. Set autoscaling_group: eks-Node-instances<1234> in config.yaml, matching the instance tag aws:autoscaling:groupName: eks-Node-instances<1234>
  3. Run multiple pods on EKS

The upstream is then populated with wrong IP addresses.

Provide the following files as part of the bug report

  1. nginx -T output

    configuration file /var/lib/nginx/state/backend-eks.conf:
    server 10.1.20.227:31159;
    server 10.1.20.218:31159;
    server 10.1.20.232:31159;
    server 10.1.20.251:31159;
  2. Actual IP addresses of the EKS nodes (kubectl get nodes -o wide):

    NAME                                     STATUS   INTERNAL-IP
    ip-10-1-20-218.region.compute.internal   Ready    10.1.20.218
    ip-10-1-20-227.region.compute.internal   Ready    10.1.20.227
    ip-10-1-20-248.region.compute.internal   Ready    10.1.20.248
    ip-10-1-20-82.region.compute.internal    Ready    10.1.20.82
  3. The AWS CLI returns the expected private IP addresses with the following filter and query:

    aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=eks-Node-instances<1234>" --profile Users-<user-id> --query 'Reservations[*].Instances[*].[PrivateIpAddress]' --output text
    
    10.1.20.227
    10.1.20.218
    10.1.20.82
    10.1.20.248
  4. config.yaml:

    region: <region-name>
    api_endpoint: http://127.0.0.1:8080/api
    sync_interval_in_seconds: 5
    cloud_provider: AWS
    upstreams:
     - name: backend-eks
       autoscaling_group: eks-Node-instances<1234>
       port: 31159
       kind: http
       max_conns: 0
       max_fails: 1
       fail_timeout: 10s
       slow_start: 0s
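
To make the mismatch between items 1 and 2 explicit, the two address lists can be compared as sets. The following is only a minimal sketch: the addresses are copied from the nginx -T and kubectl output above, while the variable and function names are made up for illustration.

```python
import ipaddress

# Addresses copied from the nginx -T output (item 1) and from
# kubectl get nodes -o wide (item 2) in this report.
upstream_ips = {"10.1.20.227", "10.1.20.218", "10.1.20.232", "10.1.20.251"}
node_ips = {"10.1.20.218", "10.1.20.227", "10.1.20.248", "10.1.20.82"}


def sort_ips(ips):
    """Sort dotted-quad strings numerically rather than lexically."""
    return sorted(ips, key=ipaddress.ip_address)


# Present in the upstream but not a node INTERNAL-IP.
unexpected = sort_ips(upstream_ips - node_ips)
# Node INTERNAL-IPs that never made it into the upstream.
missing = sort_ips(node_ips - upstream_ips)

print("unexpected in upstream:", unexpected)
print("missing from upstream:", missing)
```

With the data from this report, two upstream entries (10.1.20.232 and 10.1.20.251) do not correspond to any node INTERNAL-IP, and two nodes (10.1.20.82 and 10.1.20.248) are absent from the upstream.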

Your environment

nginx-asg-sync version 0.5.0, nginx version: nginx/1.23.2 (nginx-plus-r28), Amazon Linux 2

Additional context

Is it possible to add an option to config.yaml for an additional query, like the one I used to find the exact private IP addresses in the following command?

aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=eks-Node-instances<1234>" --profile Users-1234 --query 'Reservations[].Instances[].[PrivateIpAddress]' --output text
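
For illustration, here is a rough sketch of what that query selects, written against the documented shape of an EC2 DescribeInstances response: the instance-level PrivateIpAddress field, as opposed to the non-primary addresses listed under NetworkInterfaces. The helper names and the sample response (including its 10.0.0.x addresses) are hypothetical, and nothing here is meant to describe how nginx-asg-sync itself picks an address.

```python
def primary_private_ips(response):
    """Return the instance-level PrivateIpAddress of every instance.

    This mirrors the JMESPath query
    Reservations[].Instances[].[PrivateIpAddress] used in the CLI call.
    """
    return [
        instance["PrivateIpAddress"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
        if "PrivateIpAddress" in instance
    ]


def secondary_private_ips(response):
    """Return every non-primary private IP attached to any network interface."""
    ips = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for eni in instance.get("NetworkInterfaces", []):
                for addr in eni.get("PrivateIpAddresses", []):
                    if not addr.get("Primary", False):
                        ips.append(addr["PrivateIpAddress"])
    return ips


# Hypothetical, heavily trimmed DescribeInstances response for one node
# with one primary and two secondary addresses (made-up values).
sample = {
    "Reservations": [
        {
            "Instances": [
                {
                    "PrivateIpAddress": "10.0.0.10",
                    "NetworkInterfaces": [
                        {
                            "PrivateIpAddresses": [
                                {"Primary": True, "PrivateIpAddress": "10.0.0.10"},
                                {"Primary": False, "PrivateIpAddress": "10.0.0.11"},
                                {"Primary": False, "PrivateIpAddress": "10.0.0.12"},
                            ]
                        }
                    ],
                }
            ]
        }
    ]
}

print(primary_private_ips(sample))
print(secondary_private_ips(sample))
```

Only the first list would be suitable as upstream servers for a NodePort service; the second list holds addresses of the kind that, per this report, end up in the upstream by mistake.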

nginx-bot[bot] commented 2 weeks ago

Hi @bsmerja! Welcome to the project! 🎉

Thanks for opening this issue! Be sure to check out our Contributing Guidelines and the Issue Lifecycle while you wait for someone on the team to take a look at this.

lucacome commented 2 weeks ago

Hi @bsmerja

I'm trying to reproduce this, but creating pods doesn't affect the Auto Scaling group for me. Are you still seeing this behavior? If so, can you share the logs from nginx-asg-sync as well?