nginxinc / nginx-gateway-fabric

NGINX Gateway Fabric provides an implementation for the Gateway API using NGINX as the data plane.
Apache License 2.0

Endpoints are not updated correctly by NGF for NGINX Plus #2090

Open salonichf5 opened 3 months ago

salonichf5 commented 3 months ago

Describe the bug

When using NGINX Plus with NGINX Gateway Fabric, upstreams are populated correctly when an HTTPRoute is applied via the API, but when the backing Deployment is scaled, upstreams are removed incorrectly.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy NGINX Gateway Fabric with NGINX Plus.
  2. Port-forward the NGF pod so the NGINX Plus dashboard is reachable locally:
kubectl port-forward -n nginx-gateway nginx-gateway-nginx-gateway-fabric-67fd757b54-tlvvc 8765:8765 &
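
Once forwarded, the upstreams can also be listed through the NGINX Plus API rather than the dashboard UI (a sketch; it assumes NGF serves the Plus API at /api on port 8765, and the /api/9 version prefix may differ by NGINX Plus release):

curl -s http://localhost:8765/api/9/http/upstreams

The response is a JSON object keyed by upstream name, following the <namespace>_<service>_<port> pattern seen in the error log below.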
  3. Apply this example, updating cafe.yaml to the following (apply commands follow the manifest):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: coffee
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: coffee
      template:
        metadata:
          labels:
            app: coffee
        spec:
          containers:
          - name: coffee
            image: nginxdemos/nginx-hello:plain-text
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: coffee
    spec:
      ports:
      - port: 80
        targetPort: 8080
        protocol: TCP
        name: http
      selector:
        app: coffee
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tea
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: tea
      template:
        metadata:
          labels:
            app: tea
        spec:
          containers:
          - name: tea
            image: nginxdemos/nginx-hello:plain-text
            ports:
            - containerPort: 8080
            # The probe targets port 1234, which the container does not
            # listen on, so the tea pod never becomes Ready.
            readinessProbe:
              httpGet:
                path: /
                port: 1234
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: tea
    spec:
      ports:
      - port: 80
        targetPort: 8080
        protocol: TCP
        name: http
      selector:
        app: tea
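
With cafe.yaml updated as above, apply the manifests as in the linked example (a sketch; the gateway.yaml and cafe-routes.yaml file names assume the standard cafe example layout):

kubectl apply -f cafe.yaml
kubectl apply -f gateway.yaml
kubectl apply -f cafe-routes.yaml
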
  4. Check the pods; the tea pod will not be Ready because its readiness probe targets port 1234:

kubectl get pods

coffee-56b44d4c55-j4wpq   1/1     Running   0          130m
tea-7f9b79bc55-lbz95      0/1     Running   0          130m
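
To confirm the readiness probe is what keeps tea out of rotation, inspect the pod's events (label selector taken from the manifest above):

kubectl describe pod -l app=tea

The Events section should show repeated "Readiness probe failed" entries for port 1234.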

Check the NGINX Plus Dashboard at the forwarded port. You should see upstreams for both tea and coffee.

[Screenshot: NGINX Plus dashboard showing upstreams for both tea and coffee]

  5. Now, scale the tea deployment:
kubectl scale deploy tea --replicas=2
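
At this point the Service's endpoints can be checked directly; since neither tea pod passes its probe, the EndpointSlice should report the addresses as not ready (uses the standard kubernetes.io/service-name label):

kubectl get endpointslices -l kubernetes.io/service-name=tea -o yaml

Both tea pod IPs should appear with conditions.ready: false, so the expected result is a tea upstream that still exists but has no active servers.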

Check the dashboard again; you will see only the coffee upstream, and the tea upstream is gone.

[Screenshot: NGINX Plus dashboard showing only the coffee upstream]

When sending a curl request to tea, it returns a 502, which is expected behavior:

curl --resolve cafe.example.com:$GW_PORT:$GW_IP http://cafe.example.com:$GW_PORT/tea
Handling connection for 8080
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.25.5</center>
</body>
</html>

Check the nginx container logs:

kubectl logs -n nginx-gateway nginx-gateway-nginx-gateway-fabric-67fd757b54-tlvvc -c nginx

You should see an error reporting no live upstreams:

2024/06/04 22:03:08 [info] 184#184: *1777 client 127.0.0.1 closed keepalive connection
2024/06/04 22:03:49 [error] 179#179: *1782 no live upstreams while connecting to upstream, client: 127.0.0.1, server: cafe.example.com, request: "GET /tea HTTP/1.1", upstream: "http://default_tea_80/tea", host: "cafe.example.com:8080"
127.0.0.1 - - [04/Jun/2024:22:03:49 +0000] "GET /tea HTTP/1.1" 502 157 "-" "curl/8.4.0"
2024/06/04 22:03:50 [info] 179#179: *1782 client 127.0.0.1 closed keepalive connection
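
The removal can also be confirmed against the Plus API on the forwarded port; the upstream name default_tea_80 is taken from the error log above (the /api/9 version prefix is an assumption):

curl -s http://localhost:8765/api/9/http/upstreams/default_tea_80/servers

If NGF had kept the upstream in place, this would return its (possibly empty) server list; after the scale operation it instead returns a 404 because the upstream was removed.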
Expected behavior

Upstreams should be populated correctly both when the HTTPRoute is first applied via the API and when the backing Deployment is scaled. The curl request to tea should still return Bad Gateway, but no errors should be reported in the nginx error logs.

Your environment



Acceptance

bjee19 commented 3 months ago

Does this only occur when the readiness probe is specified on the tea deployment?