k8sgateway / k8sgateway

The Cloud-Native API Gateway and AI Gateway
https://k8sgateway.io/
Apache License 2.0

request still returns 200 to static upstream when dns record is deleted. #6331

Closed pszeto closed 8 months ago

pszeto commented 2 years ago

Gloo Edge Version

1.11.x (latest stable)

Kubernetes Version

1.21.x

Describe the bug

Requests through the gateway-proxy still return 200 after the DNS record is deleted.

Steps to reproduce the bug

  1. Create a DNS record gloo-test.duckdns.org that points to 34.231.5.222, which is httpbin.org
  2. Create a static upstream with host address gloo-test.duckdns.org:
    apiVersion: gloo.solo.io/v1
    kind: Upstream
    metadata:
      name: static-upstream
      namespace: gloo-system
    spec:
      static:
        hosts:
          - addr: gloo-test.duckdns.org
            port: 80
  3. Create a VirtualService:
    apiVersion: gateway.solo.io/v1
    kind: VirtualService
    metadata:
      name: static
      namespace: gloo-system
    spec:
      virtualHost:
        domains:
          - '*'
        routes:
          - matchers:
              - prefix: /
            routeAction:
              single:
                upstream:
                  name: static-upstream
                  namespace: gloo-system
            options:
              autoHostRewrite: true
              headerManipulation:
                requestHeadersToAdd:
                  - header:
                      key: went-thru-gloo
                      value: "true"
  4. Run nslookup on gloo-test.duckdns.org and verify that it resolves correctly:

    nslookup gloo-test.duckdns.org
    Server:     8.8.8.8
    Address:    8.8.8.8#53

    Non-authoritative answer:
    Name:    gloo-test.duckdns.org
    Address: 34.231.5.222

  5. curl the endpoint through the gateway-proxy: `curl $(glooctl proxy url)/get -v`

Expected Behavior

After deleting the DNS record and getting `** server can't find gloo-test.duckdns.org: NXDOMAIN` from `nslookup gloo-test.duckdns.org`, I expect the call to the static upstream that references that host to return a 503 (no healthy upstream). Access logs from the proxy show:

Once I reboot the gateway proxy, I get the expected 503:

===================================================================================
Fri Apr 15 16:13:14 EDT 2022
nslookup gloo-test.duckdns.org
Server:     8.8.8.8
Address:    8.8.8.8#53

** server can't find gloo-test.duckdns.org: NXDOMAIN

*   Trying 35.194.94.205:80...
* Connected to 35.194.94.205 (35.194.94.205) port 80 (#0)
> GET /get HTTP/1.1
> Host: 35.194.94.205
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 19
< content-type: text/plain
< date: Fri, 15 Apr 2022 20:13:13 GMT
< server: envoy
<
* Connection #0 to host 35.194.94.205 left intact
no healthy upstream

Additional Context

No response

soloio-bot commented 1 year ago

Zendesk ticket #2894 has been linked to this issue.

nfuden commented 9 months ago

I believe what is occurring is the following: static upstreams do not recheck their IP configuration (https://github.com/envoyproxy/envoy/blob/9fc968d757339d7c476ac890a3eea873caac5ee9/source/extensions/clusters/static/static_cluster.cc#L10), unlike, say, strict DNS (https://github.com/envoyproxy/envoy/blob/9fc968d757339d7c476ac890a3eea873caac5ee9/source/extensions/clusters/strict_dns/strict_dns_cluster.cc#L103).

Solution: use logical or strict dns resolution
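
For illustration, a minimal sketch of what an Envoy cluster using strict DNS resolution could look like. This is not the config Gloo Edge actually generates for the upstream above: the cluster name, `connect_timeout`, and `dns_refresh_rate` values are assumptions chosen for the example. The point is only that a `STRICT_DNS` cluster re-resolves the hostname on an interval, while a `STATIC` cluster keeps the addresses it was given.

```yaml
# Hypothetical Envoy cluster (v3 API); names and values are illustrative only.
name: static-upstream_gloo-system      # assumed cluster name
type: STRICT_DNS                       # re-resolve the hostname periodically
connect_timeout: 5s                    # assumed value
dns_refresh_rate: 5s                   # how often Envoy re-queries DNS (assumed value)
load_assignment:
  cluster_name: static-upstream_gloo-system
  endpoints:
    - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: gloo-test.duckdns.org   # hostname from the reproduction steps
                port_value: 80
```

With `type: STATIC`, the `address` field would have to be an IP literal and would not be refreshed after the cluster is created.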

DuncanDoyle commented 8 months ago

@nfuden: Since this "works as expected", can we close this one as "Not a bug - won't fix"?

nfuden commented 8 months ago

I would argue for that. If the UX pain persists, perhaps we need to add an insight around this or update our docs to call out the behavior more clearly.

DuncanDoyle commented 8 months ago

@nfuden: I did find the documentation in Envoy on discovered upstreams: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/service_discovery

I know we have documentation on static upstreams in our GE docs: https://docs.solo.io/gloo-edge/latest/guides/traffic_management/destination_types/static_upstream/

... but I couldn't find anything on "Strict DNS" and/or "Logical" in our Gloo Edge docs and APIs. My guess is that some of the other UpstreamSpecs (https://docs.solo.io/gloo-edge/1.7.23/reference/api/github.com/solo-io/gloo/projects/gloo/api/v1/upstream.proto.sk/) translate into this, but I lack the insight here to guide the docs team (because I do think we should add something about this in our docs).

nfuden commented 8 months ago

We set strict_dns if hostname is not empty in static upstreams

DuncanDoyle commented 8 months ago

Can't reproduce on 1.15.14 when using strict_dns (which is created when you use a hostname in the static Upstream).

Reproducer project here: https://github.com/DuncanDoyle/ge-gloo-6331

DuncanDoyle commented 8 months ago

Closing. Can't reproduce.