DNS cannot be resolved for HTTPRoute

StoveCode commented 8 months ago

First of all, this is a great tool :) thank you

I currently have a problem with the Kubernetes Gateway API: DNS requests are not resolved. Installed with Helm, latest release.

Logs

[INFO] 172.16.210.192:30184 - 6257 "A IN httpbin1.test.de udp 66 true 4000" NOERROR qr,aa,ra 180 0.000317369s
[DEBUG] plugin/k8s_gateway: Computed Index Keys [httpbin1.test.de httpbin1]
[DEBUG] plugin/k8s_gateway: Found 0 matching VirtualServer objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Ingress objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Service objects
[DEBUG] plugin/k8s_gateway: Computed response addresses []

As you can see here, the HTTPRoute resource is not looked at, so nothing can be resolved.

Is this problem expected to be fixed?

networkop commented 8 months ago

This is most likely due to the gateway not having an IP assigned, which could mean MetalLB (or whatever LB provider you use) is not assigning one. Or it could be due to Istio not doing its work, e.g. the latest releases of Istio stopped supporting old GwAPI CRDs. There are a couple of open PRs that update the supported version of the CRDs to 1.0.
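To make the "no IP assigned" case concrete, here is a rough sketch of what a Gateway looks like once the controller or LB provider has done its job. All names, the gatewayClassName and the address are made up for illustration; the status block is written by the controller, not by you. As far as I understand, an empty or missing status.addresses is what leads to an empty "Computed response addresses []" line.

```yaml
# Illustrative only: name, namespace, class and address are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: istio-system
spec:
  gatewayClassName: istio
  listeners:
    - name: http
      protocol: HTTP
      port: 80
status:
  # Populated by the gateway controller / LB provider.
  # If this list is empty, there is no address to hand out in a DNS answer.
  addresses:
    - type: IPAddress
      value: 192.0.2.10
```

Checking this field on the live object (for example with kubectl get gateway -A -o yaml) is a quick way to tell whether the problem is upstream of k8s_gateway.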

StoveCode commented 8 months ago

Thank you for your explanation. I think this is more likely the Istio problem, since I recently updated to 1.20.

larivierec commented 7 months ago

Istio would have to support Gateway API 1.0+. If it does, it should work de facto.

larivierec commented 7 months ago

Have you tried the latest istio gw?

https://tetrate.io/blog/whats-new-in-istio-120-gateway-api-external-endpoint-enhancements-wasm-updates-and-more/#

V1.20

Apocrathia commented 6 months ago

Same thing using the Cilium backend for the gateway API. It's possibly an issue with Cilium not returning the address, but it definitely has one.

[DEBUG] plugin/k8s_gateway: Computed Index Keys [longhorn-ui.k8s.apocrathia.com longhorn-ui]
[DEBUG] plugin/k8s_gateway: Found 1 matching httpRoute objects
[DEBUG] plugin/k8s_gateway: Found 0 matching gateway objects
[DEBUG] plugin/k8s_gateway: Found 0 matching tlsRoute objects
[DEBUG] plugin/k8s_gateway: Found 0 matching grpcRoute objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Ingress objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Service objects
[DEBUG] plugin/k8s_gateway: Computed response addresses []
[INFO] 10.100.0.179:49772 - 39844 "A IN longhorn-ui.k8s.apocrathia.com. udp 48 false 512" NXDOMAIN qr,aa,rd 197 0.000221408s

larivierec commented 6 months ago

Is your manifest public? Can you link it, I'm curious

Also, I noticed that in the example above k8s_gateway doesn't seem to be watching HTTPRoute.

See my config for an example.

https://github.com/larivierec/home-cluster/blob/main/kubernetes/apps/networking/k8s-gateway/app/config/Corefile#L23

Apocrathia commented 6 months ago

@larivierec It's not, but I'm just installing via the helm chart via flux. Here's what I've got:

---
# yaml-language-server: $schema=https://kubernetes-schemas.devbu.io/helm.toolkit.fluxcd.io/helmrelease_v2beta1.json
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: k8s-gateway
  namespace: kube-system
spec:
  interval: 30m
  chart:
    spec:
      # renovate: datasource=helm registryUrl=https://ori-edge.github.io/k8s_gateway/
      chart: k8s-gateway
      version: 2.1.0
      sourceRef:
        kind: HelmRepository
        name: k8s-gateway
        namespace: flux-system
  maxHistory: 2
  install:
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    remediation:
      retries: 3
  uninstall:
    keepHistory: false
  values:
    # Delegated domain
    domain: "k8s.apocrathia.com"

    # TTL for non-apex responses (in seconds)
    ttl: 300

    # Resources (CPU, memory etc)
    resources: {}

    # Limit what kind of resources to watch, e.g. watchedResources: ["Ingress"]
    watchedResources: []

    # Service name of a secondary DNS server (should be `serviceName.namespace`)
    secondary: ""

    # Enabled fallthrough for k8s_gateway
    fallthrough:
      enabled: false
      zones: []

    # Override the default `serviceName.namespace` domain apex
    apex: "k8s.apocrathia.com"

    # Optional configuration option for DNS01 challenge that will redirect all acme
    # challenge requests to external cloud domain (e.g. managed by cert-manager)
    # See: https://cert-manager.io/docs/configuration/acme/dns01/
    dnsChallenge:
      enabled: false
      domain: dns01.clouddns.com

    # Optional plugins that will be enabled in the zone, e.g. "forward . /etc/resolv.conf"
    extraZonePlugins:
      - name: log
      - name: errors
      # Serves a /health endpoint on :8080, required for livenessProbe
      - name: health
        configBlock: |-
          lameduck 5s
      # Serves a /ready endpoint on :8181, required for readinessProbe
      - name: ready
      # Serves a /metrics endpoint on :9153, required for serviceMonitor
      - name: prometheus
        parameters: 0.0.0.0:9153
      - name: forward
        parameters: . /etc/resolv.conf
      - name: loop
      - name: reload
      - name: loadbalance

    serviceAccount:
      create: true
      name: "k8s-gateway"
      annotations: {}

    service:
      type: LoadBalancer
      port: 53
      annotations: {}
      # nodePort: 30053
      loadBalancerIP: 10.50.8.53 # ARP
      # clusterIP: 10.43.0.53
      # externalTrafficPolicy: Local
      # externalIPs:
      #   - 10.0.8.53 # BGP
      #   - 10.50.8.53 # ARP
      # One of SingleStack, PreferDualStack, or RequireDualStack.
      # ipFamilyPolicy: SingleStack
      # List of IP families (e.g. IPv4 and/or IPv6).
      # ref: https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services
      # ipFamilies:
      #   - IPv4
      #   - IPv6

    nodeSelector: {}

    affinity: {}

    replicaCount: 1

    # Optional PriorityClass that will be used in the Deployment, e.g. priorityClassName: "system-cluster-critical"
    priorityClassName: ""

    debug:
      enabled: true

    secure: true

    zoneFiles: []
    #    - filename: example.db
    #      # Optional
    #      domains: example.com
    #      contents: |
    #        example.com.   IN SOA sns.dns.icann.com. noc.dns.icann.com. 2015082541 7200 3600 1209600 3600
    #        example.com.   IN NS  b.iana-servers.net.
    #        example.com.   IN NS  a.iana-servers.net.
    #        example.com.   IN A   192.168.99.102
    #        *.example.com. IN A   192.168.99.102

It's a pretty basic deployment. ¯\_(ツ)_/¯

larivierec commented 6 months ago

Your watchedResources needs to be set properly in order for the gateway to respond to requests.

Try setting it to

watchedResources: ["Service","HTTPRoute"]

By default it's empty, so it won't respond to anything.

Make sure it's like this in the configMap as well 😃

Apocrathia commented 6 months ago

Changed it and that didn't do anything. It's still not returning the correct address with

watchedResources: ["Service", "HTTPRoute", "Ingress", "TCPRoute", "UDPRoute"]

Still getting matches with no response.

[DEBUG] plugin/k8s_gateway: Computed Index Keys [longhorn-ui.k8s.apocrathia.com longhorn-ui]
[DEBUG] plugin/k8s_gateway: Found 0 matching Service objects
[DEBUG] plugin/k8s_gateway: Found 1 matching httpRoute objects
[DEBUG] plugin/k8s_gateway: Found 0 matching gateway objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Ingress objects
[DEBUG] plugin/k8s_gateway: Computed response addresses []
[INFO] 10.100.0.179:51902 - 48007 "A IN longhorn-ui.k8s.apocrathia.com. udp 48 false 512" NXDOMAIN qr,aa,rd 187 0.000136881s
[DEBUG] plugin/k8s_gateway: Computed Index Keys [example-app.k8s.apocrathia.com example-app]
[DEBUG] plugin/k8s_gateway: Found 0 matching Service objects
[DEBUG] plugin/k8s_gateway: Found 1 matching httpRoute objects
[DEBUG] plugin/k8s_gateway: Found 0 matching gateway objects
[DEBUG] plugin/k8s_gateway: Found 0 matching Ingress objects
[DEBUG] plugin/k8s_gateway: Computed response addresses []
[INFO] 10.100.0.179:58976 - 59943 "A IN example-app.k8s.apocrathia.com. udp 48 false 512" NXDOMAIN qr,aa,rd 187 0.000107225s

Edit: And here's the ConfigMap that gets generated from the Helm chart.

.:1053 {
    debug
    k8s_gateway k8s.apocrathia.com {
      apex k8s.apocrathia.com
      ttl 300
      resources Service HTTPRoute Ingress TCPRoute UDPRoute
    }
    log
    errors
    health { 
      lameduck 5s
    }
    ready
    prometheus 0.0.0.0:9153
    forward . /etc/resolv.conf
    loop
    reload
    loadbalance
}

Apocrathia commented 6 months ago

Upon working with it some more, I was finally able to get k8s_gateway to resolve resources. However, the gateway resources must be in the same namespace as k8s_gateway. RBAC changes would be needed to allow k8s_gateway to access Service objects outside of its deployed namespace. That gets the resolution working. Additionally, it appears that I was missing a ReferenceGrant object to connect routes to services in different namespaces.
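For anyone else who hits the missing ReferenceGrant, a minimal sketch of one is below. The namespaces and the object name are hypothetical; the grant lives in the namespace of the Service being referenced and names the namespace the HTTPRoute comes from.

```yaml
# Hypothetical names: an HTTPRoute in "apps" pointing at a Service in "longhorn-system".
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-from-apps
  namespace: longhorn-system     # namespace of the referenced Service
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: apps             # namespace where the HTTPRoute lives
  to:
    - group: ""                   # core API group, i.e. Service
      kind: Service
```

Without a grant like this, a cross-namespace backendRef is not accepted by the Gateway API implementation, which is separate from whether k8s_gateway itself can read the objects.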

Going back to the original issue, it would appear that resolution is working correctly, and we're all just using a brand new API that we haven't fully gotten the hang of.

@StoveCode Were you able to get your helm deployment updated to include the gateway resources, and were you able to resolve them?

StoveCode commented 6 months ago

> Upon working with it some more, I was finally able to get k8s_gateway to resolve resources. However, the gateway resources must be in the same namespace as k8s_gateway. RBAC changes would be needed to allow k8s_gateway to access Service objects outside of its deployed namespace. That gets the resolution working. Additionally, it appears that I was missing a ReferenceGrant object to connect routes to services in different namespaces.
>
> Going back to the original issue, it would appear that resolution is working correctly, and we're all just using a brand new API that we haven't fully gotten the hang of.
>
> @StoveCode Were you able to get your helm deployment updated to include the gateway resources, and were you able to resolve them?

Not yet. My work there was on hold during the holidays. Further progress is scheduled for January.

larivierec commented 6 months ago

> Upon working with it some more, I was finally able to get k8s_gateway to resolve resources. However, the gateway resources must be in the same namespace as k8s_gateway. RBAC changes would be needed to allow k8s_gateway to access Service objects outside of its deployed namespace. That gets the resolution working. Additionally, it appears that I was missing a ReferenceGrant object to connect routes to services in different namespaces.
>
> Going back to the original issue, it would appear that resolution is working correctly, and we're all just using a brand new API that we haven't fully gotten the hang of.
>
> @StoveCode Were you able to get your helm deployment updated to include the gateway resources, and were you able to resolve them?

~~Oh, if you look in the test folder I think the permissions for gateway apis are there, I'll most probably create a PR to fix the ClusterRole when the array contains keywords~~

EDIT: Ignore my comment. I was unable to see it properly on mobile, but the important section is already there, and it should give access to everything in all namespaces.

  - apiGroups:
      - gateway.networking.k8s.io
    resources:
      - "*"
    verbs:
      - watch
      - list

Apocrathia commented 6 months ago

@larivierec Correct. I believe the issue that I was having was due to the Service not being in the same namespace as the k8s_gateway pod. I could resolve the hostnames set in the HTTPRoute object, but it would return an NXDOMAIN error unless the Service was in the same namespace as k8s_gateway. That permission is already set, though. I'm going to blame Cilium, but I have no evidence to support it, lol.
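For reference, this is roughly the shape of route being discussed: a sketch of an HTTPRoute whose hostname is what gets queried, with a parent Gateway and a backend Service that may each sit in a different namespace. Every name, namespace and port here is hypothetical apart from the hostname taken from the logs above.

```yaml
# Hypothetical names throughout, except the hostname from the logs.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: longhorn-ui
  namespace: longhorn-system
spec:
  parentRefs:
    - name: example-gateway       # Gateway the route attaches to
      namespace: kube-system      # may differ from the route's namespace
  hostnames:
    - longhorn-ui.k8s.apocrathia.com
  rules:
    - backendRefs:
        - name: longhorn-frontend # backend Service
          port: 80
```

The "Found 1 matching httpRoute objects" followed by "Found 0 matching gateway objects" in the logs suggests the hostname matched but no address was found via the parent Gateway, so the parentRefs and the Gateway's status are the places to look first.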

StoveCode commented 6 months ago

@larivierec @Apocrathia The helm upgrade did not work for me. I received the following error message:

Error: UPGRADE FAILED: cannot patch "exdns-k8s-gateway" with kind Service: Service "exdns-k8s-gateway" is invalid: spec.ports: Required value

I solved the problem by uninstalling the whole deployment and reinstalling the new version with my old values.

Sorry for the bad communication the last few weeks, I am currently struggling with an illness.

Next week I will integrate the K8s gw API and test the HTTPRoutes

Apocrathia commented 6 months ago

@StoveCode The error suggests that you were missing a value. It's pointing at spec.ports in the Service manifest, which is defined in the values.

No worries about the communication. Holiday shenanigans have taken everyone out of commission, and getting sick tends to come along with it. Take care of yourself. We can take care of this issue once you're back to feeling better.

StoveCode commented 6 months ago

See issue #259. To solve this, update to Helm release 2.3.0.

ori-edge / k8s_gateway

DNS cannot be resolved for HTTPRoute #242