ori-edge / k8s_gateway

A CoreDNS plugin to resolve all types of external Kubernetes resources
Apache License 2.0

After updating from Helm release 2.1.0 to 2.3.0, resources can no longer be resolved #259

Closed: StoveCode closed this issue 8 months ago

StoveCode commented 8 months ago

Hello :) I'll open a new issue for this. After the update, I could no longer resolve a single resource. Below you can see all logs and steps to reproduce; after a rollback, everything worked normally again.

Logs:

Before:

[INFO] plugin/k8s_gateway: Synced all required resources
[DEBUG] plugin/k8s_gateway: Computed Index Keys [eck.kubernetes.intern.test.de eck]
[DEBUG] plugin/k8s_gateway: Found 0 matching VirtualServer objects
[DEBUG] plugin/k8s_gateway: Found 1 matching Ingress objects
[DEBUG] plugin/k8s_gateway: Computed response addresses [10.20.0.10]
[INFO] 172.16.210.192:46525 - 52190 "A IN eck.kubernetes.intern.test.de. udp 50 false 512" NOERROR qr,aa 98 0.000216367s

After (lines truncated at the terminal edge as captured):

W0115 10:47:54.369705 1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1alpha2.GRPCRoute: the server could not find the requested resource (ge
E0115 10:47:54.369748 1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1alpha2.GRPCRoute: failed to list *v1alpha2.GRPCRoute: the server coul
W0115 10:47:57.449529 1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1alpha2.GRPCRoute: the server could not find the requested resource (ge
E0115 10:47:57.449571 1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1alpha2.GRPCRoute: failed to list *v1alpha2.GRPCRoute: the server coul
W0115 10:48:03.633345 1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1alpha2.GRPCRoute: the server could not find the requested resource (ge
E0115 10:48:03.633430 1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1alpha2.GRPCRoute: failed to list *v1alpha2.GRPCRoute: the server coul
[DEBUG] plugin/k8s_gateway: Computed Index Keys [eck.kubernetes.intern.test.de eck]
[INFO] 172.16.210.192:48950 - 606 "A IN eck.kubernetes.intern.test.de. udp 61 true 4000" - - 0 0.000151045s
[ERROR] plugin/errors: 2 eck.kubernetes.intern.test.de. A: plugin/k8s_gateway: Could not sync required resources
[DEBUG] plugin/k8s_gateway: Computed Index Keys [eck.kubernetes.intern.test.de eck]
[INFO] 172.16.210.192:57870 - 11305 "A IN eck.kubernetes.intern.test.de. udp 50 false 512" - - 0 0.000110593s
[ERROR] plugin/errors: 2 eck.kubernetes.intern.test.de. A: plugin/k8s_gateway: Could not sync required resources
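These reflector errors ("the server could not find the requested resource") come from API discovery, not from RBAC: the API server is not serving the v1alpha2 GRPCRoute kind at all. Assuming kubectl access to the affected cluster, a quick check (not part of the original report) would be:

$ kubectl get crd grpcroutes.gateway.networking.k8s.io
$ kubectl api-resources --api-group=gateway.networking.k8s.io

If the first command returns NotFound and GRPCRoute is absent from the second, the plugin's informers can never complete their initial list, which is exactly the "Could not sync required resources" failure above.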

Steps to reproduce:

Values:

image:
  registry: quay.io
  repository: oriedge/k8s_gateway
  tag: v0.4.0
  pullPolicy: IfNotPresent

# Delegated domain
domain: kubernetes.intern.test.de

# TTL for non-apex responses (in seconds)
ttl: 300

# Resources (CPU, memory etc)
resources: {}

# Limit what kind of resources to watch, e.g. watchedResources: ["Ingress"]
watchedResources: []

# Service name of a secondary DNS server (should be `serviceName.namespace`)
secondary: ""

# Enable fallthrough for k8s_gateway
fallthrough:
  enabled: true
  zones: 
    - kubernetes.intern.test.de

# Override the default `serviceName.namespace` domain apex
apex: kubernetes.intern.test.de

# Optional configuration option for DNS01 challenge that will redirect all acme
# challenge requests to external cloud domain (e.g. managed by cert-manager)
# See: https://cert-manager.io/docs/configuration/acme/dns01/
dnsChallenge:
  enabled: true
  domain: kubernetes.intern.test.de

# Optional plugins that will be enabled in the zone, e.g. "forward . /etc/resolv.conf"
extraZonePlugins:
  - name: log
  - name: errors
  # Serves a /health endpoint on :8080, required for livenessProbe
  - name: health
    configBlock: |-
      lameduck 5s
  # Serves a /ready endpoint on :8181, required for readinessProbe
  - name: ready
  # Serves a /metrics endpoint on :9153, required for serviceMonitor
  - name: prometheus
    parameters: 0.0.0.0:9153
  - name: forward
    parameters: . /etc/resolv.conf
  - name: loop
  - name: reload
  - name: loadbalance

serviceAccount:
  create: true
  name: ""
  annotations: {}

service:
  type: LoadBalancer
  port: 53
  annotations:
    metallb.universe.tf/loadBalancerIPs: 10.20.0.50
  labels: {}
  # nodePort: 30053
  # loadBalancerIP: 192.168.1.2
  # clusterIP: 10.43.0.53
  # externalTrafficPolicy: Local

  # One of SingleStack, PreferDualStack, or RequireDualStack.
  # ipFamilyPolicy: SingleStack
  # List of IP families (e.g. IPv4 and/or IPv6).
  # ref: https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services
  # ipFamilies:
  #   - IPv4
  #   - IPv6

nodeSelector: {}

affinity: {}

replicaCount: 1

# Optional PriorityClass that will be used in the Deployment, e.g. priorityClassName: "system-cluster-critical"
priorityClassName: ""

debug:
  enabled: true

secure: true

zoneFiles: []
#    - filename: example.db
#      # Optional
#      domains: example.com
#      contents: |
#        example.com.   IN SOA sns.dns.icann.com. noc.dns.icann.com. 2015082541 7200 3600 1209600 3600
#        example.com.   IN NS  b.iana-servers.net.
#        example.com.   IN NS  a.iana-servers.net.
#        example.com.   IN A   192.168.99.102
#        *.example.com. IN A   192.168.99.102

Install command:

helm install exdns k8s_gateway/k8s-gateway --values values.yaml
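A quick way to reproduce the failure after installing (using the deployment name implied by the release name above and the MetalLB address from the values; both are assumptions, adjust as needed):

$ kubectl logs deploy/exdns-k8s-gateway -f
$ dig @10.20.0.50 eck.kubernetes.intern.test.de

The hostname is the one from the logs above; any name under the delegated domain should behave the same way.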
networkop commented 8 months ago

this is strange. @dnrce do you have any ideas?

networkop commented 8 months ago

@StoveCode can you check if the RBAC rules are the same between the two versions?

StoveCode commented 8 months ago

@networkop I compared it with VS Code; the RBAC rules have not changed

StoveCode commented 8 months ago

Old:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "k8s-gateway.fullname" . }}
  labels:
    {{- include "k8s-gateway.labels" . | nindent 4 }}
rules:
- apiGroups:
  - ""
  resources:
  - services
  - namespaces
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - list
  - watch
- apiGroups: ["gateway.networking.k8s.io"]
  resources: ["*"]
  verbs: ["watch", "list"]
- apiGroups: ["k8s.nginx.org"]
  resources: ["*"]
  verbs: ["watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "k8s-gateway.fullname" . }}
  labels:
  {{- include "k8s-gateway.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "k8s-gateway.fullname" . }}
subjects:
- kind: ServiceAccount
  name: {{ include "k8s-gateway.serviceAccountName" . }}
  namespace: {{ .Release.Namespace }}

New:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "k8s-gateway.fullname" . }}
  labels:
    {{- include "k8s-gateway.labels" . | nindent 4 }}
rules:
- apiGroups:
  - ""
  resources:
  - services
  - namespaces
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - list
  - watch
- apiGroups: ["gateway.networking.k8s.io"]
  resources: ["*"]
  verbs: ["watch", "list"]
- apiGroups: ["k8s.nginx.org"]
  resources: ["*"]
  verbs: ["watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "k8s-gateway.fullname" . }}
  labels:
  {{- include "k8s-gateway.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "k8s-gateway.fullname" . }}
subjects:
- kind: ServiceAccount
  name: {{ include "k8s-gateway.serviceAccountName" . }}
  namespace: {{ .Release.Namespace }}
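One way to rule out a permissions problem directly, instead of diffing templates, is to impersonate the plugin's ServiceAccount (the namespace is assumed to be default here; substitute the actual release namespace):

$ kubectl auth can-i list grpcroutes.gateway.networking.k8s.io \
    --as=system:serviceaccount:default:exdns-k8s-gateway

Note that an RBAC denial would appear in the logs as "forbidden" rather than "the server could not find the requested resource", which already suggests the problem is not RBAC.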
networkop commented 8 months ago

thanks @StoveCode. can you spot any difference in the other manifests? the binary hasn't changed, so it must be something to do with helm

dnrce commented 8 months ago

I see no other differences between the chart versions except the expected presence of service labels, if supplied:

Empty service.labels:

$ diff -U4 <(helm template exdns --set domain=foo k8s_gateway/k8s-gateway --version 2.1.0) <(helm template exdns --set domain=foo k8s_gateway/k8s-gateway --version 2.3.0)
--- /dev/fd/63  2024-01-15 09:59:44.000000000 -0500
+++ /dev/fd/62  2024-01-15 09:59:44.000000000 -0500
@@ -4,9 +4,9 @@
 kind: ServiceAccount
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -16,9 +16,9 @@
 kind: ConfigMap
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -47,9 +47,9 @@
 kind: ClusterRole
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -82,9 +82,9 @@
 kind: ClusterRoleBinding
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -102,9 +102,9 @@
 kind: Service
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -127,9 +127,9 @@
 kind: Deployment
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -144,9 +144,9 @@
       labels:
         app.kubernetes.io/name: k8s-gateway
         app.kubernetes.io/instance: exdns
       annotations:
-        checksum/config: d84b6349d5916b297258731f9b0984728b8073e05226afb9a072534578e7f553
+        checksum/config: 5be6131e60a7f6417ee7a531912d0c0c60642dd0532b1cc3d45ff51586673a9c
     spec:
       serviceAccountName: exdns-k8s-gateway
       containers:
       - name: k8s-gateway

Supplied service.labels:

$ diff -U4 <(helm template exdns --set domain=foo --set service.labels.foo=bar k8s_gateway/k8s-gateway --version 2.1.0) <(helm template exdns --set domain=foo --set service.labels.foo=bar k8s_gateway/k8s-gateway --version 2.3.0)
--- /dev/fd/63  2024-01-15 10:00:14.000000000 -0500
+++ /dev/fd/62  2024-01-15 10:00:14.000000000 -0500
@@ -4,9 +4,9 @@
 kind: ServiceAccount
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -16,9 +16,9 @@
 kind: ConfigMap
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -47,9 +47,9 @@
 kind: ClusterRole
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -82,9 +82,9 @@
 kind: ClusterRoleBinding
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -102,9 +102,10 @@
 kind: Service
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    foo: bar
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -127,9 +128,9 @@
 kind: Deployment
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
dnrce commented 8 months ago

Same with the values provided in the reproduction steps:

$ cat values.yaml
image:
  registry: quay.io
  repository: oriedge/k8s_gateway
  tag: v0.4.0
  pullPolicy: IfNotPresent

# Delegated domain
domain: kubernetes.intern.test.de

# TTL for non-apex responses (in seconds)
ttl: 300

# Resources (CPU, memory etc)
resources: {}

# Limit what kind of resources to watch, e.g. watchedResources: ["Ingress"]
watchedResources: []

# Service name of a secondary DNS server (should be `serviceName.namespace`)
secondary: ""

# Enable fallthrough for k8s_gateway
fallthrough:
  enabled: true
  zones:
    - kubernetes.intern.test.de

# Override the default `serviceName.namespace` domain apex
apex: kubernetes.intern.test.de

# Optional configuration option for DNS01 challenge that will redirect all acme
# challenge requests to external cloud domain (e.g. managed by cert-manager)
# See: https://cert-manager.io/docs/configuration/acme/dns01/
dnsChallenge:
  enabled: true
  domain: kubernetes.intern.test.de

# Optional plugins that will be enabled in the zone, e.g. "forward . /etc/resolv.conf"
extraZonePlugins:
  - name: log
  - name: errors
  # Serves a /health endpoint on :8080, required for livenessProbe
  - name: health
    configBlock: |-
      lameduck 5s
  # Serves a /ready endpoint on :8181, required for readinessProbe
  - name: ready
  # Serves a /metrics endpoint on :9153, required for serviceMonitor
  - name: prometheus
    parameters: 0.0.0.0:9153
  - name: forward
    parameters: . /etc/resolv.conf
  - name: loop
  - name: reload
  - name: loadbalance

serviceAccount:
  create: true
  name: ""
  annotations: {}

service:
  type: LoadBalancer
  port: 53
  annotations:
    metallb.universe.tf/loadBalancerIPs: 10.20.0.50
  labels: {}
  # nodePort: 30053
  # loadBalancerIP: 192.168.1.2
  # clusterIP: 10.43.0.53
  # externalTrafficPolicy: Local

  # One of SingleStack, PreferDualStack, or RequireDualStack.
  # ipFamilyPolicy: SingleStack
  # List of IP families (e.g. IPv4 and/or IPv6).
  # ref: https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services
  # ipFamilies:
  #   - IPv4
  #   - IPv6

nodeSelector: {}

affinity: {}

replicaCount: 1

# Optional PriorityClass that will be used in the Deployment, e.g. priorityClassName: "system-cluster-critical"
priorityClassName: ""

debug:
  enabled: true

secure: true

zoneFiles: []
#    - filename: example.db
#      # Optional
#      domains: example.com
#      contents: |
#        example.com.   IN SOA sns.dns.icann.com. noc.dns.icann.com. 2015082541 7200 3600 1209600 3600
#        example.com.   IN NS  b.iana-servers.net.
#        example.com.   IN NS  a.iana-servers.net.
#        example.com.   IN A   192.168.99.102
#        *.example.com. IN A   192.168.99.102
$ diff -U4 <(helm template exdns k8s_gateway/k8s-gateway --version 2.1.0 --values values.yaml) <(helm template exdns k8s_gateway/k8s-gateway --version 2.3.0 --values values.yaml)
--- /dev/fd/63  2024-01-15 10:09:10.000000000 -0500
+++ /dev/fd/62  2024-01-15 10:09:10.000000000 -0500
@@ -4,9 +4,9 @@
 kind: ServiceAccount
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -16,9 +16,9 @@
 kind: ConfigMap
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -54,9 +54,9 @@
 kind: ClusterRole
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -89,9 +89,9 @@
 kind: ClusterRoleBinding
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -109,9 +109,9 @@
 kind: Service
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
@@ -135,9 +135,9 @@
 kind: Deployment
 metadata:
   name: exdns-k8s-gateway
   labels:
-    helm.sh/chart: k8s-gateway-2.1.0
+    helm.sh/chart: k8s-gateway-2.3.0
     app.kubernetes.io/name: k8s-gateway
     app.kubernetes.io/instance: exdns
     app.kubernetes.io/version: "0.4.0"
     app.kubernetes.io/managed-by: Helm
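
Filtering out the version-dependent lines makes it even clearer that nothing substantive changed between the two charts; empty output from the following (same chart references as above) means the rendered manifests are identical:

$ diff <(helm template exdns k8s_gateway/k8s-gateway --version 2.1.0 --values values.yaml | grep -v 'helm.sh/chart\|checksum/config') \
       <(helm template exdns k8s_gateway/k8s-gateway --version 2.3.0 --values values.yaml | grep -v 'helm.sh/chart\|checksum/config')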
dnrce commented 8 months ago

Also FWIW, I'm up and running with the latest chart.

StoveCode commented 8 months ago

@dnrce what other software do you use related to your DNS?

StoveCode commented 8 months ago

Is someone able to reproduce the issue? I tried once more and got the same result.

I completely uninstalled the old instance and installed a fresh one with the values from the reproduction steps.

networkop commented 8 months ago

@StoveCode seems to work fine for me with your helm values. I'm not seeing the sync errors

networkop commented 8 months ago

@StoveCode can you share steps to reproduce?

dnrce commented 8 months ago

> @dnrce what other software do you use related to your DNS?

Vanilla CoreDNS for cluster DNS, and k8s_gateway for external queries. Cilium as CNI. My k8s_gateway values look like this:

domain: "{{ join " " .Values.dns.domains }}"
ttl: {{ .Values.dns.ttl }}
fallthrough:
  enabled: true
extraZonePlugins:
  - name: log
  - name: errors
  # Serves a /health endpoint on :8080, required for livenessProbe
  - name: health
    configBlock: |-
      lameduck 5s
  # Serves a /ready endpoint on :8181, required for readinessProbe
  - name: ready
  # Serves a /metrics endpoint on :9153, required for serviceMonitor
  - name: prometheus
    parameters: 0.0.0.0:9153
  - name: forward
    parameters: ". {{ join " " .Values.dns.fallbackResolvers }}"
  - name: loop
  - name: reload
  - name: loadbalance
debug:
  enabled: {{ .Values.dns.debug }}
service:
  annotations:
    my-annotation: my-value
  labels:
    my-label: my-value
StoveCode commented 8 months ago

Hello everyone! :) @dnrce @networkop I have solved the problem. The issue was that I was using the v1 Standard channel of the Gateway API CRDs instead of the v1 Experimental channel. The Standard channel unfortunately does not include GRPCRoute, so it cannot be watched. Because this CRD did not exist, the error described above occurred. After I installed the Experimental CRDs, everything worked fine.

The Command to fix the issue:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/experimental-install.yaml
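
After applying the experimental channel, the previously missing CRD should exist and the plugin's reflector should recover on its next retry; a hedged verification, mirroring the check suggested earlier:

$ kubectl get crd grpcroutes.gateway.networking.k8s.io

For clusters that must stay on the standard channel, restricting the chart's watchedResources value (e.g. watchedResources: ["Ingress"], as the default values file suggests) may keep the plugin from watching GRPCRoute at all, though that workaround was not tested in this thread.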
networkop commented 8 months ago

@StoveCode if you have time, would you mind doing a quick PR to update the readme?

StoveCode commented 8 months ago

@networkop done