traefik / mesh

Traefik Mesh - Simpler Service Mesh
https://traefik.io/traefik-mesh
Apache License 2.0

CoreDNS ConfigMap patch - upstream keyword in EKS with CoreDNS 1.7.0 #788

Closed lescactus closed 3 years ago

lescactus commented 3 years ago

Bug Report

When deploying the latest version of Maesh with Helm on an AWS EKS cluster (1.18), the controller still patches the CoreDNS Corefile with the upstream keyword, even though the CoreDNS version is >= 1.7.

The CoreDNS image deployed by EKS is 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1

What did you do?

Installed Maesh v1.4.1 with the official Helm chart traefik-mesh-3.0.6 on a fresh AWS EKS cluster version v1.18

What did you expect to see?

The controller should not patch the CoreDNS ConfigMap with the upstream keyword, since the CoreDNS version is >= 1.7 (602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1).

What did you see instead?

The CoreDNS ConfigMap being patched by the controller init container with the upstream keyword, thus leading to CoreDNS crashing with the following error message: Error during parsing: unknown property 'upstream'

Output of controller log: (What version of Traefik Mesh are you using?)

Maesh version v1.4.1

$ kubectl logs -c traefik-mesh-prepare traefik-mesh-controller-54b79ddc9f-47h6v
{"level":"debug","msg":"Starting prepare...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Using masterURL: \"\"","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Using kubeconfig: \"\"","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Creating in-cluster client","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Building Kubernetes Client...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Building SMI Access Client...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Building SMI Specs Client...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Building SMI Split Client...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"ACL mode enabled: false","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Detecting DNS provider...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Checking if CoreDNS is installed in namespace \"kube-system\"...","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"CoreDNS \"1.7.0-eksbuild.1\" has been detected","time":"2021-02-09T18:56:34Z"}
{"level":"debug","msg":"Patching ConfigMap \"coredns\" in namespace \"kube-system\"...","time":"2021-02-09T18:56:34Z"}
{"level":"info","msg":"CoreDNS ConfigMap \"coredns\" in namespace \"kube-system\" has successfully been patched","time":"2021-02-09T18:56:34Z"}
{"level":"info","msg":"Restarting \"coredns\" pods","time":"2021-02-09T18:56:34Z"}

What is your environment & configuration (arguments, provider, platform, ...)?

AWS EKS v1.18 CoreDNS v1.7.0-eksbuild.1

If applicable, please paste the yaml objects required to reproduce your issue

```yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"coredns","k8s-app":"kube-dns","kubernetes.io/name":"CoreDNS"},"name":"coredns","namespace":"kube-system"},"spec":{"replicas":2,"selector":{"matchLabels":{"eks.amazonaws.com/component":"coredns","k8s-app":"kube-dns"}},"strategy":{"rollingUpdate":{"maxUnavailable":1},"type":"RollingUpdate"},"template":{"metadata":{"annotations":{"eks.amazonaws.com/compute-type":"ec2"},"labels":{"eks.amazonaws.com/component":"coredns","k8s-app":"kube-dns"}},"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"beta.kubernetes.io/os","operator":"In","values":["linux"]},{"key":"beta.kubernetes.io/arch","operator":"In","values":["amd64","arm64"]}]}]}},"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"k8s-app","operator":"In","values":["kube-dns"]}]},"topologyKey":"kubernetes.io/hostname"},"weight":100}]}},"containers":[{"args":["-conf","/etc/coredns/Corefile"],"image":"602401143452.dkr.ecr.ap-south-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1","imagePullPolicy":"IfNotPresent","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/health","port":8080,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"coredns","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"},{"containerPort":9153,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"httpGet":{"path":"/health","port":8080,"scheme":"HTTP"}},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["all"]},"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/etc/coredns","name":"config-volume","readOnly":true},{"mountPath":"/tmp","name":"tmp"}]}],"dnsPolicy":"Default","priorityClassName":"system-cluster-critical","serviceAccountName":"coredns","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"key":"CriticalAddonsOnly","operator":"Exists"}],"volumes":[{"emptyDir":{},"name":"tmp"},{"configMap":{"items":[{"key":"Corefile","path":"Corefile"}],"name":"coredns"},"name":"config-volume"}]}}}}
  creationTimestamp: "2021-02-09T17:56:29Z"
  generation: 4
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:eks.amazonaws.com/component: {}
          f:k8s-app: {}
          f:kubernetes.io/name: {}
      f:spec:
        f:progressDeadlineSeconds: {}
        f:replicas: {}
        f:revisionHistoryLimit: {}
        f:selector:
          f:matchLabels:
            .: {}
            f:eks.amazonaws.com/component: {}
            f:k8s-app: {}
        f:strategy:
          f:rollingUpdate:
            .: {}
            f:maxSurge: {}
            f:maxUnavailable: {}
          f:type: {}
        f:template:
          f:metadata:
            f:annotations:
              .: {}
              f:eks.amazonaws.com/compute-type: {}
            f:labels:
              .: {}
              f:eks.amazonaws.com/component: {}
              f:k8s-app: {}
          f:spec:
            f:affinity:
              .: {}
              f:nodeAffinity:
                .: {}
                f:requiredDuringSchedulingIgnoredDuringExecution:
                  .: {}
                  f:nodeSelectorTerms: {}
              f:podAntiAffinity:
                .: {}
                f:preferredDuringSchedulingIgnoredDuringExecution: {}
            f:containers:
              k:{"name":"coredns"}:
                .: {}
                f:args: {}
                f:image: {}
                f:imagePullPolicy: {}
                f:livenessProbe:
                  .: {}
                  f:failureThreshold: {}
                  f:httpGet:
                    .: {}
                    f:path: {}
                    f:port: {}
                    f:scheme: {}
                  f:initialDelaySeconds: {}
                  f:periodSeconds: {}
                  f:successThreshold: {}
                  f:timeoutSeconds: {}
                f:name: {}
                f:ports:
                  .: {}
                  k:{"containerPort":53,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:name: {}
                    f:protocol: {}
                  k:{"containerPort":53,"protocol":"UDP"}:
                    .: {}
                    f:containerPort: {}
                    f:name: {}
                    f:protocol: {}
                  k:{"containerPort":9153,"protocol":"TCP"}:
                    .: {}
                    f:containerPort: {}
                    f:name: {}
                    f:protocol: {}
                f:readinessProbe:
                  .: {}
                  f:failureThreshold: {}
                  f:httpGet:
                    .: {}
                    f:path: {}
                    f:port: {}
                    f:scheme: {}
                  f:periodSeconds: {}
                  f:successThreshold: {}
                  f:timeoutSeconds: {}
                f:resources:
                  .: {}
                  f:limits:
                    .: {}
                    f:memory: {}
                  f:requests:
                    .: {}
                    f:cpu: {}
                    f:memory: {}
                f:securityContext:
                  .: {}
                  f:allowPrivilegeEscalation: {}
                  f:capabilities:
                    .: {}
                    f:add: {}
                    f:drop: {}
                  f:readOnlyRootFilesystem: {}
                f:terminationMessagePath: {}
                f:terminationMessagePolicy: {}
                f:volumeMounts:
                  .: {}
                  k:{"mountPath":"/etc/coredns"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
                    f:readOnly: {}
                  k:{"mountPath":"/tmp"}:
                    .: {}
                    f:mountPath: {}
                    f:name: {}
            f:dnsPolicy: {}
            f:priorityClassName: {}
            f:restartPolicy: {}
            f:schedulerName: {}
            f:securityContext: {}
            f:serviceAccount: {}
            f:serviceAccountName: {}
            f:terminationGracePeriodSeconds: {}
            f:tolerations: {}
            f:volumes:
              .: {}
              k:{"name":"config-volume"}:
                .: {}
                f:configMap:
                  .: {}
                  f:defaultMode: {}
                  f:items: {}
                  f:name: {}
                f:name: {}
              k:{"name":"tmp"}:
                .: {}
                f:emptyDir: {}
                f:name: {}
    manager: kubectl
    operation: Update
    time: "2021-02-09T17:56:29Z"
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:deployment.kubernetes.io/revision: {}
      f:status:
        f:availableReplicas: {}
        f:conditions:
          .: {}
          k:{"type":"Available"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Progressing"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:observedGeneration: {}
        f:readyReplicas: {}
        f:replicas: {}
        f:unavailableReplicas: {}
        f:updatedReplicas: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-02-09T18:56:34Z"
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:template:
          f:metadata:
            f:annotations:
              f:traefik-mesh-hash: {}
    manager: traefik-mesh
    operation: Update
    time: "2021-02-09T18:56:34Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "13656"
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
  uid: 59d0a317-6cbd-4325-96f5-bf47c2327b59
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        eks.amazonaws.com/compute-type: ec2
        traefik-mesh-hash: 813eef77-c814-4448-912c-2cc050fd86d1
      creationTimestamp: null
      labels:
        eks.amazonaws.com/component: coredns
        k8s-app: kube-dns
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: 602401143452.dkr.ecr.ap-south-1.amazonaws.com/eks/coredns:v1.7.0-eksbuild.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
        - mountPath: /tmp
          name: tmp
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - emptyDir: {}
        name: tmp
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    #### Begin Maesh Block
    maesh:53 {
        errors
        rewrite continue {
            name regex ([a-zA-Z0-9-_]*)\.([a-zv0-9-_]*)\.maesh maesh-{1}-6d61657368-{2}.maesh.svc.cluster.local
            answer name maesh-([a-zA-Z0-9-_]*)-6d61657368-([a-zA-Z0-9-_]*)\.maesh\.svc\.cluster\.local {1}.{2}.maesh
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    #### End Maesh Block
    #### Begin Traefik Mesh Block
    traefik.mesh:53 {
        errors
        rewrite continue {
            name regex ([a-zA-Z0-9-_]*)\.([a-zv0-9-_]*)\.traefik.mesh maesh-{1}-6d61657368-{2}.maesh.svc.cluster.local
            answer name maesh-([a-zA-Z0-9-_]*)-6d61657368-([a-zA-Z0-9-_]*)\.maesh\.svc\.cluster\.local {1}.{2}.traefik.mesh
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    #### End Traefik Mesh Block
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . /etc/resolv.conf\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"coredns","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
  creationTimestamp: "2021-02-09T17:56:29Z"
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          .: {}
          f:eks.amazonaws.com/component: {}
          f:k8s-app: {}
    manager: kubectl
    operation: Update
    time: "2021-02-09T17:56:29Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: traefik-mesh
    operation: Update
    time: "2021-02-09T18:56:34Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "13620"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 6cec9ad9-6a92-4d13-a978-083e39484f66
```

renatomjr commented 3 years ago

Same here. This makes it impossible to use Traefik Mesh on EKS.

0rax commented 3 years ago

I think I just found the issue, which was "introduced" by the previous PR #774, where I added support for build versions with suffixes. It seems that goversion treats v1.7.0-eksbuild.1 as a pre-release of v1.7.0, which it is not.

versionCoreDNS17 := goversion.Must(goversion.NewVersion("v1.7"))
coreDNSVersion := goversion.Must(goversion.NewVersion("v1.7.0-eksbuild.1"))
fmt.Println(coreDNSVersion.LessThan(versionCoreDNS17)) // Prints true

I will submit a patch to make sure this is fixed while I am looking at adding support for CoreDNS 1.8.


EDIT: It seems like there are two solutions:

More broadly, the version check in prepare needs to account for this everywhere: goversion would likewise treat v1.8.0-eksbuild.0 as a pre-release, so that build would satisfy a LessThan check against v1.8.0.
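
To illustrate the idea of comparing only the numeric part of the version, here is a minimal, standard-library-only sketch. It is not the code from the actual fix, and it does not use goversion: `coreSegments` and `lessThanCore` are hypothetical helpers made up for this example. They drop any `-suffix` or `+suffix` before comparing, so a vendor build such as v1.7.0-eksbuild.1 is ranked the same as v1.7.0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coreSegments parses a version such as "v1.7.0-eksbuild.1" into [1 7 0].
// Anything after the first '-' or '+' is ignored, and missing segments
// default to 0 (so "v1.7" becomes [1 7 0]).
// Hypothetical helper for illustration; not part of Traefik Mesh.
func coreSegments(version string) ([3]int, error) {
	var segs [3]int
	v := strings.TrimPrefix(version, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i] // drop the pre-release / build suffix
	}
	parts := strings.Split(v, ".")
	if len(parts) > 3 {
		return segs, fmt.Errorf("too many segments in %q", version)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return segs, fmt.Errorf("invalid segment %q in %q: %w", p, version, err)
		}
		segs[i] = n
	}
	return segs, nil
}

// lessThanCore reports whether a sorts strictly before b,
// comparing major, then minor, then patch.
func lessThanCore(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

func main() {
	min17, _ := coreSegments("v1.7")
	for _, raw := range []string{"v1.6.9", "v1.7.0-eksbuild.1", "v1.8.0-eksbuild.0"} {
		got, err := coreSegments(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s older than 1.7: %t\n", raw, lessThanCore(got, min17))
	}
}
```

The trade-off of this approach is that genuine pre-releases (for example a hypothetical v1.7.0-rc1) are also treated as 1.7.0, which may or may not be what the prepare step wants.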

kevinpollet commented 3 years ago

Closed by https://github.com/traefik/mesh/pull/790