kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

External-DNS in GKE fails to insert Ingress A records when DNS provider is AWS Route 53 since helm chart version 6.28.2 #4707

Open edison-vflow opened 3 months ago

edison-vflow commented 3 months ago

What happened:

We have a GKE cluster running ExternalDNS chart version 6.23.3. We use AWS Route 53 as the DNS provider.

When we update ExternalDNS to the latest chart version, 8.3.5, the ExternalDNS pods log the following errors:

external-dns {"level":"info","msg":"Desired change: CREATE realtime.cluster-prefix.company-domain.com A [Id: /hostedzone/*******]","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"info","msg":"Desired change: CREATE realtime.cluster-prefix.company-domain.com TXT [Id: /hostedzone/*******]","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"error","msg":"Failure in zone company-domain.com. [Id: /hostedzone/*******] when submitting change batch: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone, Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone, status code: 400, request id: 101e36ce-821e-45f5-9ead-a7ab6d0ea373","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 7ad567af-abee-4894-905d-4770f5a708be), it will be retried in a separate change batch in the next iteration","time":"2024-08-23T23:42:53Z"}
external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 5bb64b3b-ed06-4911-8a5e-ec4c006d35c9), it will be retried in a separate change batch in the next iteration","time":"2024-08-23T23:42:53Z"}
external-dns {"level":"error","msg":"Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/*******]","time":"2024-08-23T23:42:57Z"}

From our investigation, I can confirm that the issue starts with Helm chart version 6.28.2. With chart versions 6.23.3 through 6.28.1, ExternalDNS correctly adds all GKE Ingress records to AWS Route 53 as A records.

With chart versions 6.28.2 through 8.3.5, ExternalDNS fails with the error above. The error shows that, starting with 6.28.2, ExternalDNS interprets the GKE load balancer IP address as a domain name and tries to add it to Route 53 as an A record with Alias = Yes. This is incorrect and fails, because the IP address of the GKE load balancer is not a domain name within the hosted zone. The request would not have been attempted if ExternalDNS had treated the GKE load balancer IP as a plain A record without an alias.
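The distinction the report relies on (an IP-literal target versus a hostname target) can be sketched as below. This is a minimal illustration under an assumption, not ExternalDNS's actual code: the helper `isIPTarget` is hypothetical, and it encodes the rule the reporter expects, namely that an IP-literal target should be published as a plain A record and never as a Route 53 alias. It also tolerates the trailing dot visible in the log (`74.17.111.38.`).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// isIPTarget is a hypothetical helper: it reports whether a record target is
// an IP literal. Under the rule assumed here, such a target should be written
// to Route 53 as a plain A (or AAAA) record, never as an alias.
func isIPTarget(target string) bool {
	// Strip a trailing dot like the one in "74.17.111.38." from the log.
	return net.ParseIP(strings.TrimSuffix(target, ".")) != nil
}

func main() {
	targets := []string{
		"74.17.111.38",
		"74.17.111.38.",
		"lb.example.com.",
	}
	for _, t := range targets {
		fmt.Printf("%s ip=%v\n", t, isIPTarget(t))
	}
}
```

A hostname returning `false` only means it is not an IP literal; whether Route 53 would accept it as an alias is a separate question.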

What you expected to happen:

With the working versions, we can see that ExternalDNS correctly determines that GKE Ingress entries must be inserted into Route 53 as A records with Alias = No.

That is, because the GKE load balancer that Route 53 points to is an IP address, the record should point to it directly rather than as an alias.
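For comparison, the record the reporter expects is a plain A record that lists the IP under `ResourceRecords` and has no `AliasTarget`. The sketch below uses hypothetical local structs that mirror a simplified subset of the Route 53 `ResourceRecordSet` shape (they are not the AWS SDK types), and the 300-second TTL is an arbitrary assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified mirrors of the Route 53 ResourceRecordSet shape.
type ResourceRecord struct {
	Value string `json:"Value"`
}

type AliasTarget struct {
	HostedZoneId         string `json:"HostedZoneId"`
	DNSName              string `json:"DNSName"`
	EvaluateTargetHealth bool   `json:"EvaluateTargetHealth"`
}

type ResourceRecordSet struct {
	Name            string           `json:"Name"`
	Type            string           `json:"Type"`
	TTL             int64            `json:"TTL,omitempty"`
	ResourceRecords []ResourceRecord `json:"ResourceRecords,omitempty"`
	AliasTarget     *AliasTarget     `json:"AliasTarget,omitempty"` // nil for a non-alias record
}

// plainARecord builds the record set expected for an IP target: a non-alias
// A record whose single value is the load balancer IP.
func plainARecord(name, ip string, ttl int64) ResourceRecordSet {
	return ResourceRecordSet{
		Name:            name,
		Type:            "A",
		TTL:             ttl,
		ResourceRecords: []ResourceRecord{{Value: ip}},
	}
}

func main() {
	// TTL of 300 seconds is an assumption for illustration only.
	rrs := plainARecord("realtime.cluster-prefix.company-domain.com.", "74.17.111.38", 300)
	out, err := json.MarshalIndent(rrs, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because `AliasTarget` is a nil pointer tagged `omitempty`, the marshalled JSON contains only `Name`, `Type`, `TTL` and `ResourceRecords`, which is the non-alias shape described above.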


How to reproduce it (as minimally and precisely as possible):

Happy path

Breaking path

external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 5bb64b3b-ed06-4911-8a5e-ec4c006d35c9), it will be retried in a separate change batch in the next 

Anything else we need to know?:

Environment: GKE, Kubernetes version 1.30

xavidop commented 3 months ago

Hi, we are seeing the same issue.

stephanpelikan commented 3 months ago

Me too, using Helm chart 8.3.5:

Failure in zone my-hosted-zone.my-company.com. [Id: /hostedzone/*******] when submitting change batch: InvalidChangeBatch: [Tried to create an alias that targets k8s-wordpres-wpdemowo-******.eu-central-1.elb.amazonaws.com., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 6d3b6622-8246-4a79-a338-da642d4158db

@edison-vflow: Thank you for figuring out that the older charts work.

Using an older version will be OK for now (I'll see), but it would be great to be able to use the most recent version.

leonardocaylent commented 3 months ago

@edison-vflow Can you test with versions 7.0.1 and 7.0.2 and share the results with us?

k8s-triage-robot commented 1 day ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale