kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

External DNS keeps upserting - v0.13.6 #3977

Open jgournet opened 1 year ago

jgournet commented 1 year ago

What happened: After upgrading from v0.13.5 to v0.13.6, we get the following on every run:

kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Applying provider record filter for domains: [XXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Desired change: CREATE XXX A [Id: /hostedzone/XXXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:39Z" level=info msg="Desired change: CREATE k8s.XXXX TXT [Id: /hostedzone/XXXX]"
kube-system external-dns-78d5f698ff-zc9bz external-dns time="2023-10-08T23:51:40Z" level=info msg="2 record(s) in zone XXXXX [Id: /hostedzone/XXXX] were successfully updated"

What you expected to happen: same behavior as v0.13.5:

kube-system external-dns-9957dffb8-9mkcl external-dns time="2023-10-08T23:59:28Z" level=info msg="All records are already up to date"

How to reproduce it (as minimally and precisely as possible): We're using an Istio Gateway with these annotations:

kind: Gateway
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: XXXXX,www.XXXXXX
    external-dns.alpha.kubernetes.io/target: dr-eks-ingress-gw-XXXXXX.amazonaws.com.

Anything else we need to know?: Works fine in v0.13.5

Environment:

Similar issues: https://github.com/kubernetes-sigs/external-dns/issues/1421 https://github.com/kubernetes-sigs/external-dns/issues/1959

fbarrerafalabella commented 11 months ago

This happens to me on the same version when going from 0.12.2 to 0.13.6, but only on 2 records related to 1 app on my k8s cluster. I don't know if it is because this record is the root record: for example, the zone is example.com and the records are example.com and also external-dns.example.com. It constantly keeps updating them to the same value.

jgournet commented 11 months ago

Weird addition: we now have a few clusters that run 0.13.6 without any issues at all ...

gustav-b commented 11 months ago

Still reproducible for me with v0.14.0 – the same two Route53 records in each zone are updated repeatedly:

time="2023-11-08T20:47:30Z" level=info msg="Created Kubernetes client https://100.64.0.1:443"
time="2023-11-08T20:47:31Z" level=info msg="Applying provider record filter for domains: [yyy. .yyy. zzz. .zzz.]"
time="2023-11-08T20:47:31Z" level=info msg="Desired change: UPSERT yyy A [Id: /hostedzone/YYY]"
time="2023-11-08T20:47:31Z" level=info msg="Desired change: UPSERT external-dns.yyy TXT [Id: /hostedzone/YYY]"
time="2023-11-08T20:47:32Z" level=info msg="2 record(s) in zone yyy. [Id: /hostedzone/YYY] were successfully updated"
time="2023-11-08T20:47:33Z" level=info msg="Desired change: UPSERT zzz A [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:47:33Z" level=info msg="Desired change: UPSERT external-dns.zzz TXT [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:47:33Z" level=info msg="2 record(s) in zone zzz. [Id: /hostedzone/ZZZ] were successfully updated"

time="2023-11-08T20:48:31Z" level=info msg="Applying provider record filter for domains: [yyy. .yyy. zzz. .zzz.]"
time="2023-11-08T20:48:31Z" level=info msg="Desired change: UPSERT zzz A [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:48:31Z" level=info msg="Desired change: UPSERT external-dns.zzz TXT [Id: /hostedzone/ZZZ]"
time="2023-11-08T20:48:31Z" level=info msg="2 record(s) in zone zzz. [Id: /hostedzone/ZZZ] were successfully updated"
time="2023-11-08T20:48:32Z" level=info msg="Desired change: UPSERT yyy A [Id: /hostedzone/YYY]"
time="2023-11-08T20:48:32Z" level=info msg="Desired change: UPSERT external-dns.yyy TXT [Id: /hostedzone/YYY]"
time="2023-11-08T20:48:33Z" level=info msg="2 record(s) in zone yyy. [Id: /hostedzone/YYY] were successfully updated"

ElvenSpellmaker commented 11 months ago

I get exactly the same thing, but only for the root record which concurs with @fbarrerafalabella. Other apps using sub-domains don't constantly UPSERT.

This happens with both 0.13.6 and 0.14.0

EDIT: Sanitised logs:

time="2023-11-13T19:29:21Z" level=debug msg="Refreshing zones list cache"
time="2023-11-13T19:29:22Z" level=debug msg="Considering zone: /hostedzone/ID (domain: foo.com.)"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service cert-manager-dns/cert-manager"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service external-dns/external-dns"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service ingress-nginx/ingress-nginx-controller"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service ingress-nginx/ingress-nginx-controller-admission"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service cert-manager-dns/cert-manager-webhook"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/apple-service"
time="2023-11-13T19:29:22Z" level=debug msg="No endpoints could be generated from service default/banana-service"
time="2023-11-13T19:29:22Z" level=debug msg="Endpoints generated from ingress: default/apple-ingress: [foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [] foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []]"
time="2023-11-13T19:29:22Z" level=debug msg="Endpoints generated from ingress: default/banana-ingress: [banana.foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [] foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []]"
time="2023-11-13T19:29:22Z" level=debug msg="Removing duplicate endpoint foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []"
time="2023-11-13T19:29:22Z" level=debug msg="Removing duplicate endpoint foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com []"
time="2023-11-13T19:29:22Z" level=debug msg="Modifying endpoint: foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [], setting alias=true"
time="2023-11-13T19:29:22Z" level=debug msg="Modifying endpoint: banana.foo.com 0 IN CNAME  foo.elb.eu-west-2.amazonaws.com [], setting alias=true"
time="2023-11-13T19:29:22Z" level=debug msg="Refreshing zones list cache"
time="2023-11-13T19:29:23Z" level=debug msg="Considering zone: /hostedzone/ID (domain: foo.com.)"
time="2023-11-13T19:29:23Z" level=debug msg="Adding foo.com. to zone foo.com. [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=debug msg="Adding foo.com. to zone foo.com. [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=debug msg="Skipping record {\n  Action: \"UPSERT\",\n  ResourceRecordSet: {\n    Name: \"cname-foo.com\",\n    ResourceRecords: [{\n        Value: \"\\\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/apple-ingress\\\"\"\n      }],\n    TTL: 300,\n    Type: \"TXT\"\n  }\n} because no hosted zone matching record DNS Name was detected"
time="2023-11-13T19:29:23Z" level=info msg="Desired change: UPSERT foo.com A [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=info msg="Desired change: UPSERT foo.com TXT [Id: /hostedzone/ID]"
time="2023-11-13T19:29:23Z" level=info msg="2 record(s) in zone foo.com. [Id: /hostedzone/ID] were successfully updated"

EDIT2: I have a suspicion it's because it can't set the "root" cname record it tries to set as it doesn't control that domain:

time="2023-11-13T19:29:23Z" level=debug msg="Skipping record {\n  Action: \"UPSERT\",\n  ResourceRecordSet: {\n    Name: \"cname-foo.com\",\n    ResourceRecords: [{\n        Value: \"\\\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/apple-ingress\\\"\"\n      }],\n    TTL: 300,\n    Type: \"TXT\"\n  }\n} because no hosted zone matching record DNS Name was detected"
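
One way to read that message, as a minimal sketch: if records are assigned to hosted zones by DNS suffix (an assumption here, not code taken from external-dns), then a TXT name built by prepending `cname-` to the apex label no longer sits inside the zone. `in_zone` is a hypothetical helper written only for this illustration; the names are the ones from the debug log above.

```python
def in_zone(record_name: str, zone_name: str) -> bool:
    """Return True if record_name is the zone apex or sits below it."""
    name = record_name.rstrip(".").lower()
    zone = zone_name.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


# Names taken from the debug log above; the hosted zone is "foo.com.".
for candidate in ("foo.com", "banana.foo.com", "cname-foo.com"):
    print(candidate, in_zone(candidate, "foo.com."))
```

Under that assumption, `cname-foo.com` is a sibling of `foo.com` rather than a name below it, which would be consistent with the record being skipped on every run.
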
Jayd603 commented 10 months ago

Same here with 0.14.0 and DigitalOcean: level=warning msg="Updating existing target" on every single run, once per minute. I had some settings where it said "records already up to date", so I would need to debug further.

my current settings:

yuriipolishchuk commented 10 months ago

I had the same issue after upgrading to v0.13.6 with a single ingress that has a host in a root domain, i.e.

spec:
  rules:
    - host: myrootdomain.tld
    ...

    - host: '*.myrootdomain.tld'
    ...

Fixed by removing a rule for the root domain from ingress

Jayd603 commented 10 months ago

I had the same issue after upgrading to v0.13.6 with a single ingress that has a host in a root domain, i.e.

spec:
  rules:
    - host: myrootdomain.tld
    ...

    - host: '*.myrootdomain.tld'
    ...

Fixed by removing a rule for the root domain from ingress

In my case it still does it even with a single basic service entry. Updates every single time even when not necessary. I have a sub domain like cluster1.do.domain.com defined in a single place and that's it.

hobti01 commented 7 months ago

We see the same issue with 0.14.0 on Route53 with an A record of type Alias with the same name as the domain root.

time="2024-03-05T17:53:18Z" level=info msg="Applying provider record filter for domains: [the.domain.tld.]"
time="2024-03-05T17:53:19Z" level=info msg="Desired change: UPSERT _externaldns.the.domain.tld TXT [Id: /hostedzone/ZXXX]"
time="2024-03-05T17:53:19Z" level=info msg="Desired change: UPSERT the.domain.tld A [Id: /hostedzone/ZXXX]"
time="2024-03-05T17:53:19Z" level=info msg="2 record(s) in zone the.domain.tld. [Id: /hostedzone/ZXXX] were successfully updated"

I manually removed the TXT record and there's no attempt to update (since the ownership is removed). Not a solution, but a workaround to stop the upserts.

time="2024-03-05T17:54:20Z" level=info msg="Applying provider record filter for domains: [the.domain.tld.]"
time="2024-03-05T17:54:20Z" level=info msg="All records are already up to date"

However, the other thing we notice is that the "new" TXT record is not being created, only the old one: we have the record _externaldns.the.domain.tld but do NOT have _externaldns.cname-the.domain.tld, which we would expect due to https://github.com/kubernetes-sigs/external-dns/blob/d2890b0a71c5c991c8c9e56f4108c17b8914cf64/registry/txt.go#L229-L232

fbarrerafalabella commented 7 months ago

Is there a solution or an explanation for this? It keeps happening on newer releases.

clesquere commented 7 months ago

Same error on my side too, but only for a root record; subdomain records are working fine.

jgournet commented 6 months ago

@linki or @stevehipwell: would someone be able to check what is happening with this issue, please? We had to pin version 0.13.5, as anything above keeps upserting records (which generates alerts, as a matter of fact). Thanks for your help.

stevehipwell commented 6 months ago

@jgournet sorry but I'm not in a position to help with this, I'm the Helm chart maintainer but I'm only superficially familiar with the actual code here.

sydorovdmytro commented 5 months ago

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.
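
A rough sketch of why this workaround may help for root records. It assumes a simplified naming rule inferred from the `cname-foo.com` name in the debug logs earlier in this thread and from the `_externaldns.cname-the.domain.tld` name that was expected: without `%{record_type}` in the prefix, the record type is glued onto the first label of the record name; with it, the type goes into the prefix instead. `txt_record_name` and `in_zone` are hypothetical helpers for illustration, not external-dns functions, and the real logic may differ.

```python
PLACEHOLDER = "%{record_type}"


def txt_record_name(dns_name: str, record_type: str, prefix: str = "") -> str:
    """Simplified, assumed model of the type-aware TXT registry name."""
    rtype = record_type.lower()
    if PLACEHOLDER in prefix:
        # The type is placed wherever the prefix template says.
        return prefix.replace(PLACEHOLDER, rtype) + dns_name
    # Otherwise the type is glued onto the first label of the record name.
    first, dot, rest = dns_name.partition(".")
    return f"{prefix}{rtype}-{first}{dot}{rest}"


def in_zone(name: str, zone: str) -> bool:
    """Suffix match: the zone apex itself or any name below it."""
    return name == zone or name.endswith("." + zone)


for prefix in ("", "external-dns.", "%{record_type}_external-dns."):
    name = txt_record_name("foo.com", "CNAME", prefix)
    print(f"{prefix!r:32} -> {name:32} in zone: {in_zone(name, 'foo.com')}")
```

In this model only the templated prefix yields a name that stays under `foo.com` for the apex record, while subdomain records such as `banana.foo.com` stay in the zone either way. That would match the reports above that only root records are affected.
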

life5ign commented 4 months ago

Same issue with version 0.14.1 in AWS with subdomains. Installed with helm chart external-dns-7.3.0, in AWS, with currently internet-facing NLBs; but I've seen it in a private hosted zone with internal NLBs too.

Strange thing was that it didn't do it for a while, and then just suddenly started continual UPSERTs for already existing records.

stevehipwell commented 4 months ago

@life5ign I don't think external-dns-7.3.0 is the official Helm chart.

life5ign commented 4 months ago

@stevehipwell you're right; I'm using https://artifacthub.io/packages/helm/bitnami/external-dns

I'll try the official https://artifacthub.io/packages/helm/external-dns/external-dns

gustav-b commented 4 months ago

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

This workaround solved the issue for me too. Now on v0.14.2 without any continuous upserts.

life5ign commented 4 months ago

We had the same issue with v0.13.6 and AWS Route 53.

Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

@sydorovdmytro thanks, where did you get this idea? EDIT: never mind:

docker run -it --rm bitnami/external-dns:latest --help | grep txt-prefix

That's the only place I could find documentation on the available CLI flags.

ElvenSpellmaker commented 4 months ago

These workarounds are just that. Workarounds. It's still an issue tbh.

life5ign commented 4 months ago

We had the same issue with v0.13.6 and AWS Route 53. Fixed by adding %{record_type} for the txt-prefix, something like --txt-prefix=%{record_type}_external-dns.

This workaround solved the issue for me too. Now on v0.14.2 without any continuous upserts.

This worked for me once I sorted out some ingress provider ingressClassName issues in my cluster.

At the same time, I also decided to set --aws-prefer-cname to switch the record type away from the proprietary A Alias type.

matthijswolters-rl commented 1 month ago

I am also struggling with this issue. I have tried the workaround of setting --txt-prefix=%{record_type}-record-, and I have also tried setting --txt-cache-interval=1h, but then it just upserts after the interval. This is on v0.14.2.