kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

ExternalDNS deleting and then creating records. Constantly. Infoblox #2198

Closed gennady-voronkov closed 2 years ago

gennady-voronkov commented 3 years ago

What happened: ExternalDNS is deleting and then re-creating records, constantly, with the Infoblox provider.

What you expected to happen: the record should be added once; if the Ingress resource is deleted from Kubernetes, the record should then be deleted from DNS.

How to reproduce it (as minimally and precisely as possible): I used the following args:

    interval: "1m"
    logLevel: debug
    logFormat: text
    policy: upsert-only
    registry: "txt"
    txtPrefix: "ing"
    txtSuffix: ""
    txtOwnerId: "kcc-ing"
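If these are Helm chart values, they should map, assuming the upstream external-dns chart, to roughly the following container arguments (a sketch, not the exact rendered manifest):

    - --interval=1m
    - --log-level=debug
    - --log-format=text
    - --policy=upsert-only
    - --registry=txt
    - --txt-prefix=ing
    - --txt-owner-id=kcc-ing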

Anything else we need to know?:

Environment:

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

arvin-a commented 2 years ago

You most likely have the TTL annotation (external-dns.alpha.kubernetes.io/ttl) set on your Service, which the Infoblox provider does not support.

Can you check if it's there?
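For reference, the annotation in question sits in the metadata of the Service (or Ingress); a minimal hypothetical example:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app                                    # hypothetical name
      annotations:
        external-dns.alpha.kubernetes.io/hostname: my-app.example.com
        external-dns.alpha.kubernetes.io/ttl: "300"   # the annotation to look for
    spec:
      type: LoadBalancer
      ports:
        - port: 80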

senk commented 2 years ago

/remove-lifecycle stale

senk commented 2 years ago

Same problem here. No external-dns.alpha.kubernetes.io/ttl present

vkruoso commented 2 years ago

This was happening to me using the DigitalOcean provider. Removing the external-dns.alpha.kubernetes.io/ttl annotation seems to have an effect on that. Probably the diff check is not considering the TTL field? If the contributors could point us in the right direction on how to fix it, we may be able to get a PR going.

Seems like this is the same issue as #1421 and #1959.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

mcwumbly commented 2 years ago

Haven't yet been able to dig much deeper, but I'm hearing reports of an issue that matches what is reported here on versions 0.8.0 and 0.10.0. No ttl annotation present here either.

/remove-lifecycle stale

alebedev87 commented 2 years ago

@gennady-voronkov: I managed to reproduce the log message you mentioned about the removal of the duplicate. I found that it appears only when another source produces an endpoint with the same "key", which is made up of DNS name, SetIdentifier and Targets.

The way ExternalDNS collects the endpoints from Kubernetes is made up of 2 steps:

- each configured source generates endpoints from its Kubernetes resources,
- the endpoints from all sources are then combined, and duplicates (endpoints with the same "key") are removed.

Both of them are used together in main.go.

Now the question is how I managed to get 2 sources with the same "key". I used Red Hat OpenShift, which mirrors any Ingress resource (actually only those of the default ingress class) to an OpenShift Route resource. So, using both sources (openshift-route and ingress) I saw 2 endpoints with the same "key":

time="2022-03-31T19:40:12Z" level=debug msg="Endpoints generated from OpenShift Route: openshift-operator-lifecycle-manager/demo-f7fkj: [demo.dronskm.io 0 IN CNAME  router-default.apps-crc.testing []]"
...
time="2022-03-31T19:40:12Z" level=debug msg="Endpoints generated from ingress: openshift-operator-lifecycle-manager/demo: [demo.dronskm.io 0 IN CNAME  router-default.apps-crc.testing []]"
time="2022-03-31T19:40:12Z" level=debug msg="Removing duplicate endpoint demo.dronskm.io 0 IN CNAME  router-default.apps-crc.testing []"  

As you can see, the "keys" (everything between the outer square brackets in the log lines) are the same for both endpoints, even though they come from different sources.

After excluding the Ingress from the mirroring, I ended up with a single endpoint coming from the Ingress, and the log message disappeared:

...
time="2022-03-31T19:41:12Z" level=debug msg="Endpoints generated from OpenShift Route: openshift-console/console: [console-openshift-console.apps-crc.testing 0 IN CNAME  router-default.apps-crc.testing []]"
time="2022-03-31T19:41:12Z" level=debug msg="Endpoints generated from ingress: openshift-operator-lifecycle-manager/demo: [demo.dronskm.io 0 IN CNAME  router-default.apps-crc.testing []]"
time="2022-03-31T19:41:12Z" level=debug msg="ignoring record downloads-openshift-console.apps-crc.testing that does not match domain filter"

So, ExternalDNS made the right decision to remove a duplicate. Note also that the CNAME DNS record wasn't removed in Infoblox; it has remained the same since the first time I ran ExternalDNS on my cluster. The log message is about the removal of the duplicate endpoint, not about the removal of the DNS record.

You should check whether you have a similar situation of multiple endpoints with the same key. I think it can happen quite easily: the same hostname annotation on different Services/Ingresses, the same host on different Ingresses, OpenShift mirroring, etc.
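As a hypothetical illustration of the "same host on different Ingresses" case, two Ingresses like the following (served by the same ingress controller, so they resolve to the same target) would produce endpoints with an identical key, and one of them would be logged as a removed duplicate:

    # hypothetical resources, for illustration only
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: app-a
    spec:
      rules:
        - host: demo.example.com      # same host...
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: app-a
                    port:
                      number: 80
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: app-b
    spec:
      rules:
        - host: demo.example.com      # ...declared again on a second Ingress
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: app-b
                    port:
                      number: 80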

Moezenka commented 2 years ago

I am experiencing the same problem: records are being constantly deleted and recreated.

Container Arguments:

        - --source=service
        - --source=ingress
        - --domain-filter=app.acme.local
        - --provider=infoblox
        - --infoblox-grid-host=XX.XX.XX.XX
        - --infoblox-wapi-port=443
        - --infoblox-wapi-version=2.3.1
        - --log-level=debug
        - --txt-owner-id=clu06
        - --no-infoblox-ssl-verify
        - --registry=txt
        - --events
        - --policy=sync

I am able to reproduce this issue by running version 0.8.0. I am running several clusters and have tagged each cluster with a unique txt-owner-id.

Here are the logs for version: 0.8.0 (sample output)

time="2022-04-XXT13:03:48Z" level=info msg="Deleting A record named 'app-one.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:49Z" level=info msg="Deleting A record named 'app-two.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:49Z" level=info msg="Deleting A record named 'app-three.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:50Z" level=info msg="Deleting TXT record named 'app-one.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:50Z" level=info msg="Deleting TXT record named 'app-two.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:50Z" level=info msg="Deleting TXT record named 'app-three.app.acme.local' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:50Z" level=info msg="Creating A record named 'app-one.app.acme.local' to 'XX.XX.XX.XX' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:51Z" level=info msg="Creating A record named 'app-two.app.acme.local' to 'XX.XX.XX.XX' for Infoblox DNS zone 'app.acme.local'."
time="2022-04-XXT13:03:51Z" level=info msg="Creating A record named 'app-three.app.acme.local' to 'XX.XX.XX.XX' for Infoblox DNS zone 'app.acme.local'."

Here are the logs for version: 0.7.4

time="2022-04-19T08:47:10Z" level=debug msg="Skipping endpoint 5fcfd54d7f-lnl2c.app.acme.local 0 IN A  172.29.10.7 [] because owner id does not match, found: \"\", required: \"clu06\""
time="2022-04-19T08:47:10Z" level=debug msg="Skipping endpoint 55kh5.app.acme.local 0 IN A  172.29.10.11 [] because owner id does not match, found: \"\", required: \"clu06\""
time="2022-04-19T08:47:10Z" level=debug msg="Skipping endpoint iodnfoi44.app.acme.local 0 IN A  172.29.10.130 [] because owner id does not match, found: \"\", required: \"clu06\""

Version 0.7.4 fetches records, occasionally removes a duplicate endpoint, but mostly skips records, whereas version 0.8.0 constantly flaps, deleting and creating records.
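The ownership check seen in the 0.7.4 logs comes from the TXT registry: each managed record gets a companion TXT record whose content encodes the owner id, roughly of the form below (exact fields vary by version; the namespace/resource names here are hypothetical):

    "heritage=external-dns,external-dns/owner=clu06,external-dns/resource=ingress/my-namespace/app-one"

Records whose owner does not match the configured txt-owner-id (found: "") are skipped rather than touched, which is the 0.7.4 behaviour shown above.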

Duplicate issue:

EDIT: same behaviour in v0.11.0

xxated commented 2 years ago

Any updates here? Experiencing the same issue.

ranjishmp commented 2 years ago

We are able to see the same issue in our internal testing. @skudriavtsev is looking into this and a possible fix.

ranjishmp commented 2 years ago

@skudriavtsev raised a PR for this: https://github.com/kubernetes-sigs/external-dns/pull/2755, which is under review.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/external-dns/issues/2198#issuecomment-1273544480):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.