[Closed] PirateBread closed this issue 4 years ago.
We're seeing the same behaviour on GKE (Google).
What version of external-dns are you currently running?
Seems related to #879
v0.5.10 has the problem, we have reverted to v0.5.9 which does not.
Exactly the same here. v0.5.9 works fine, v0.5.10 does this constantly.
We are having the same issue; I posted an example in #543. We will try to revert to v0.5.9 for now.
I've had the same issue this morning. Thankfully you guys already reported this as I was aware of the loop but did not know the cause... I've also reverted to v0.5.9 (running AKS 1.11.3 in Azure by the way)
Yep, same issue here. We were saved by keeping a delete lock on our resource groups in Azure :)
We are facing the same issue starting with 0.5.10. 0.5.9 works fine.
Same issue, but only on 0.5.10, reverting to 0.5.9 works perfectly fine:
The following loop happens every minute. Logs from external-dns (debug level):
level=debug msg="Retrieving Azure DNS zones."
level=debug msg="Found 1 Azure DNS zone(s)."
level=debug msg="Retrieving Azure DNS records for zone 'fulldomain.com'."
level=debug msg="Found A record for 'test-app.fulldomain.com' with target 'XX.XX.XX.XX'."
level=debug msg="Found TXT record for 'test-app.fulldomain.com' with target '\"heritage=external-dns,external-dns/owner=prod,external-dns/resource=ingress/test-app/test-app\"'."
level=debug msg="Endpoints generated from ingress: test-app/test-app: [test-app.fulldomain.com 300 IN A XX.XX.XX.XX [] test-app.fulldomain.com 300 IN A XX.XX.XX.XX []]"
level=debug msg="Removing duplicate endpoint test-app.fulldomain.com 300 IN A XX.XX.XX.XX []"
level=debug msg="Retrieving Azure DNS zones."
level=debug msg="Found 1 Azure DNS zone(s)."
level=info msg="Deleting A record named 'test-app' for Azure DNS zone 'fulldomain.com'."
level=info msg="Deleting TXT record named 'test-app' for Azure DNS zone 'fulldomain.com'."
level=info msg="Updating A record named 'test-app' to 'XX.XX.XX.XX' for Azure DNS zone 'fulldomain.com'."
level=info msg="Updating TXT record named 'test-app' to '\"heritage=external-dns,external-dns/owner=prod,external-dns/resource=ingress/test-app/test-app\"' for Azure DNS zone 'fulldomain.com'."
Thanks for all the other reports. I tried to downgrade to 0.5.9 and in Azure I'm now getting an API version error.
I then tried 0.5.8, same problem. Went back to 0.5.10, same problem.
I'm really confused now because up until 10 minutes ago, my External DNS was running the :latest tag and was constantly recycling DNS records.
I deleted that deployment (kubectl delete -f external-dns-manifest.yaml) and then recreated it, and now for some reason I'm getting API errors.
Wondering if Azure is somehow rate-limiting these requests and it just coincided with my attempt to downgrade?
level=error msg="dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"InvalidApiVersionParameter\" Message=\"The api-version '2016-04-01' is invalid. The supported versions are '2018-11-01,2018-09-01,2018-08-01,2018-07-01,2018-06-01,2018-05-01,2018-02-01,2018-01-01,2017-12-01,2017-08-01,2017-06-01,2017-05-10,2017-05-01,2017-03-01,2016-09-01,2016-07-01,2016-06-01,2016-02-01,2015-11-01,2015-01-01,2014-04-01-preview,2014-04-01,2014-01-01,2013-03-01,2014-02-26,2014-04'.\""
@PirateBread
Could you try this build for Azure to see if it addresses your issue?
registry.opensource.zalan.do/teapot/external-dns:v0.5.10-16-gfe39b46
@jhohertz
Just deployed v0.5.10-16-gfe39b46
and I'm still seeing the following:
time="2019-02-08T16:05:52Z" level=info msg="Created Kubernetes client https://xxxxxxx-2b0c5b7a.hcp.uksouth.azmk8s.io:443"
time="2019-02-08T16:05:52Z" level=info msg="Using client_id+client_secret to retrieve access token for Azure API."
time="2019-02-08T16:05:52Z" level=error msg="dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"InvalidApiVersionParameter\" Message=\"The api-version '2016-04-01' is invalid. The supported versions are '2018-11-01,2018-09-01,2018-08-01,2018-07-01,2018-06-01,2018-05-01,2018-02-01,2018-01-01,2017-12-01,2017-08-01,2017-06-01,2017-05-10,2017-05-01,2017-03-01,2016-09-01,2016-07-01,2016-06-01,2016-02-01,2015-11-01,2015-01-01,2014-04-01-preview,2014-04-01,2014-01-01,2013-03-01,2014-02-26,2014-04'.\""
If I get a chance this weekend, I'm going to try to reproduce this in a completely fresh environment in my own subscription to rule out some kind of configuration issue, but at this point I can't see what would be wrong.
I can confirm that v0.5.10-16-gfe39b46 solves the eternal delete/update loop of doom on GKE.
Thanks for the feedback, we will work on an official release which will probably land tomorrow.
I have a similar problem, but on AWS with version 0.5.11. ExternalDNS is constantly updating the same record every two minutes (--interval=2m):
time="2019-02-19T14:21:45Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: af6f41c7-3451-11e9-bb90-1939f5de72e5"
time="2019-02-19T14:21:52Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: b3bb1bbc-3451-11e9-92a8-118f2457694e"
time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:22:10Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"
time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:24:06Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"
time="2019-02-19T14:26:25Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: 5676a7c3-3452-11e9-b59c-ddd6f4af4826"
time="2019-02-19T14:26:25Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:26:25Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:26:25Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"
My arguments:
--log-level=info
--policy=upsert-only
--provider=aws
--registry=txt
--interval=2m
--source=service
The same behavior also occurs on 0.5.9.
I have the same issue as @omegarus.
I'm not seeing the needless updates on AWS that others are experiencing. One difference may be that I don't have any wildcard DNS records being published, so I wonder whether the issue is somewhat specific to wildcards?
@jhohertz The DNS records I'm trying to publish don't contain wildcards; they are configured for different ingresses that contain different service host names (e.g. service.internal.domain, app.internal.domain), and I'm still experiencing this issue (I've tried downgrading as far as v0.5.7 and it still happens).
I'm sorry, @FridaGo, but I'm not sure what you're experiencing. This issue and the ones I have recently posted about all relate to a problem that was introduced in v0.5.10.
All I can suggest is to watch the status field of the services you are attaching the DNS records to, to see if something is causing unexpected updates to that status, which external-dns might be picking up on. I've seen some ingress configurations cause things like that to occur.
Can we close this issue as v0.5.11 was released?
@jhohertz Status field is constant and not changing.
status:
loadBalancer:
ingress:
- hostname: x8076o593986511e9b2dc86r8d247u18-9901230772.us-west-1.elb.amazonaws.com
I'm seeing this same behavior with Infoblox after upgrading from 0.5.9 to 0.5.11. I'm going to try downgrading to 0.5.9 to see if that resolves it. There was so much churn with the recycling bin that it blew up the Infoblox DB. Sample logs attached (dnslog.txt).
Have the same issue on v0.5.11 on GKE
For me on AWS, running both v0.5.9 and v0.5.11, I haven't seen this problem. Maybe it has something to do with what @jhohertz mentioned?
Found a solution to the problem. If you have another external-dns instance that uses the same TXT record owner value, the first external-dns will delete the records of the second, and vice versa. You should set a different "txtOwnerId" for each external-dns deployment.
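A minimal sketch of that fix, assuming two external-dns deployments sharing one DNS zone (the deployment names and owner IDs below are made-up example values): give each instance its own --txt-owner-id so it only manages records whose ownership TXT record matches.

```yaml
# Container args for two hypothetical external-dns deployments
# sharing the same DNS zone; only --txt-owner-id differs.

# Deployment "external-dns-prod":
args:
  - --registry=txt
  - --txt-owner-id=prod-cluster     # example value; must be unique per instance

# Deployment "external-dns-staging":
args:
  - --registry=txt
  - --txt-owner-id=staging-cluster  # example value; must be unique per instance
```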
@medanasslim great, thanks for posting an update.
Ping to @PirateBread and @aslimacc , do you have additional info to share and/or are you still experiencing this issue?
Works for me
Experiencing the same issue with Cloudflare, on both registry.opensource.zalan.do/teapot/external-dns:v0.5.9 and registry.opensource.zalan.do/teapot/external-dns:v0.5.12.
...
spec:
containers:
- args:
- --source=ingress
- --domain-filter=my-domain.com
- --provider=cloudflare
- --cloudflare-proxied
env:
- name: CF_API_KEY
value:
- name: CF_API_EMAIL
value:
image: registry.opensource.zalan.do/teapot/external-dns:v0.5.9
imagePullPolicy: Always
...
I am on Cloudflare, and as I said above, you should add "txt-owner-id". Example below:
args:
- --log-level=info
- --registry=txt
- --interval=1m
- --txt-owner-id=instance1
Thank you for the advice but this doesn't fix the issue. This is useful if you have multiple clusters using the same DNS zone.
Can you share your logs, please, so we can see the behavior of the app?
Sure, you can see logs here https://github.com/kubernetes-incubator/external-dns/issues/992
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
I'm also seeing this with the infoblox provider running v0.5.15. Removing my TTL annotations as per a previous comment resolved this issue.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Hi, sorry to open up this ticket again, but I've faced the same issue. Once I removed all sources other than istio-gateway, the problem ~disappeared~.
Edit: actually it didn't. I'm investigating it further.
Seeing this as well with Istio gateways and the TransIP provider. We do have two instances of external-dns for the same zone, but with different txt-owner-id values, so that shouldn't be the problem.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
/remove-lifecycle rotten
/reopen
@Xnyle: You can't reopen an issue/PR unless you authored it or you are a collaborator.
txt-owner-id works for me.
FYI, I had the same problem, and as others suggested, the issue was that I had two different external-dns deployments with the same txt-owner-id. They were deleting each other's records. As a temporary fix I used --policy=upsert-only.
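As a sketch, that workaround is just a container-arg change (flag values below are illustrative): with --policy=upsert-only, external-dns only creates and updates records and never deletes them, so two clashing instances can no longer remove each other's entries, at the cost of stale records accumulating.

```yaml
# Illustrative external-dns container args for the temporary workaround.
args:
  - --registry=txt
  - --txt-owner-id=my-cluster   # still best kept unique per deployment (example value)
  - --policy=upsert-only        # suppress deletions; creates and updates only
```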
This issue is reproducible with the Infoblox provider as well. It constantly does the same create/delete every minute. Please advise a solution?
Logs:
time="2021-08-05T08:42:31Z" level=debug msg="Endpoints generated from ingress: test/demo: [demo.test..com 0 IN A 10.10.10.10 [] demo.test..com 0 IN A 10.10.10.10 []]"
time="2021-08-05T08:42:31Z" level=debug msg="Removing duplicate endpoint demo.test.***.com 0 IN A 10.10.10.10 []"
Args:
interval: "1m"
logLevel: debug
logFormat: text
policy: upsert-only
registry: "txt"
txtPrefix: "ing"
txtSuffix: ""
txtOwnerId: "kcc-ing"
As you can see below, this is not ideal behaviour.
The logs from the pod just show constantly deleting/updating records. It doesn't have any information as to why it's doing it.
I've checked; my ingress addresses are not disappearing, at least not that I can see.