octodns / octodns-cloudflare

Cloudflare DNS provider for octoDNS
MIT License

Hitting authentication errors when running octodns sync for multiple new zones on the same run #108

Open rlaakkol opened 1 week ago

rlaakkol commented 1 week ago

So we are creating new zones with DNS records in Cloudflare using octodns, and almost every time, if there are enough new zones, we hit an octodns_cloudflare.CloudflareAuthenticationError: Authentication error at some point in the process, usually before the 40th or so zone in the set.

My gut feeling is that this is because octodns tries to create the records in the newly created zone too soon after creation, and Cloudflare throws back a 403. But I have a hard time verifying this, as it happens so sporadically.

Here's the full stacktrace for reference:

Traceback (most recent call last):
  File "env/bin/octodns-sync", line 8, in <module>
    sys.exit(main())
  File "env/lib/python3.10/site-packages/octodns/cmds/sync.py", line 62, in main
    manager.sync(
  File "env/lib/python3.10/site-packages/octodns/manager.py", line 856, in sync
    total_changes += target.apply(plan)
  File "env/lib/python3.10/site-packages/octodns/provider/base.py", line 298, in apply
    self._apply(plan)
  File "env/lib/python3.10/site-packages/octodns_cloudflare/__init__.py", line 1107, in _apply
    getattr(self, f'_apply_{class_name}')(change)
  File "env/lib/python3.10/site-packages/octodns_cloudflare/__init__.py", line 920, in _apply_Create
    self._try_request('POST', path, data=content)
  File "env/lib/python3.10/site-packages/octodns_cloudflare/__init__.py", line 131, in _try_request
    return self._request(*args, **kwargs)
  File "env/lib/python3.10/site-packages/octodns_cloudflare/__init__.py", line 156, in _request
    raise CloudflareAuthenticationError(resp.json())
octodns_cloudflare.CloudflareAuthenticationError: Authentication error

We can work around this by just rerunning the sync until all the zones are successfully processed, but this is a bit of a nuisance.

ross commented 1 week ago

What version are you running? The line numbers in your stack trace, e.g. 920, don't align with the current release, 0.0.7, as that line isn't even in the _apply_Create function.

Some sort of creation timing issue is a good guess; the other possibility would be a rate limit of some sort. You might try throwing resp.content into https://github.com/octodns/octodns-cloudflare/blob/5489b15ea20d907399378417b3ba8f8c6bc986d8/octodns_cloudflare/__init__.py#L151 and running with --debug if you can reliably recreate the problem.
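As a rough illustration of that kind of extra logging, here is a minimal sketch. It assumes a requests-style response object exposing status_code and content; log_response_body and FakeResponse are hypothetical names used only for this example and are not part of octodns-cloudflare. In practice the single log.debug call would sit inside the provider's _request method, before the error is raised.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("cloudflare-debug")


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a requests.Response; only the fields used below."""
    status_code: int
    content: bytes


def log_response_body(resp):
    """Log the status code and raw body so the reason behind an error is visible."""
    message = f"status={resp.status_code} content={resp.content!r}"
    log.debug("_request: %s", message)
    return message


if __name__ == "__main__":
    # DEBUG level is needed for the message to show up, similar to running with --debug.
    logging.basicConfig(level=logging.DEBUG)
    log_response_body(FakeResponse(403, b'{"success": false}'))
```

The raw body is logged with repr so that non-JSON or empty responses still show up unambiguously.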

When I get a chance to sit down and mess with it I'll try to reproduce the issue, but it sounds like it might be a tough one to do and may even rely on latency to CF's API servers etc.

rlaakkol commented 6 days ago

Yeah, I was running 0.0.6. I'll try to run with extra debugging once we get the next bigger batch of zones to process through octodns! Thanks for the info!

rlaakkol commented 6 days ago

One extra tidbit of information: we initially worked around this issue by just running octodns-sync in a loop targeted at one zone at a time, and in this way the authentication error never happened. Also, we had our request quotas increased on Cloudflare's side to make sure we weren't hitting rate limits, but that had no effect.
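For anyone wanting to try the same workaround, a minimal sketch of such a per-zone loop follows. The config path, the use of --doit, the retry count, and the helper names build_command and sync_zones are all assumptions made for illustration; adjust them to your own setup.

```python
import subprocess


def build_command(zone, config_file="config.yaml"):
    """Build an octodns-sync invocation limited to a single zone.

    The config file path and --doit flag are placeholders for your own setup.
    """
    return ["octodns-sync", "--config-file", config_file, "--doit", zone]


def sync_zones(zones, run=subprocess.run, max_attempts=3):
    """Sync each zone in its own process and return the zones that never succeeded."""
    failed = []
    for zone in zones:
        for _ in range(max_attempts):
            # A zero exit code is treated as a successful sync for this zone.
            if run(build_command(zone)).returncode == 0:
                break
        else:
            # The inner loop never hit `break`, so every attempt failed.
            failed.append(zone)
    return failed
```

Calling sync_zones(["example.com.", "example.net."]) would run one octodns-sync process per zone and report any zone that still failed after the retries, which avoids rerunning the whole set by hand.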

ross commented 6 days ago

> in a loop targeted at one zone at a time, and in this way the authentication error never happened.

That does make it sound more like a rate limit than a race condition, but 🤷

> Also we had our request quotas increased from Cloudflare side to make sure we weren't hitting rate limits, but that had no effect.

OK. Since it sounds like you have an account contact you might check with them and see if they have anything to say about eventually consistent creates/race conditions.