octodns / octodns-cloudflare

Cloudflare DNS provider for octoDNS
MIT License
24 stars 18 forks source link

Full support for auto-ttl independant of proxied #65

Closed ross closed 11 months ago

ross commented 11 months ago

Full support for auto-ttl independent of proxied. More details in the change to the README.

Clearer documentation around auto-ttl (and proxied) ignoring the configured TTL

This should fully support the permutations that Cloudflare allows. TL;DR is that enabling proxied and/or auto-ttl will ignore the record's in-built TTL. This is indicated to the Cloudflare API with TTL=1, but that implementation detail is not exposed to users of octoDNS.

When dumped (as was previously the case) records with proxied or auto-ttl enabled will have the flags set in the octodns.cloudflare section of the record data and the TTL will be set to 300, which is effectively what CF is doing under the hood.

I also went through and tested all the permutations of things and their ability to change back and forth between each other and there was a tweak or two to the existing stuff to handle all that better.

/cc https://github.com/octodns/octodns-cloudflare/issues/62 /cc @KRoss45 @istr

istr commented 11 months ago

@ross looks good. As I don't have a CF account (anymore) I can't test it against the real stuff, but @KRoss45 can, I guess.

KRoss45 commented 11 months ago

@ross looks good. As I don't have a CF account (anymore) I can't test it against the real stuff, but @KRoss45 can, I guess.

Indeed. I'm out of office until Tuesday though, but I look forward to it. Thanks guys!

KRoss45 commented 11 months ago

First pass on testing (no actual applying) with the new changes:

I then updated the rest of the zone config and updated the ttl values to be not 1, just picking for the sake of having a value. It appears that other records are not following the auto setting. I'm seeing CNAME TXT and CAA updates; eg:

We have a pair of cnames that were previously set to ttl 1, were then updated to the following config:

  octodns:
    cloudflare:
      auto-ttl: true
      proxied: false
  ttl: 180
  type: CNAME

The dry run wants to update them to the ttl value, not the auto value of 1.

* cloudflare (CloudflareProvider)
*   Update
*     <CnameRecord CNAME 1, 
*     <CnameRecord CNAME 180, 

There are also TXT records not picking it up. Original:

_digme
  ttl: 1
  type: TXT

Now:

_digme
  octodns:
    cloudflare:
      auto-ttl: true
  ttl: 123
  type: TXT

The above TXT record wants to update to 123.

*   Update
*     <TxtRecord TXT 1, _digme.lavexproducts.com. XXX->
*     <TxtRecord TXT 123, 
ross commented 11 months ago

I can't reproduce either of those. I tried creating the records in the UI and I tried creating them with octodns-cloudflare main. Both ways I don't see diffs once I'm on the branch's code with the config's below:

We have a pair of cnames that were previously set to ttl 1, were then updated to the following config:

I can't reproduce this:

Screenshot 2023-10-31 at 4 24 09 PM
test:
  octodns:
    cloudflare:
      auto-ttl: true
      proxied: false
  ttl: 180
  type: CNAME
  value: xormedia.com.
(env) coho:octodns-cloudflare ross$ PYTHONPATH=. octodns-sync --config-file=config/dev.yaml xormedia.com.
...
********************************************************************************
No changes were planned
********************************************************************************

There are also TXT records not picking it up.

Or this one

Screenshot 2023-10-31 at 4 27 30 PM
_digme:
  octodns:
    cloudflare:
      auto-ttl: true
  ttl: 123
  type: TXT
  value: Hello World
(env) coho:octodns-cloudflare ross$ PYTHONPATH=. octodns-sync --config-file=config/dev.yaml xormedia.com.
...
********************************************************************************
No changes were planned
********************************************************************************
ross commented 11 months ago

Just pushed a commit that adds some --debug logging to print out the value of proxied and auto-ttl for records as things are loaded:

2023-10-31T16:33:46  [140704620871296] DEBUG Record __init__: zone.name=xormedia.com., type=  TxtRecord, name=_digme
2023-10-31T16:33:46  [140704620871296] DEBUG CloudflareProvider[cloudflare] _record_for: auto_ttl=True

Might help with debugging this.

KRoss45 commented 11 months ago

I needed to start over with another domain that I don't have to worry about posting IP info and also this one is inactive so I was given a green light to hose it all up if need be to experiment.

I saw a mixed result with this one in the test 'legacy config' pass where we set ttl: 1 and the code seeing a 300 come back (despite it set to auto in the CF UI) and the config saying 1. That got me concerned I didn't have the updated code but it looks like all the auto additions are there:

grep -i auto /usr/local/lib/python3.11/site-packages/octodns_cloudflare/__init__.py
        auto_ttl = records[0]['ttl'] == 1
                '_record_for: proxied=%s, auto_ttl=%s', proxied, auto_ttl
                'auto-ttl': auto_ttl,
            self.log.debug('_record_for: auto_ttl=%s', auto_ttl)
            # auto-ttl can still be set, signaled by a ttl=1, even if
            record._octodns['cloudflare'] = {'auto-ttl': auto_ttl}
            # true of non-proxied records when auto-ttl is enabled
            elif self._record_is_just_auto_ttl(change.existing):
    def _record_is_just_auto_ttl(self, record):
        'This tests if it is strictly auto-ttl and not proxied'
            and record._octodns.get('cloudflare', {}).get('auto-ttl', False)
        if proxied or self._record_is_just_auto_ttl(record):
            # proxied implies auto-ttl, and auto-ttl can be enabled on its own,
                self._record_is_just_auto_ttl(existing_record)
                != self._record_is_just_auto_ttl(desired_record)

After trying an apply, it continued wanting to update 300 > 1, so I added in the auto-ttl flag and that calmed most of it down and set things correctly.

One I'm still struggling to get to straighten out is a TXT record.

  - octodns:
      cloudflare:
        auto-ttl: true
    ttl: 111
    type: TXT
    value: v=spf1 include:_spf.herculesbag.com -all

CF UI shows it as Auto.

Dry run wants to update it

* cloudflare (CloudflareProvider)
*   Update
*     <TxtRecord TXT 300, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']> ->
*     <TxtRecord TXT 111, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']> (config)
*   Summary: Creates=0, Updates=1, Deletes=0, Existing Records=8

Update catches it but looks like it's pushing the 1 despite the 'to be updated' looking to the ttl value

2023-11-01T20:44:36  [140637136612224] DEBUG Zone changes: zone=Zone<herculesbag.com.>, modified
    existing=<TxtRecord TXT 300, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']>,
     desired=<TxtRecord TXT 111, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']>   

* cloudflare (CloudflareProvider)
*   Update
*     <TxtRecord TXT 300, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']> ->
*     <TxtRecord TXT 111, herculesbag.com., ['v=spf1 include:_spf.herculesbag.com -all']> (config)
*   Summary: Creates=0, Updates=1, Deletes=0, Existing Records=8

2023-11-01T20:44:36  [140637136612224] DEBUG CloudflareProvider[cloudflare] _apply_Update: updating a00b64494ded32342edf73b9f0381f33, {'content': 'v=spf1 include:_spf.herculesbag.com -all', 'name': 'herculesbag.com', 'type': 'TXT', 'ttl': 1} -> {'content': 'v=spf1 include:_spf.herculesbag.com -all', 'name': 'herculesbag.com', 'type': 'TXT', 'ttl': 1}
ross commented 11 months ago

I saw a mixed result with this one in the test 'legacy config' pass where we set ttl: 1 and the code seeing a 300 come back (despite it set to auto in the CF UI) and the config saying 1. That got me concerned I didn't have the updated code but it looks like all the auto additions are there:

At this point I haven't touched the 1 <-> 300 stuff so it'll behave as it had in the past. Now that auto-ttl is a thing I would highly recommend changing the 1's to 300s in the YAML as CF will be ignoring them. You wouldn't get the 1's now if you were to do a dump today.

The reasons behind the 1 -> 300 switch still stand. If someone is dumping from CF and going to Route53 or whatever it will get matching behavior. New dumps will have the auto-ttl in there so it'll behave better/correct now in terms of CF.

Also CF doesn't support TTL < 60s/120s anyway, so if auto-ttl wasn't there it'd essentially ignore the 1 anyway.

One I'm still struggling to get to straighten out is a TXT record.

I'll try and dig into this this afternoon and see if I can reproduce something like that. I wasn't able to previously, but I can poke around more.

ross commented 11 months ago

Ah, I think I figured out the dissonance here. You were expecting the configured TTL to be ignored when auto-ttl is enabled?

I was expecting it to still be checked/match so that other providers would behave as close to CF as possible.

Should be easish to resolve. Will push something up later this afternoon

KRoss45 commented 11 months ago

Ah, I think I figured out the dissonance here. You were expecting the configured TTL to be ignored when auto-ttl is enabled?

I was expecting it to still be checked/match so that other providers would behave as close to CF as possible.

Should be easish to resolve. Will push something up later this afternoon

Aaah yes. The assumption was with the CF auto switch that the provider would ignore the TTL and any other provider would skip it and use the provided. That would allow for CF to auto and say GCloud to set a TTL.

ross commented 11 months ago

CF's major outage has kind of put a damper on making and testing the change I had in mind :grin:, guess not sure if I'll get a chance to dig into it tomorrow, but will push it whenever I do.

ross commented 11 months ago

So I thought I had set things up to ignore the TTL when auto-ttl is enabled and I did. What's actually happening with the 111 is that it's hitting CloudflareProvider's min ttl handling,

https://github.com/octodns/octodns-cloudflare/blob/c9db2cce331d188f7bd3bad6292abe37ec075e1e/octodns_cloudflare/__init__.py#L533

That's taking anything < 120 and making it 120. Fix would initially seem to be to do the same clamping on existing, but I believe that turns up other issues. Still digging.

KRoss45 commented 11 months ago

That's taking anything < 120 and making it 120. Fix would initially seem to be to do the same clamping on existing, but I believe that turns up other issues. Still digging.

That completely explains why I was getting records with a 2min TTL instead of the auto. I'm so glad you noticed that, it was driving me nuts.

I tried it with 133 (all these values are garbage/made up) aaaaand still down so, until tomorrow.

ross commented 11 months ago

Interestingly the 111 bug is actually there in main for proxied records and doesn't have anything to do with this branch.

ross commented 11 months ago

So jus pushed up a set of fixes that should result in ignoring TTL when proxied or auto-ttl are enabled. It also fixes the handling of change inclusion/exclusion based on MIN_TTL.

KRoss45 commented 11 months ago

This is starting to look good on our end. I'm going to test out a few more zones to see if I find any other anomalies.

ross commented 11 months ago

Decided this needed more testing and glad I did, found a shortcomming in the implementation https://github.com/octodns/octodns-cloudflare/pull/65/commits/faeba7a0b4971c6ab596754a6c86ef65444f107d.

Also found a bug/shortcoming in Record.copy and Record.data. Will be opening up an issue on those shortly.

KRoss45 commented 11 months ago

Ross one thing that's changed is the addition of the URLFWD code. That's essentially now doing work on records CF controls in a different 'module' they call "Rules" vs. managing it all in the DNS section.

Was that intentional? We've got a handful of ad-hoc rules that are getting marked for delete, I'm confirming as to whether or not I'll need to back-fill them.

ross commented 11 months ago

Was that intentional? We've got a handful of ad-hoc rules that are getting marked for delete, I'm confirming as to whether or not I'll need to back-fill them.

No, wasn't intentional. I don't have any of those set up. Can you provide any further details/examples?

KRoss45 commented 11 months ago

So in this case these are "page rules" where CF will perform a 301 perm redirect to the www. of the zone when someone comes in at the top.

Here's what a sync dry run says would happen:

* cloudflare (CloudflareProvider)
*   Delete <UrlfwdRecord URLFWD 300, herculesbag.com., ['"/*" "https://www.herculesbag.com/$1" 301 2 0']>
*   Summary: Creates=0, Updates=0, Deletes=1, Existing Records=9

If the rule is up there but not switched on, Octo apparently doesn't see it and doesn't want to delete. With it switched on, then Octo sees it and wants to remove it.

ross commented 11 months ago

Ross one thing that's changed is the addition of the URLFWD code. That's essentially now doing work on records CF controls in a different 'module' they call "Rules" vs. managing it all in the DNS section.

So the URLFWD bit in the diffs should just be a translation of the old stand-alone check here https://github.com/octodns/octodns-cloudflare/pull/65/files#diff-d6838eda1567eb5fb2b061691b90a5092cc1cf8954393cb240eb16a946206db1L528-L530. Only difference I can see is that before it was only checked when proxied was false, but the code in the branches was identical so it should have applied the same logic either way.

I don't immediately see what has changed so I'll have to sit down and debug a bit more, I guess I'll need to create some URLFWD's in my account.

ross commented 11 months ago

I'm seeing the same behavior w/main as I get on auto-ttl w/respect to deleting the URLFWD.

I manually set a rule up similar to the one you posted above:

Screenshot 2023-11-14 at 7 39 43 AM

with main

$ PYTHONPATH=. octodns-sync --config-file=config/dev.yaml xormedia.com.
...
2023-11-14T07:39:27  [140704302657152] INFO  Plan
********************************************************************************
* xormedia.com.
********************************************************************************
* cloudflare (CloudflareProvider)
*   Delete <UrlfwdRecord URLFWD 300, url.xormedia.com., ['"/*" "https://google.com/$1" 302 2 0']>
*   Summary: Creates=0, Updates=0, Deletes=1, Existing Records=11
********************************************************************************

with auto-ttl

$ PYTHONPATH=. octodns-sync --config-file=config/dev.yaml xormedia.com.
2023-11-14T07:41:43  [140704302657152] INFO  Plan
********************************************************************************
* xormedia.com.
********************************************************************************
* cloudflare (CloudflareProvider)
*   Delete <UrlfwdRecord URLFWD 300, url.xormedia.com., ['"/*" "https://google.com/$1" 302 2 0']>
*   Summary: Creates=0, Updates=0, Deletes=1, Existing Records=11
********************************************************************************

So I don't think this is related to the change here, though I could be wrong.

It is expected that octoDNS will manage the Page Rules as they're equivilent to the URLFWD record type that is supported by a handful of providers.

If those rules need to live on then I assume you'll need to configure them in your YAML, the above rule looks like:

url:
  type: URLFWD
  value:
    code: 302
    masking: 2
    path: '/*'
    query: 0
    target: 'https://google.com/$1'

I just verified that the page rules are correctly dumped as URLFWDs with octodns-dump.

ross commented 11 months ago

If those rules need to live on then I assume you'll need to configure them in your YAML, the above rule looks like:

Or you could use a TypeRejectlistFilter processor to ignore URLFWDs

KRoss45 commented 11 months ago

Yeah I think I just missed this with all the other TTL noise. I went back to origin and see it's still there so it's just a 'new' thing to us with our upgrading.

ross commented 11 months ago

I think we're good to go here. Will leave it a few more days for any final testing/comments and then :shipit:

ross commented 11 months ago

Did some more testing and realized that dumps from CF got really noisy with all the auto-ttl/proxied=False stuff included on every record. That's not intended or desired. Pushed a change that only sets those keys if the values are true (missing is the same as false.)

In looking over test casees realized there wasn't anything testing _record_for with aut-ttl=True and proxied=False. Added that.