istr closed this issue 1 year ago.
I can reproduce, but even stranger I'm seeing persistent changes on records that didn't have that problem the last time I used them. Still digging into it.
After some initial digging this seems to be related to the monitoring part of dynamic records. If I push things with status: up it'll get to a settled state and show no changes. When the records include health checks they seem to persistently show changes. My guess is that NS1 has changed some details about the health checks/API, but I haven't gotten far enough into things yet to know if that's true.
Yeah, the update is coming from _extra_changes
2023-03-28T07:55:32 [4644720128] INFO Ns1Provider[ns1] _extra_changes: monitor mis-match for a.exxampled.com - A - 1.1.1.1
Which is triggered when _monitor_is_match returns False:
Printing out the two (expected vs. have):
{'expected': {'active': True,
'config': {'connect_timeout': 2000,
'host': '1.1.1.1',
'port': 443,
'response_timeout': 10000,
'send': 'GET /_dns HTTP/1.0\\r\\nHost: '
'a.exxampled.com\\r\\nUser-agent: '
'NS1\\r\\n\\r\\n',
'ssl': True},
'frequency': 60,
'job_type': 'tcp',
'name': 'a.exxampled.com - A - 1.1.1.1',
'notes': 'host:a.exxampled.com type:A',
'policy': 'quorum',
'rapid_recheck': False,
'region_scope': 'fixed',
'regions': ['lga'],
'rules': [{'comparison': 'contains',
'key': 'output',
'value': '200 OK'}]},
'have': {'active': True,
'config': {'connect_timeout': 2000,
'host': '1.1.1.1',
'ipv6': False,
'port': 443,
'response_timeout': 10000,
'send': 'GET /_dns HTTP/1.0\\r\\nHost: '
'a.exxampled.com\\r\\nUser-agent: NS1\\r\\n\\r\\n',
'ssl': True,
'tls_add_verify': False},
'frequency': 60,
'id': '6422ff1c65f383007cace839',
'job_type': 'tcp',
'mute': False,
'name': 'a.exxampled.com - A - 1.1.1.1',
'notes': 'host:a.exxampled.com type:A',
'notify_delay': 0,
'notify_failback': True,
'notify_list': '62f29339f6491b0095148771',
'notify_regional': False,
'notify_repeat': 0,
'policy': 'quorum',
'rapid_recheck': False,
'region_scope': 'fixed',
'regions': ['lga'],
'rules': [{'comparison': 'contains',
'key': 'output',
'value': '200 OK'}],
'status': {'global': {'fail_set': ['lga'],
'since': 1680015196,
'status': 'down'},
'lga': {'fail_set': ['Failure for Rule: output contains '
'200 OK'],
'since': 1680015196,
'status': 'down'}}}}
{'have.get': {'connect_timeout': 2000,
'host': '1.1.1.1',
'ipv6': False,
'port': 443,
'response_timeout': 10000,
'send': 'GET /_dns HTTP/1.0\\r\\nHost: '
'a.exxampled.com\\r\\nUser-agent: NS1\\r\\n\\r\\n',
'ssl': True,
'tls_add_verify': False},
'k': 'config',
'mismatch': True,
'v': {'connect_timeout': 2000,
'host': '1.1.1.1',
'port': 443,
'response_timeout': 10000,
'send': 'GET /_dns HTTP/1.0\\r\\nHost: a.exxampled.com\\r\\nUser-agent: '
'NS1\\r\\n\\r\\n',
'ssl': True}}
Looks like they've added new keys to the config section, tls_add_verify and ipv6, and the current method of comparison sees that as a diff.
Working on a PR/solution now.
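One way to make the comparison tolerant of server-added keys is to check only the keys present in the expected config. This is a minimal sketch of the idea; monitor_config_matches is a hypothetical helper, not the actual Ns1Provider code:

```python
def monitor_config_matches(expected, have):
    # Only compare keys octoDNS manages; NS1 may add new
    # server-side defaults (e.g. ipv6, tls_add_verify) at any time.
    return all(have.get(k) == v for k, v in expected.items())

expected = {'connect_timeout': 2000, 'host': '1.1.1.1', 'port': 443, 'ssl': True}
have = dict(expected, ipv6=False, tls_add_verify=False)
print(monitor_config_matches(expected, have))  # True despite the extra keys
```

The trade-off is that a key removed on the NS1 side would no longer be noticed, so the real fix may prefer an explicit allow-list instead.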
The order of the pools and rules does not affect the detection of changes.
(not directly related to the issue, but to clarify)
The order of rules does matter and should be preserved. They are intended to be applied in the order listed. That allows targeting sub-locations within a larger geo, e.g. rule 0 sends EU-ES to a pool in Spain and then rule 1 sends the rest of EU to a pool in Germany.
I originally ran through options to try and automagically sort the rules, but quickly realized that there could be conflicts that couldn't be resolved, e.g.
- geos: [NA-US, EU]
pool: eu
- geos: [EU-ES, NA]
pool: na
That config doesn't make sense and EU-ES would not be sent to na as things currently work, but if automagically sorted, which one would come first and thus which geo wouldn't get targeted?
The new validations in https://github.com/octodns/octodns/issues/989 will warn users about this sort of problem, but I still think explicit ordering is preferable.
Pools are a dictionary, so their order doesn't matter; octoDNS's preferred order is sorted keys.
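The "sorted keys" preference can be illustrated with a small sketch (normalize_pools is hypothetical, not the actual core code):

```python
def normalize_pools(pools):
    # pools maps pool name -> pool config; rebuilding the mapping
    # with sorted keys makes the serialized order deterministic,
    # regardless of the order a provider's API returned them in
    return {name: pools[name] for name in sorted(pools)}

pools = {'na': {'fallback': None}, 'eu': {'fallback': 'na'}}
print(list(normalize_pools(pools)))  # ['eu', 'na']
```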
I've reproduced the problem, even when running on the branch in https://github.com/octodns/octodns-ns1/pull/39, but it seems to happen regardless of whether or not I set enforce_order.
I'm seeing a diff both in the pool order and rule order. They're actually coming from 2 different things.
This one is an issue, but I believe it should be solved in octoDNS core. It should persistently order the pools internally so that variations in their order don't matter, the way it already deals with differences in case, idna, etc.
This is happening b/c the catch-all rule isn't the last rule in the chain. https://github.com/octodns/octodns/issues/989 will warn about this issue in the near future, so there's nothing to fix here.
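The shape of that check is simple. This is a hypothetical sketch (the real validation lives in octoDNS core per the issue above): a rule with no geos is a catch-all, and any rule after it is unreachable:

```python
def catch_all_is_last(rules):
    # a rule with no geos matches everything; any rule listed
    # after it can never be reached, so it must come last
    for i, rule in enumerate(rules):
        if not rule.get('geos') and i != len(rules) - 1:
            return False
    return True

print(catch_all_is_last([{'geos': ['EU'], 'pool': 'eu'}, {'pool': 'default'}]))  # True
print(catch_all_is_last([{'pool': 'default'}, {'geos': ['EU'], 'pool': 'eu'}]))  # False
```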
The documentation added in https://github.com/octodns/octodns/pull/991 talks about best practices/requirements for dynamic rules. It also talks about "unexpected" behaviors from providers if lenient is enabled for such records, and this would be a case of that.
I need to do a bit more digging and testing, but at this point I believe there will be a PR to octodns/octodns to fix the pool ordering issue. I don't believe there will be any changes around the rule ordering.
I don't believe there will be any changes around the rule ordering.
Well that was not correct.
Turns out there's a completely unrelated issue with Ns1Provider rule ordering. We stuff a rule order into the notes field and pull it back out to sort the rules so they're in the expected order:
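The snippet being referenced isn't reproduced above; here is a hypothetical sketch of the stuff/pull-back pattern (the notes format, rule-order:N, and both function names are assumptions for illustration, not the actual provider code):

```python
def encode_notes(host, rtype, order):
    # stash the rule's position alongside the other metadata
    return f'host:{host} type:{rtype} rule-order:{order}'

def parse_rule_order(notes):
    # pull the order back out of the notes string; note that the
    # value comes back as a STRING unless explicitly converted
    for token in notes.split():
        key, _, value = token.partition(':')
        if key == 'rule-order':
            return value
    return '0'
```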
There's a super subtle bug in that code... The meta.note field where the order data is stored is a string. The parsing code leaves it as-is, so rule_order is a string. When there are > 10 rules (0-indexed) the 11th one has the order "10", which doesn't sort numerically, so we get 0, 1, 10, 2, 3, ...
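The bug is easy to reproduce in isolation, and the fix is just to sort with a numeric key:

```python
orders = [str(i) for i in range(12)]

# lexicographic sort on strings breaks once there are > 10 rules
print(sorted(orders))           # ['0', '1', '10', '11', '2', ...]

# sorting with key=int (or converting to int up front) restores
# the intended numeric order
print(sorted(orders, key=int))  # ['0', '1', '2', ..., '11']
```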
Trivial fix/PR to that incoming.
I need to do a bit more digging and testing, but at this point I believe there will be a PR to octodns/octodns to fix the pool ordering issue
And after fixing the rule ordering int bug and looking in octoDNS core, it turns out the fix I thought might be needed there is already in place: https://github.com/octodns/octodns/blob/1f8d7ade33c1d84447e134629bace1ce9a0d4e37/octodns/record/dynamic.py#L296-L297.
So I believe this is actually just a bug in the rule-order handling.
So I believe this is actually just a bug in the rule-order handling.
I can confirm this. #35 is fixed with #42.
I originally ran through options to try and automagically sort the rules, but quickly realized that there could be conflicts that couldn't be resolved, e.g.
- geos: [NA-US, EU]
  pool: eu
- geos: [EU-ES, NA]
  pool: na
That config doesn't make sense and EU-ES would not be sent to na as things currently work, but if automagically sorted which one would come first and thus which geo wouldn't get targeted?
I think this problem could be solved using one of the following two rulesets (probably kind of what you implemented in https://github.com/octodns/octodns/pull/991):
ruleset A)
geos: [NA-US, EU] would be invalid.
Result: your example would be rejected.
ruleset B)
geos: [NA-US, EU] / pool: eu would be split into geos: [NA-US] / pool: eu and geos: [EU] / pool: eu.
Result: your example would be resolved to:
- geos: [NA-US]
pool: eu
- geos: [EU]
pool: eu
- geos: [EU-ES]
pool: na
- geos: [NA]
pool: na
(first step)
- geos: [NA-US]
pool: eu
- geos: [EU-ES]
pool: na
- geos: [EU]
pool: eu
- geos: [NA]
pool: na
(third step)
This would be unambiguous and probably the intended behavior.
Note that it would make no difference even if the sorting were unstable in this scenario.
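The split-then-sort idea above can be sketched as follows. normalize_rules is hypothetical; it assumes "more specific" means more dash-separated parts in the geo code, and relies on Python's stable sort to keep the original relative order among equally specific rules:

```python
def normalize_rules(rules):
    # step 1: split each multi-geo rule into single-geo rules,
    # preserving the original listing order
    split = [{'geos': [geo], 'pool': rule['pool']}
             for rule in rules for geo in rule['geos']]
    # step 2: sort more specific geos (e.g. EU-ES) ahead of broader
    # ones (e.g. EU); list.sort is stable, so ties keep their order
    split.sort(key=lambda r: -len(r['geos'][0].split('-')))
    return split

rules = [{'geos': ['NA-US', 'EU'], 'pool': 'eu'},
         {'geos': ['EU-ES', 'NA'], 'pool': 'na'}]
for rule in normalize_rules(rules):
    print(rule)
# {'geos': ['NA-US'], 'pool': 'eu'}
# {'geos': ['EU-ES'], 'pool': 'na'}
# {'geos': ['EU'], 'pool': 'eu'}
# {'geos': ['NA'], 'pool': 'na'}
```

On these inputs this reproduces the sorted listing above, matching the intent of the proposal.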
This logic could also be moved back into octodns core, so the providers need only implement how to configure the pools and the rules. The semantics of pool/rule handling would then become completely consistent across providers.
ruleset A) ... ruleset B) ...
Unfortunately pools cannot be reused in rules (except when the 2nd use is as the catch-all). There's an existing validation in place to ensure this that has been there for a long time, if not for as long as dynamic has been a thing.
This comes about b/c several providers' implementations don't allow re-using pools. In some cases it might be possible to work around that by creating fake pools behind the scenes, but without paging all the details of all the dynamic providers back into memory I don't know if that would be possible for all (probably not; guessing that's why the rule came into place).
This comes about b/c several providers implementations don't allow re-using pools.
Ok, I see. Then the idea (B) is not feasible. :slightly_smiling_face:
But I still think that "mixed" configurations (that are problematic) are worth a warning, or a rejection (ruleset A) in some kind of strict mode.
Observed behavior: The ns1 provider does not return the pools and rules in strict order. As a consequence, with the enforce_order flag set, every run creates a change, even though nothing changed.

Expected behavior: The order of the pools and rules does not affect the detection of changes.

I am using octodns and the ns1 provider with the following versions:

The config provider has enforced order set:

The following entry will produce a change with every run:

This leads reproducibly to the following diff: