octodns / octodns-ns1

Ns1Provider provider for octoDNS
MIT License
4 stars 13 forks source link

Use native HTTP monitors for HTTP(S) healthchecks #49

Closed viranch closed 1 year ago

viranch commented 1 year ago

Ran into a show stopper problem with TCP monitors: NS1 errors out if the health-check path contains &, usually shows up when using query params. Root cause unknown, but HTTP monitors don't seem to have that problem.

Plus using HTTP monitors is the more appropriate way of implementing HTTP(s) healthchecks anyway.

This change will cause sync to trigger a (graceful) update for all dynamic records that use HTTP healthchecks.

ross commented 1 year ago

/cc https://github.com/octodns/octodns/issues/713 for related discussion. Namely that NS1 pushed me towards using TCP this way when we run into problems during the initial development trying to use HTTP and unfortunately I've lost access to the work email that had the history/details as to why.

Vaguely remember it having something to do with TLS validation, maybe HTTP didn't allow things to ignore self signed certs or something like that. It was 5-6 years ago and I really don't remember though :-(

viranch commented 1 year ago

Without knowing the original matter, NS1's current recommendation is to use HTTP monitors.

TLS validation is toggle-able in both types of monitors today, perhaps whatever it is no more a standing issue.

I think we should be forward looking and use the appropriate monitor types for long term stability rather than patching TCP monitors to do HTTP, especially if the historical reason is no longer known and given there's a concrete issue at hand with TCP monitors.

viranch commented 1 year ago

something to do with TLS validation

Perhaps it was to do with SNI? HTTP monitors will use the value of host header as SNI for SSL handshake. TCP monitors don't use any SNI. But I believe that also shouldn't matter if TLS verify is disabled.

ross commented 1 year ago

I think we should be forward looking and use the appropriate monitor types for long term stability rather than patching TCP monitors to do HTTP, especially if the historical reason is no longer known and given there's a concrete issue at hand with TCP monitors.

Generally don't disagree, but it's a fairly dangerous area to mess with in terms of a deployment going out without any meaningful changes turning into an all health checks start failing and everything is going to the non-heath checked defaults.

I do remember the change happening very early on, when I was probably only dealing with github so whatever it was was something that affected github. Unfortunately I don't have the visibility to test it there as thoroughly as I used to.

Perhaps it was to do with SNI? HTTP monitors will use the value of host header as SNI for SSL handshake. TCP monitors don't use any SNI. But I believe that also shouldn't matter if TLS verify is disabled.

That's possible. If that was it things didn't use the host header for SNI at the time though.

viranch commented 1 year ago

Do you think a changelog entry plus a log.warn would be enough notice to users?

viranch commented 1 year ago

I realized that NS1 doesn't allow updating monitor types, so existing TCP monitors for HTTP health-checking will fail to get updated to HTTP type. I've added a commit for doing delete+create for this, but I think its now a breaking-enough change (answers go into a limbo until the new monitor status is populated) that should go behind a feature flag. That way users can opt out of this entirely if they want to stick to TCP monitors. How does that sound?

Btw I also realized this won't be a forward-compatible change, once users upgrade and run sync, they can't downgrade.

ross commented 1 year ago

Yeah opting in makes it a non-sensitive change. Probably wouldn't hurt to have a warning if this is the way you think things should go for new users. Down the road we could consider making the warn and error to force people to make the call between the old and new and explicitly set a value to get the existing TCP behavior.

Will try and sit down and look at the code in detail tomorrow.