rootwyrm / dns_docker

Complete DNS suite for use in Docker

Docker DNS hijacking breaks DNSSEC validation #21

Open rootwyrm opened 3 years ago

rootwyrm commented 3 years ago

Identified during bind debugging. This is caused by the bridge network hijacking DNS traffic and breaking all DNSSEC responses. Symptomatic container:

05-Nov-2020 14:14:18.826 dnssec: debug 3:   validating ./DNSKEY: no supported algorithm/digest (DS)
05-Nov-2020 14:14:18.826 dnssec: debug 3:   validating ./DNSKEY: marking as answer (validate_dnskey (3))
05-Nov-2020 14:14:18.826 dnssec: debug 4:   validator @0x560372c2be00: dns_validator_destroy
05-Nov-2020 14:14:18.826 dnssec: debug 3: validating ./NS: in validator_callback_dnskey
05-Nov-2020 14:14:18.826 dnssec: debug 3: validating ./NS: keyset with trust answer
05-Nov-2020 14:14:18.826 dnssec: debug 3: validating ./NS: resuming validate
05-Nov-2020 14:14:18.826 dnssec: info: validating ./NS: no valid signature found
05-Nov-2020 14:14:18.826 dnssec: debug 3: validating ./NS: falling back to insecurity proof
05-Nov-2020 14:14:18.826 dnssec: debug 3: validating ./NS: insecurity proof failed: success
05-Nov-2020 14:14:18.826 dnssec: debug 4: validator @0x560372c5f5e0: dns_validator_destroy
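
When comparing a bridge-networked container against a host-networked one, it can help to pull just the validation failures out of the log. The following is a minimal sketch, not part of this repo. The regular expression and the failure markers are assumptions derived only from the lines shown above, and `find_validation_failures` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed line shape, based only on the excerpt above:
#   <date> <time> dnssec: <severity>: validating <name>/<type>: <message>
LINE_RE = re.compile(
    r"dnssec:\s+(?P<severity>[^:]+):\s+validating\s+"
    r"(?P<name>\S+)/(?P<rtype>[A-Z0-9]+):\s+(?P<message>.*)$"
)

# Messages from the excerpt above that show validation did not succeed.
FAILURE_MARKERS = (
    "no supported algorithm/digest",
    "no valid signature found",
    "insecurity proof failed",
)


def find_validation_failures(lines):
    """Return (name, rtype, message) tuples for lines containing a failure marker."""
    failures = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # e.g. validator destroy lines, which carry no name/type
        message = match.group("message").strip()
        if any(marker in message for marker in FAILURE_MARKERS):
            failures.append((match.group("name"), match.group("rtype"), message))
    return failures


def summarize(failures):
    """Count failures per name/type pair, e.g. './NS'."""
    return Counter(f"{name}/{rtype}" for name, rtype, _ in failures)
```

Feeding the same log window from both containers through `summarize` gives a quick side-by-side count. It only recognizes the three messages listed in `FAILURE_MARKERS`, so other failure messages would need to be added by hand.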

The same container using host networking is non-symptomatic. Yes, this means the cache bug observed in dnsdist isn't a cache bug in dnsdist and never was. I wasted several hours trying to figure out why DS was failing even with --disable-isc-spnego, since ISC SPNEGO often causes oddness. And this is one aggressive as shit hijack: murdering /etc/resolv.conf did nothing to help, and drill showed failures to fully external known-goods.

Moby / Docker has a long and storied history, more than 400 comments long, of refusing to acknowledge that this is a defect or to allow user-defined networks to bypass their defective proxy, and has changed behavior to prevent bypassing of their broken code. This is without question one of the most idiotic and childish things I've ever seen from any organization, and it also creates a massive security risk, because it forces one to disable DNSSEC completely, making the container extremely vulnerable to poisoning, which is still frequently observed in the wild (accidental and deliberate).
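
One way to compare what a resolver hands back on each network mode, without drill, is to send a query with the EDNS0 DO bit set and look at the response header. The following is a rough sketch using only the Python standard library. `build_query`, `parse_header`, and `query` are hypothetical helpers, and what the proxy actually does to the packets is not established in this issue, so this only surfaces differences and does not explain them.

```python
import socket
import struct


def build_query(name, qtype, query_id=0x1234, dnssec_ok=True):
    """Build a DNS query; with dnssec_ok, append an EDNS0 OPT record with DO set."""
    arcount = 1 if dnssec_ok else 0
    # Header: ID, flags (RD only), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    if not dnssec_ok:
        return header + question
    # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size (4096),
    # TTL field carries ext-rcode/version/flags with DO = 0x8000, RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0x00008000, 0)
    return header + question + opt


def parse_header(response):
    """Return rcode, AD and TC flags, and the answer count from a DNS response."""
    if len(response) < 12:
        raise ValueError("short DNS message")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "rcode": flags & 0x000F,
        "ad": bool(flags & 0x0020),  # set by a validating resolver on success
        "tc": bool(flags & 0x0200),  # truncated; a real tool would retry over TCP
        "ancount": ancount,
    }


def query(server, name, qtype, timeout=3.0, port=53):
    """Send one UDP query with DO set and return the parsed response header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, port))
        data, _ = sock.recvfrom(4096)
    return parse_header(data)
```

Running something like `query(server, ".", 48)` (48 is DNSKEY) against the same `server` from the bridge-networked container and from the host-networked one, then comparing the two dictionaries, shows whether the answer count, rcode, or AD flag differs between the two paths. It does not parse the answer section, so it cannot say which records went missing.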

The only way to make DNSSEC work without poisoning is to use host networking. Which, yep, means you are completely breaking the network isolation and stripping away the security layer you were trying to put in place to begin with. (Also breaks my IP model. Thanks guys.) Didn't dig in further, but it's definitely the issue I identified with the dnsdist cache failures. Should have known I didn't screw up the packets. Don't see any value in opening an issue with Docker, because this is one of the known problems that, since 2015, they have repeatedly refused to accept as defects or to fix. (The necessary fix is: bypass their broken fucking proxy bullshit. But nooooo, badly emulating /etc/hosts without any understanding of context or purpose in 2020 is much more important.)

Thankfully, my stricter bind configuration fully exposed this utter idiocy on their part before I took this to release. Need to examine Unbound, but I expect it is either breaking silently because I didn't log aggressively enough (due to other bugs and known limitations in RPZ) or bypassing the problem in another way. However, I also haven't tested Unbound with this set of updates or with a less tolerant configuration. Unbound as currently shipping will accept DNSSEC failures. This also obviously has implications for TSIG AXFR/IXFR operations, as the chain of trust is clearly broken. GSS-TSIG is still fine assuming external krb5; there is no evidence the proxy is tampering with SRV records in any way, though something tells me someone out there is doing that wrong.

Frankly not sure if, or how much, I want to bother digging into this. It's going to be a nightmare because the ONLY way to properly inspect this is to capture the packets AT the bridge plane on BOTH SIDES of the bridge.

rootwyrm commented 3 years ago

This appears to be resolved in 20.10 (aside from systemd-resolved which needs to die in a fire along with anyone who defends it.) Testing is ongoing.