stackabletech / zookeeper-operator

A tool that can be used to deploy and manager Apache ZooKeeper clusters/ensembles
Other
26 stars 8 forks source link

Broken TLS Quorum hostname verification #760

Closed nightkr closed 4 months ago

nightkr commented 9 months ago

Currently, enabling Quorum TLS will make the server validate SANs client certificates of connecting quorum peers against their reverse DNS address. This is less-than-helpful for two reasons:

  1. ZK pods' IP addresses will resolve to a hostname per service it participates in, only one of which is in the certificate SAN (zk-server-default-0.zk-server-default.default.svc.cluster.local is in SAN, 1-2-3-4.zk.default.svc.cluster.local is not in SAN).
  2. "This certificate matches the connecting peer" does not mean "this peer should be allowed to connect".

Instead, the ZK server should verify the SAN against the list of servers (servers.N in the config). A peer should be able to connect on the quorum port if and only if at least one SAN matches at least one of the listed servers.

Additionally, it would be nice to have a "disable client hostname verification" option that still leaves server hostname verification enabled.

Both of these would need to be implemented upstream.

lfrancke commented 9 months ago

See also: https://issues.apache.org/jira/browse/ZOOKEEPER-4790

fhennig commented 5 months ago

Is there more to be refined? Are we going to do these upstream changes or is it enough to have reported the problem?

nightkr commented 5 months ago

IMO it's fine to end up shipping a patch for now, but we should try to get it upstreamed.

nightkr commented 5 months ago

The question here is.. do we just want to disable the mostly-useless old behaviour (verifying a strong identity (X509) against a weak one (IP/DNS)), or do we want to do the "correct" validation here (verifying that the strong identity (X509) matches who we expect to connect)?

nightkr commented 5 months ago

Having looked through the codebase a bit more, authz seems like a somewhat bigger change.

For now, I'd prioritize disabling the DNS-based client hostname verification, and then break out authz into a separate issue.

nightkr commented 5 months ago

Created #820 for the latter.

nightkr commented 4 months ago

Turns out, enabling "FIPS mode" on ZK 3.8.2+ already "solves" this, and it is on by default on 3.9.0+.

I've submitted a PR for a more specific toggle as https://github.com/apache/zookeeper/pull/2173. Moving this to track as we're now waiting for upstream.

nightkr commented 4 months ago

Closing since FIPS mode is available in all our supported versions, and on by default in all non-deprecated versions.

Moved the followup work into #829 instead.