Allow custom peer certificate verification for services

jbdoumenjou commented 3 years ago

Do you want to request a feature or report a bug?

Feature

What did you expect to see?

Currently, the serversTransport configuration is based on Go default peer certificate validation.

The #7407 points out the need to define a custom validation for Spiffe certificates that will use URIs instead of DNSNames.

The goal is to be able to apply custom validation mechanisms such as URIs check or whatever build on available certificate data.

Actual Configuration

  serversTransports:
    ServersTransport0:
      serverName: foobar
      insecureSkipVerify: true
      rootCAs:
      - foobar
      certificates:
      - certFile: foobar
        keyFile: foobar

Proposed Configurations

It is the first step to engage the discussion, the naming and the final configuration might evolve.

serversTransports:
  # Proposition 1
  ServersTransport1:
    # use as SNI if defined
    serverName: foobar
    # if true, will skip all certificate verification
    insecureSkipVerify: false
    # custom certificate verification configuration
    certificateVerification:
      # will apply the default go verification, and use serverName value if specified
      verifyHostname: true
      # define explicitly the dns names to check
      DNSNames:
        - foobar
      # check the certificate URIs
      URIs:
        - spiffe://foo.com
    rootCAs:
      - foobar
    certificates:
      - certFile: foobar
        keyFile: foobar

  # Proposition 2
  ServersTransport2:
    serverName: foobar
    insecureSkipVerify: false
    certificateVerificationRule: IsValidHostname() && (URIs(`spiffe://foo.com`) || DNSNames(`foobar`))
    rootCAs:
      - foobar
    certificates:
      - certFile: foobar
        keyFile: foobar

The first proposition is simpler, it will be a AND between types of verification (DNSNames, URIs) and OR inside. The second proposition is more complex but allows us to fine-tune the verification.

If the insecureSkipVerify is true, no validation mechanism will be applied.

serverName	verifyHostname	URIs	Behavior
not defined	false	not defined	no validation
not defined	false	defined	URIs validation
not defined	true	not defined	"common" validation
not defined	true	defined	"common" validation && URIs validation
defined	false	not defined	no validation
defined	false	defined	URIs validation
defined	true	not defined	"common" validation with the serverName attribute
defined	true	defined	"common" validation with the serverName attribute && URIs validation

apollo13 commented 3 years ago

I like the simplicity of the first proposal. I do not like the idea of shooting yourself in the foot in security sensitive areas via custom rules. If at one point the first proposal should actually not be enough, adding a rule would still be possible.

That said, this approach would not work for consul connect as it currently stands because I am not able to construct the SPIFFE-URL before actually getting the server certificate. As it stands now, the path component of the URL is constructed (this one is known before) and then the hostname of the received URL is interpolated and matched against. This is also how consul does this upstream: https://github.com/hashicorp/consul/blob/67523a1365b3ccbc6e80f561353531630431f91c/connect/tls.go#L189-L197

So currently I do not see a possibility without directly exposing the callback. Then again my knowledge of SPIFFE is limited to copying the upstream code…

kevinpollet commented 3 years ago

Hello @apollo13,

If we correctly understand the verifyPeerCertificate implementation you are using the server certificate to extract the TrustDomain which is used to construct the spiffe URL to validate. Extracting the TrustDomain might not be the right approach because this means that the TrustDomain it is not validated.

Maybe the TrustDomain could be retrieved from the RootCA or from the Consul Connect API?

apollo13 commented 3 years ago

@kevinpollet Probably, I just did what upstream consul did. The have a comment in their source code with the following reasoning:

    // Override the hostname since we rely on x509 constraints to limit ability to
    // spoof the trust domain if needed (i.e. because a root is shared with other
    // PKI or Consul clusters). This allows for seamless migrations between trust
    // domains.

Lets loop in @blake from the Consul team, they might have a few ideas on what is possible and if consul exposes the trust domain at all.

traefik / traefik