hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Does Consul support chained SAN certificates? #11357

Open vishwanathjadhav opened 3 years ago

vishwanathjadhav commented 3 years ago

Does Consul support chained SAN certificates? We are repeatedly getting the following errors:

consul[xxx]: consul.rpc: failed to read byte: tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage: leaf contains the following, recognized EKUs: 1.3.6.1.5.5.7.3.1 from=xxxx:xxx
consul[xxxx]: consul: error getting server health from "xxxxxhost-namexxx": rpc error getting client: failed to get conn: remote error: tls: bad certificate

The following log entry no longer occurs:

consul[xxx]: consul.rpc: failed to read byte: tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage: leaf contains the following, recognized EKUs: 1.3.6.1.5.5.7.3.1 from=xxxxx:xxx

However, "tls: bad certificate" still persists.

vishwanathjadhav commented 3 years ago

For Consul's server certificates, is it necessary to add clientAuth to extendedKeyUsage? Doesn't that make the certificate more generic? Is there a way to skip or remove this requirement?

Along with this, can we make Consul watches run against the hostname/FQDN/DNS name (hostname + domain)? We cannot add the management IPs to the SANs of the server certificates, so API calls made to the management IP are failing due to unauthorized access.

Amier3 commented 2 years ago

@vishwanathjadhav

Apologies for the delayed response; I also see this is your first post, so welcome to the HashiCorp community!

To answer your first question:

For Consul's server certificates, is it necessary to add clientAuth to extendedKeyUsage? Doesn't that make the certificate more generic? Is there a way to skip or remove this requirement?

To the best of my knowledge, this is a necessary step. I'll link a doc we have here that explains our encryption process, but here's the most relevant note from the docs that answers your question:

Certificates need to be created with x509v3 extendedKeyUsage attributes for both clientAuth and serverAuth since Consul uses a single cert/key pair for both server and client communications.

For the second part of your question:

Along with this can we make consul watches run on the hostname/FQDN/DNS(hostname+domain)?

Yes, you definitely can. One of our guides linked here walks through using SANs for client and server certificates, so I think it'd be worth checking that out.

It might also be helpful to check out our guide on Secure Consul Agent Communication with TLS Encryption, as it might contain some information that can solve your "tls: bad certificate" issue.

Let me know if any of this information helps and if you have any more questions!

vishwanathjadhav commented 2 years ago

Hi @Amier3 - Thanks, buddy, that's a great help. There are still more scenarios I would like to share that are blocking me.

We cannot add localhost, 127.0.0.1, or the IP address of the management interface to the certificate, and the CN is different from "" (i.e. we have to disable verify_server_hostname). This is per company standards. So the custom server certificate configuration and the Consul configuration look as below (we can only add DNS names/hostnames/FQDNs):

#] cat server.SAN.csrf.conf
[req]
[----some more configurations here----]
[req_distinguished_name]
[----some more configurations here----]
CN = LabTestConsulSetup
[v3_req]
[----some more configurations here-----]
extendedKeyUsage = clientAuth, serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = testdatacenter.quxdataserver.com
DNS.2 = testdatacenter
# Can not add this localhost, IP address name and 127.0.0.1 in server certificates:
# DNS.3 = localhost
# IP.1 = 127.0.0.1
# IP.2 = 192.168.1.2 (Management Interface IP)
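
To see which names a certificate built from this [alt_names] section would be accepted for, here is a small Go sketch that calls x509.Certificate.VerifyHostname on a self-signed stand-in (the key type and validity window are arbitrary assumptions). It also tries server.testdatacenter.quxdataserver, which, as I understand the Consul TLS docs, is the server.<datacenter>.<domain> name that verify_server_hostname expects for this datacenter and domain; since the certificate does not carry it, that option has to stay disabled.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSANCert self-signs a certificate mirroring the [alt_names] section:
// DNS SANs only, no IP SANs, and a CN that is not server.<datacenter>.<domain>.
func newSANCert() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "LabTestConsulSetup"},
		DNSNames:     []string{"testdatacenter.quxdataserver.com", "testdatacenter"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newSANCert()
	if err != nil {
		panic(err)
	}
	// The last name is what verify_server_hostname would check for
	// datacenter "testdatacenter" and domain "quxdataserver".
	for _, host := range []string{
		"testdatacenter.quxdataserver.com",
		"localhost",
		"127.0.0.1",
		"server.testdatacenter.quxdataserver",
	} {
		fmt.Printf("%-36s %v\n", host, cert.VerifyHostname(host))
	}
}
```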

To make Consul compatible with the above server certificates, I have modified the Consul configuration, i.e. Consul now runs on the management IP address. I also tried the hostname, but got the following errors when I added:

"addresses": {
    "http": "testdatacenter.quxdataserver.com"
}

~] /usr/local/consul validate /opt/consul/configure/config.json
Config validation failed: 1 error(s) occurred:
* addresses.http: invalid ip address: testdatacenter.quxdataserver.com

And with:

"addresses": {
    "http": "testdatacenter"
}

~] /usr/local/consul validate /opt/consul/configure/config.json
Config validation failed: 1 error(s) occurred:
* addresses.http: invalid ip address: testdatacenter
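
Both failures are consistent with the addresses.* fields requiring IP literals rather than names that get resolved. The Go sketch below is a hypothetical stand-in for that check (validateAddrs is not Consul's function, and it ignores go-sockaddr templates such as {{ GetInterfaceIP "ethMNG" }} and unix:// sockets); it only reproduces the shape of the error messages above and accepts the space-separated list form the Consul docs describe for these fields.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateAddrs is a hypothetical model of the addresses.* check: every
// space-separated entry must already be an IP literal. Hostnames are
// rejected, not resolved.
func validateAddrs(field, value string) error {
	for _, a := range strings.Fields(value) {
		if net.ParseIP(a) == nil {
			return fmt.Errorf("%s: invalid ip address: %s", field, a)
		}
	}
	return nil
}

func main() {
	for _, v := range []string{
		"testdatacenter.quxdataserver.com",
		"testdatacenter",
		"127.0.0.1 192.168.1.2",
	} {
		fmt.Printf("%q -> %v\n", v, validateAddrs("addresses.http", v))
	}
}
```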

Currently, Consul is running on the management interface IP with the following configuration:

~] cat /opt/consul/configure/config.json
{
    "disable_anonymous_signature": true,
    "datacenter": "testdatacenter",
    "domain": "quxdataserver",
    "data_dir": "/opt/consul/data/",
    "addresses": {
        "http": "{{ GetInterfaceIP \"ethMNG\" }}",
        "dns": "{{ GetInterfaceIP \"ethMNG\" }}",
        "https": "{{ GetInterfaceIP \"ethMNG\" }}"
    },
    "cert_file": "/opt/consul/ssl/cert.pem",
    "client_addr": "{{ GetInterfaceIP \"ethMNG\" }}",
    "enable_syslog": true,
    "watches": [
        {
            "prefix": "cluster",
            "args": [
                "python",
                "-m",
                "ltb.cluster_consul_integration.bcb_consul_handler"
            ],
            "type": "keyprefix",
            "handler_type": "script"
        }
    ],
    "server": true,
    "bind_addr": "{{ GetInterfaceIP \"ethMNG\" }}",
    "ui": false,
    "ca_file": "/opt/consul/ssl/ca.pem",
    "disable_update_check": true,
    "pid_file": "/var/run/consul.d/consul.pid",
    "log_level": "warn",
    "key_file": "/opt/consul/ssl/cert.key",
    "verify_incoming": true,
    "ports": {
        "http": -1,
        "https": 9000
    },
    "verify_outgoing": true
}

With the above configuration, we were able to remove the localhost and 127.0.0.1 dependency, but the IP address (192.168.1.2) dependency cannot be removed. It produces the following error, which results in cluster failure:

Nov 24 11:51:56 xxxxxxxx.com consul[62162]: consul.watch: Watch (type: keyprefix) errored: Get https://192.168.1.2:9000/v1/kv/cluster?recurse=: x509: cannot validate certificate for 192.168.1.2 because it doesn't contain any IP SANs, retry in 5s

Please help me with this.

Amier3 commented 2 years ago

@vishwanathjadhav

Talked this over with some of the other team members, and it looks like you've hit a bug that affects your watches configuration. I'd suggest checking out the issue linked here for a full explanation of the behavior that's causing the errors.

Luckily, one of our engineers did find a workaround that should get your cluster up and running (quoted from the linked issue):

use a localhost address as the first client address (recommended) - this is a common setup, which is probably why we have not noticed this problem before. Each of these https://www.consul.io/docs/agent/options#addresses supports a space-separated list, so use 127.0.0.1.

In your specific configuration, this would be changing:

"addresses": {
    "http": "{{ GetInterfaceIP \"ethMNG\" }}",
    "dns": "{{ GetInterfaceIP \"ethMNG\" }}",
    "https": "{{ GetInterfaceIP \"ethMNG\" }}"
},

To:

"addresses": {
    "http": "127.0.0.1 {{ GetInterfaceIP \"ethMNG\" }}",
    "dns": "127.0.0.1 {{ GetInterfaceIP \"ethMNG\" }}",
    "https": "127.0.0.1 {{ GetInterfaceIP \"ethMNG\" }}"
},

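
A minimal Go sketch of the behavior the workaround relies on, assuming (from the quoted explanation) that the watch builds its API URL from the first entry of the space-separated client address list. watchURL is a hypothetical helper, not Consul's code. It shows why the current config produces the https://192.168.1.2:9000 URL seen in the log, and why putting 127.0.0.1 first moves the TLS name check to the loopback address instead, which the certificate would then need to cover.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// watchURL is a hypothetical model of how a watch picks its target:
// the first address in the space-separated client address list.
func watchURL(clientAddrs string, port int, path string) (string, error) {
	fields := strings.Fields(clientAddrs)
	if len(fields) == 0 {
		return "", fmt.Errorf("no client address configured")
	}
	return "https://" + net.JoinHostPort(fields[0], strconv.Itoa(port)) + path, nil
}

func main() {
	for _, addrs := range []string{
		"192.168.1.2",           // current config: the management IP is first (and only)
		"127.0.0.1 192.168.1.2", // workaround: loopback first
	} {
		u, err := watchURL(addrs, 9000, "/v1/kv/cluster?recurse=")
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```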
vishwanathjadhav commented 2 years ago

@Amier3 - With this configuration, do I need to add 127.0.0.1 to the TLS certificates as a SAN?

Amier3 commented 2 years ago

@vishwanathjadhav I'm not 100% sure, so I would try just changing the config first without adding anything to the TLS cert to see if that works.

dnephin commented 2 years ago

Ah yes, it will require you to add 127.0.0.1 to the IP SANs I believe. We do that by default here: https://github.com/hashicorp/consul/blob/v1.10.4/command/tls/cert/create/tls_cert_create.go#L116

I see you said that is not an option for you because of company policy, is that right?

#11683 can track the work to support some way of setting the hostname.

vishwanathjadhav commented 2 years ago

@Amier3 , @dnephin - Thanks guys for considering my queries.

I see you said that is not an option for you because of company policy, is that right?

Yes, we cannot add localhost, the IP, or 127.0.0.1. Can we make Consul watches send their calls to the hostname so that the TLS certificate will allow them?

dnephin commented 2 years ago

Yes, I opened #11683 to track that change. I think we could do something like that, but I'm not sure yet what the config should look like. We can continue the discussion on that issue, if that works for you.
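
For context on what "sending the calls to the hostname" can mean at the TLS level: a Go client can dial an IP but verify the server certificate against a DNS name by setting tls.Config.ServerName. The sketch below shows that generic mechanism with an httptest server on 127.0.0.1 and a DNS-only certificate; it is not an existing Consul option, and whatever configuration #11683 ends up with is still open. The certificate, the stub handler, and the helper names are assumptions for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"io"
	"log"
	"math/big"
	"net/http"
	"net/http/httptest"
	"time"
)

const certName = "testdatacenter.quxdataserver.com"

// newServerCert self-signs a cert with a DNS SAN only (no IP SANs).
func newServerCert() (tls.Certificate, *x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "LabTestConsulSetup"},
		DNSNames:              []string{certName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true, // self-signed, so it doubles as its own trust anchor
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}, leaf, nil
}

// fetch dials the test server by IP (srv.URL is https://127.0.0.1:PORT) and
// verifies its certificate against serverName; "" means "use the IP from the URL".
func fetch(serverName string) (string, error) {
	cert, leaf, err := newServerCert()
	if err != nil {
		return "", err
	}
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok") // stub standing in for an API response
	}))
	srv.Config.ErrorLog = log.New(io.Discard, "", 0) // silence handshake-failure logs
	srv.TLS = &tls.Config{Certificates: []tls.Certificate{cert}}
	srv.StartTLS()
	defer srv.Close()

	pool := x509.NewCertPool()
	pool.AddCert(leaf)
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, ServerName: serverName},
	}}
	resp, err := client.Get(srv.URL + "/v1/kv/cluster?recurse=")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	fmt.Println(fetch(""))       // fails: the IP is checked and the cert has no IP SANs
	fmt.Println(fetch(certName)) // succeeds: the DNS SAN is checked instead
}
```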

vishwanathjadhav commented 2 years ago

@dnephin - Cool, looks OK to me.