hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Consul backend with TLS: Bad Certificate #4930

Closed monwolf closed 4 years ago

monwolf commented 6 years ago

Good morning. I'm trying to set up a Vault cluster (v0.10.3) using Consul as the storage backend. The setup has two types of Consul agent: one node is the server and the others are clients that join it. When I tried to run Vault on a client node, I saw this error message:

Jul 16 07:34:33 ildes01 vault: 2018-07-16T07:34:33.293+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:34 ildes01 vault: 2018-07-16T07:34:34.064+0200 [WARN ] storage.consul: check unable to talk with Consul backend: error="Put https://127.0.0.1:8500/v1/agent/check/fail/vault:10.10.0.128:8200:vault-sealed-check?note=Vault+Sealed: remote error: tls: bad certificate"

This error does not occur on the Consul server node. Below is the output of consul members, showing the state of my cluster:

consul_ssl members
Node     Address           Status  Type    Build  Protocol  DC       Segment
des01    10.10.0.125:8301  alive   server  1.2.0  2         bardock  <all>
ildes01  10.10.0.128:8301  alive   client  1.2.0  2         bardock  <default>

I generated the SSL certificates using cfssl and cfssljson in my ansible playbook:

- name: Generate server private key and certificate
  command: >
     bash -c "echo '{\"CN\":\"{{ item }}\",\"key\":{\"algo\":\"rsa\",\"size\":2048}}' |
     cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem
     -config=cfssl.json -hostname=\"{{ item }},{{ item }}.local.com,{{ item }}.node.global.consul,server.global.nomad,localhost,127.0.0.1,{{ hostvars[item]['ansible_default_ipv4']['address'] }}\" -
     | cfssljson -bare server-{{ item }}"
  args:
    chdir: "{{ consul_ssl_dir }}"
  with_items: "{{ groups['server'] }}"
  when: consul_bootstrap

- name: Generate client private key and certificate
  command: >
     bash -c "echo '{\"CN\":\"{{ item }}\",\"key\":{\"algo\":\"rsa\",\"size\":2048}}' |
     cfssl gencert -ca=consul-ca.pem -ca-key=consul-ca-key.pem
     -config=cfssl.json -hostname=\"{{ item }},{{ item }}.local.com,{{ item }}.node.global.consul,client.global.nomad,localhost,127.0.0.1,{{ hostvars[item]['ansible_default_ipv4']['address'] }}\" -
     | cfssljson -bare client-{{ item }}"
  args:
    chdir: "{{ consul_ssl_dir }}"
  with_items:  "{{ groups['client'] }}"
  when: consul_bootstrap

If I inspect the certificates with openssl, I can see all the alternative names that I provided.

Server certificate:

X509v3 Subject Alternative Name: 
DNS:des01, DNS:des01.local.com, DNS:des01.node.global.consul, DNS:server.global.nomad, DNS:localhost, IP Address:127.0.0.1, IP Address:10.10.0.125

Client certificate:

X509v3 Subject Alternative Name: 
DNS:ildes01, DNS:ildes01.local.com, DNS:ildes01.node.global.consul, DNS:client.global.nomad, DNS:localhost, IP Address:127.0.0.1, IP Address:10.10.0.128
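
To double-check that a SAN list like the one above covers the address Vault actually dials (127.0.0.1 in the storage stanza), the openssl output can be parsed mechanically. The following is a minimal sketch under a few assumptions: `parse_san` and `covers` are hypothetical helper names, the input is the single comma-separated line printed by openssl, and wildcard names are not handled.

```python
import ipaddress


def parse_san(line):
    """Split an openssl SAN line into a set of DNS names and a set of IPs."""
    dns, ips = set(), set()
    for entry in line.split(","):
        # Each entry looks like "DNS:name" or "IP Address:1.2.3.4".
        kind, _, value = entry.strip().partition(":")
        if kind == "DNS":
            dns.add(value.lower())
        elif kind == "IP Address":
            ips.add(ipaddress.ip_address(value))
    return dns, ips


def covers(line, host):
    """Return True if host (DNS name or IP literal) appears in the SAN line."""
    dns, ips = parse_san(line)
    try:
        return ipaddress.ip_address(host) in ips
    except ValueError:
        # Not an IP literal, so compare it as a DNS name (exact match only).
        return host.lower() in dns


# SAN line copied from the client certificate shown in this issue.
CLIENT_SAN = (
    "DNS:ildes01, DNS:ildes01.local.com, DNS:ildes01.node.global.consul, "
    "DNS:client.global.nomad, DNS:localhost, IP Address:127.0.0.1, "
    "IP Address:10.10.0.128"
)

if __name__ == "__main__":
    for host in ("127.0.0.1", "localhost", "10.10.0.125"):
        print(host, covers(CLIENT_SAN, host))
```

If the dialed address is covered, as it is here for 127.0.0.1, a missing SAN is unlikely to be the cause of the error.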

Reproduction Steps

Steps to reproduce this issue:

  1. Create a Consul client and server with TLS verify_incoming and verify_outgoing enabled:

Client configuration:

{
    "server": false,
    "node_name": "ildes01",
    "enable_debug": true,
    "datacenter": "bardock",
    "data_dir": "/opt/consul/data",
    "encrypt": "XXXXXX",
    "disable_update_check": true,
    "bind_addr":"0.0.0.0",
    "advertise_addr": "10.10.0.128",
    "addresses": {
        "https": "0.0.0.0"
    },
    "ports": {
        "https": 8500,
        "http": -1
    },
    "key_file": "/opt/consul/ssl/client-ildes01-key.pem",
    "cert_file": "/opt/consul/ssl/client-ildes01.pem",
    "ca_file": "/opt/consul/ssl/consul-ca.pem",
    "verify_incoming": true,
    "verify_outgoing": true,
    "retry_join":[
        "10.10.0.125"
    ]
}

Server configuration:

{
    "bootstrap": true,
    "server": true,
    "node_name": "des01",
    "datacenter": "bardock",
    "data_dir": "/opt/consul/data",
    "encrypt": "XXXX",
    "disable_update_check": true,
    "bind_addr":"0.0.0.0",
    "advertise_addr": "10.10.0.125",
    "addresses": {
        "https": "0.0.0.0"
    },
    "ports": {
        "https": 8500,
        "http": -1
    },
    "key_file": "/opt/consul/ssl/server-des01-key.pem",
    "cert_file": "/opt/consul/ssl/server-des01.pem",
    "ca_file": "/opt/consul/ssl/consul-ca.pem",
    "verify_incoming": true,
    "verify_outgoing": true,
    "retry_join":[
        "10.10.0.125"
    ]
}

  2. Create a Vault configuration on each node:

Server config:

storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "XXXX"
  scheme = "https"
  tls_skip_verify = 0
  tls_cert_file = "/opt/consul/ssl/server-des01.pem"
  tls_key_file = "/opt/consul/ssl/server-des01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "10.10.0.125:8201"
  tls_disable = 0
  tls_cert_file = "/opt/consul/ssl/server-des01.pem"
  tls_key_file = "/opt/consul/ssl/server-des01-key.pem"
}

api_addr = "https://10.10.0.125:8200"
cluster_addr = "https://10.10.0.125:8201"

ui=true

Client config:

storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "XXXXX"
  scheme = "https"
  tls_skip_verify = 0
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "10.10.0.128:8201"
  tls_disable = 0
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_key_file = "/opt/consul/ssl/client-ildes01-key.pem"
}

api_addr = "https://10.10.0.128:8200"
cluster_addr = "https://10.10.0.128:8201"

  3. Start Vault:

/usr/bin/vault server -config=/opt/vault/conf

Log Fragments

After starting Vault on the client node, I saw these logs:

Jul 16 07:34:29 ildes01 systemd: Started Vault Service.
Jul 16 07:34:29 ildes01 systemd: Starting Vault Service...
Jul 16 07:34:29 ildes01 vault: ==> Vault server configuration:
Jul 16 07:34:29 ildes01 vault: Api Address: https://10.10.0.128:8200
Jul 16 07:34:29 ildes01 vault: Cgo: disabled
Jul 16 07:34:29 ildes01 vault: Cluster Address: https://10.10.0.128:8201
Jul 16 07:34:29 ildes01 vault: Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "10.10.0.128:8201", tls: "enabled")
Jul 16 07:34:29 ildes01 vault: Log Level: info
Jul 16 07:34:29 ildes01 vault: Mlock: supported: true, enabled: true
Jul 16 07:34:29 ildes01 vault: Storage: consul (HA available)
Jul 16 07:34:29 ildes01 vault: Version: Vault v0.10.3
Jul 16 07:34:29 ildes01 vault: Version Sha: c69ae68faf2bf7fc1d78e3ec62655696a07454c7
Jul 16 07:34:29 ildes01 vault: ==> Vault server started! Log data will stream in below:
Jul 16 07:34:29 ildes01 vault: 2018-07-16T07:34:29.213+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:30 ildes01 vault: 2018-07-16T07:34:30.231+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:31 ildes01 vault: 2018-07-16T07:34:31.250+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:32 ildes01 vault: 2018-07-16T07:34:32.272+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:33 ildes01 vault: 2018-07-16T07:34:33.293+0200 [WARN ] storage.consul: reconcile unable to talk with Consul backend: error="service registration failed: Put https://127.0.0.1:8500/v1/agent/service/register: remote error: tls: bad certificate"
Jul 16 07:34:34 ildes01 vault: 2018-07-16T07:34:34.064+0200 [WARN ] storage.consul: check unable to talk with Consul backend: error="Put https://127.0.0.1:8500/v1/agent/check/fail/vault:10.10.0.128:8200:vault-sealed-check?note=Vault+Sealed: remote error: tls: bad certificate"

Maybe I need some other SAN or flag in the certificate? I spent a few hours reviewing your documentation and everything seems fine to me, but it doesn't start. Could you help me with this issue?
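
One way to rule out the certificate files themselves is to load the same certificate and key paths that the storage stanza references and see whether they form a usable TLS identity. The sketch below uses Python's standard `ssl` module, not anything from Vault or Consul; `check_pair` is a hypothetical helper name, and the paths are the ones quoted in this issue, which are assumed to exist only on the reporter's host.

```python
import ssl


def check_pair(cert_path, key_path):
    """Try to load a PEM certificate and private key as one TLS identity.

    Returns (ok, message). Loading fails if either file is missing or
    unreadable, if the certificate file contains no certificate, or if
    the private key does not match the certificate.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "certificate and key load as a pair"


if __name__ == "__main__":
    # Paths taken from the client configuration in this issue.
    ok, message = check_pair(
        "/opt/consul/ssl/client-ildes01.pem",
        "/opt/consul/ssl/client-ildes01-key.pem",
    )
    print(ok, message)
```

A check like this only validates the files. It does not confirm that the configuration actually points each setting at the right file.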

jefferai commented 6 years ago

The Consul logs will likely have a more detailed explanation of the problem.

monwolf commented 6 years ago

Good morning, thanks for the advice. I tried running Consul at trace log level, but I can't see anything wrong:

 /usr/bin/consul agent -config-dir=/opt/consul/conf -log-level=trace
WARNING: LAN keyring exists but -encrypt given, using keyring
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.2.0'
           Node ID: 'c6e560ae-551c-1dc9-41f6-aaaed240cff3'
         Node name: 'ildes01'
        Datacenter: 'bardock' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [127.0.0.1] (HTTP: -1, HTTPS: 8500, DNS: 8600)
      Cluster Addr: 10.10.0.128 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true

==> Log data will now stream in as it occurs:

    2018/07/17 07:42:36 [INFO] serf: EventMemberJoin: ildes01 10.10.0.128
    2018/07/17 07:42:36 [DEBUG] agent: restored service definition "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" from "/opt/consul/data/services/1eeec430722bec3dc8bc18122a17917c"
    2018/07/17 07:42:36 [DEBUG] agent: restored service definition "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" from "/opt/consul/data/services/dbca3984643c6d3aaabc42121670215d"
    2018/07/17 07:42:36 [DEBUG] agent: restored health check "9e71d1d465ef90c6d1ce95ec006a390969014166" from "/opt/consul/data/checks/052736bd31672306e8254efc01cfc810"
    2018/07/17 07:42:36 [DEBUG] agent/proxy: managed Connect proxy manager started
    2018/07/17 07:42:36 [WARN] agent/proxy: running as root, will not start managed proxies
    2018/07/17 07:42:36 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2018/07/17 07:42:36 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2018/07/17 07:42:36 [INFO] agent: Started HTTPS server on [::]:8500 (tcp)
    2018/07/17 07:42:36 [INFO] agent: started state syncer
    2018/07/17 07:42:36 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer triton
    2018/07/17 07:42:36 [INFO] agent: Joining LAN cluster...
    2018/07/17 07:42:36 [INFO] agent: (LAN) joining: [10.10.0.125]
    2018/07/17 07:42:36 [WARN] manager: No servers available
    2018/07/17 07:42:36 [ERR] agent: failed to sync remote state: No known Consul servers
    2018/07/17 07:42:36 [DEBUG] memberlist: Initiating push/pull sync with: 10.10.0.125:8301
    2018/07/17 07:42:36 [WARN] memberlist: Refuting a suspect message (from: ildes01)
    2018/07/17 07:42:36 [INFO] serf: EventMemberJoin: des01 10.10.0.125
    2018/07/17 07:42:36 [DEBUG] serf: Refuting an older leave intent
    2018/07/17 07:42:36 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2018/07/17 07:42:36 [DEBUG] agent: systemd notify failed: No socket
    2018/07/17 07:42:36 [INFO] agent: Join LAN completed. Synced with 1 initial agents
    2018/07/17 07:42:36 [INFO] consul: adding server des01 (Addr: tcp/10.10.0.125:8300) (DC: bardock)
    2018/07/17 07:42:36 [DEBUG] http: Request GET /v1/kv/config/openid-server.properties?recurse&wait=55s&index=110507 (22.661137ms) from=172.17.0.2:54594
    2018/07/17 07:42:36 [DEBUG] http: Request GET /v1/kv/config/openid-server.yaml?recurse&wait=55s&index=110507 (1.256058ms) from=172.17.0.2:54598
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:36 [DEBUG] serf: messageJoinType: ildes01
    2018/07/17 07:42:37 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
    2018/07/17 07:42:37 [INFO] agent: Synced service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b"
    2018/07/17 07:42:37 [INFO] agent: Synced service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4"
    2018/07/17 07:42:37 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:37 [DEBUG] agent: Service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" in sync
    2018/07/17 07:42:37 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:38 [DEBUG] agent: Check "9e71d1d465ef90c6d1ce95ec006a390969014166" is passing
    2018/07/17 07:42:38 [DEBUG] agent: Service "_nomad-task-uya2ltnegrulmybmhl6e7f3krkqypb2b" in sync
    2018/07/17 07:42:38 [DEBUG] agent: Service "_nomad-client-iisg2jfy4ykv57yhf2oxzqrrxfdeghy4" in sync
    2018/07/17 07:42:38 [INFO] agent: Synced check "9e71d1d465ef90c6d1ce95ec006a390969014166"
    2018/07/17 07:42:38 [DEBUG] agent: Node info in sync
    2018/07/17 07:42:38 [DEBUG] memberlist: Stream connection from=10.10.0.130:32429
    2018/07/17 07:42:43 [DEBUG] memberlist: Stream connection from=10.10.0.127:61038

monwolf commented 6 years ago

Sorry for the delay, I was on holiday. I've found the issue: I had a typo in my config file:

  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"

I set tls_cert_file twice and never set tls_key_file. I think the application could handle this case and show a warning message when a certificate is configured without a key.
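
The kind of check suggested here could look something like the following. This is a minimal sketch and not Vault's actual configuration handling: it assumes one `key = value` assignment per line rather than parsing real HCL, and `lint_stanza` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the key of a simple `key = value` line.
ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def lint_stanza(text):
    """Return warnings for duplicated keys and unpaired TLS cert/key settings."""
    keys = []
    for line in text.splitlines():
        match = ASSIGN.match(line)
        if match:
            keys.append(match.group(1))
    counts = Counter(keys)

    warnings = [f"duplicate key: {key}" for key, n in counts.items() if n > 1]
    if "tls_cert_file" in counts and "tls_key_file" not in counts:
        warnings.append("tls_cert_file is set but tls_key_file is missing")
    if "tls_key_file" in counts and "tls_cert_file" not in counts:
        warnings.append("tls_key_file is set but tls_cert_file is missing")
    return warnings


# Body of the client storage stanza as posted in this issue.
CLIENT_STANZA = """
  address = "127.0.0.1:8500"
  path = "vault/"
  scheme = "https"
  tls_skip_verify = 0
  tls_cert_file = "/opt/consul/ssl/client-ildes01.pem"
  tls_cert_file = "/opt/consul/ssl/client-ildes01-key.pem"
  tls_ca_file = "/opt/consul/ssl/consul-ca.pem"
"""

if __name__ == "__main__":
    for warning in lint_stanza(CLIENT_STANZA):
        print("WARN:", warning)
```

Run against the stanza above, it reports both the duplicated tls_cert_file and the missing tls_key_file.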