hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Connect does not update CA Roots Endpoint when rotating Vault CA #8681

Open lawliet89 opened 4 years ago

lawliet89 commented 4 years ago

Overview of the Issue

I recently migrated the Connect CA from the internal CA to Vault. I am using TLS with Auto Encrypt turned on. The cluster is running on Kubernetes with the official Consul Helm chart. That is, the agents are using the certificates issued by Consul Connect.

I noticed immediately that the certificates issued to agents were cross-signed in a way that is particularly confusing for a human to follow. However, most tools seemed able to validate the certificates fine.

Two weeks later, when the cross-signed intermediate certificate expired, I noticed that the agents were still serving the expired intermediate certificates. This caused tools that depended on Consul API < 1.4.0 to fail to validate the certificate chain, with issues like https://github.com/hashicorp/consul-esm/issues/84

This morning, I noticed that all my workloads that depended on Consul agents were crashing.

For posterity, this was posted on 2020-09-14 2350 UTC

Here's what I found:

  1. The Vault CA expired some time on 2020-09-14.

  2. Consul rotated the Vault root CA some time on 2020-09-04.

  3. Consul did not publish the new CA in the /v1/connect/ca/roots endpoint! The output below was obtained at about 2020-09-04 2130 UTC.

    {
      "ActiveRootID": "cf:15:4d:4e:ee:06:da:fc:c0:7c:a3:53:75:c4:cc:ae:0e:a3:aa:e1",
      "TrustDomain": "ca41bf4d-01f3-3421-42b6-04767a43ad66.consul",
      "Roots": [
        {
          "ID": "cf:15:4d:4e:ee:06:da:fc:c0:7c:a3:53:75:c4:cc:ae:0e:a3:aa:e1",
          "Name": "Vault CA Root Cert",
          "SerialNumber": 1707250713833350726,
          "SigningKeyID": "e0:9d:fd:f1:3f:33:b7:e4:f6:a4:14:f3:32:e5:72:11:a3:2a:57:c9",
          "ExternalTrustDomain": "ca41bf4d-01f3-3421-42b6-04767a43ad66",
          "NotBefore": "2020-08-13T07:37:00Z",
          "NotAfter": "2020-09-14T07:37:30Z",
          "RootCert": "-----BEGIN CERTIFICATE-----\nMIICLDCCAdKgAwIBAgIUMJDd91g5C8H7s/dFF7FfepSRpkYwCgYIKoZIzj0EAwIw\nLzEtMCsGA1UEAxMkcHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3Vs\nMB4XDTIwMDgxMzA3MzcwMFoXDTIwMDkxNDA3MzczMFowLzEtMCsGA1UEAxMkcHJp\nLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMFkwEwYHKoZIzj0CAQYI\nKoZIzj0DAQcDQgAEGgXmb56U55zmAmQINZ5saHFdDyQQaG75UHaBLlh5t+268UEy\n4c6dNedNsuSJ4OH9gkx4ngk84+LfKKwsQr7V5qOByzCByDAOBgNVHQ8BAf8EBAMC\nAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQU4J398T8zt+T2pBTzMuVyEaMq\nV8kwHwYDVR0jBBgwFoAU4J398T8zt+T2pBTzMuVyEaMqV8kwZQYDVR0RBF4wXIIk\ncHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VshjRzcGlmZmU6Ly9j\nYTQxYmY0ZC0wMWYzLTM0MjEtNDJiNi0wNDc2N2E0M2FkNjYuY29uc3VsMAoGCCqG\nSM49BAMCA0gAMEUCIH7pQq454KkEw0O08R9v90gccyZjNKlio4qnGwsLsbsTAiEA\ntBfCZhuKNkhhZ1KpulORakC88w9OcmTmPdsqPErOH7w=\n-----END CERTIFICATE-----",
          "IntermediateCerts": [
            "-----BEGIN CERTIFICATE-----\nMIICVzCCAf2gAwIBAgIEBGEMEjAKBggqhkjOPQQDAjAdMRswGQYDVQQDExJDb25z\ndWwgQ0EgNzM0Njg5NDQwHhcNMjAwODEzMDczODAwWhcNMjAwODIwMDczODAwWjAv\nMS0wKwYDVQQDEyRwcmktZTFxMmp6bS52YXVsdC5jYS5jYTQxYmY0ZC5jb25zdWww\nWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQaBeZvnpTnnOYCZAg1nmxocV0PJBBo\nbvlQdoEuWHm37brxQTLhzp01502y5Ing4f2CTHieCTzj4t8orCxCvtXmo4IBFzCC\nARMwDgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFOCd\n/fE/M7fk9qQU8zLlchGjKlfJMGoGA1UdIwRjMGGAXzdjOmZhOjVhOjhiOjI3OmMx\nOmY3OmM5OjE0OmI4OjlhOjI2OmQ2OjE2OmUyOjFjOmZhOjI4Ojg2OmU0OjkyOjEw\nOmRmOjM3OjljOjBmOjAzOmQ5OmY0OmM4OmZlOjNlMGUGA1UdEQReMFyCJHByaS1l\nMXEyanptLnZhdWx0LmNhLmNhNDFiZjRkLmNvbnN1bIY0c3BpZmZlOi8vY2E0MWJm\nNGQtMDFmMy0zNDIxLTQyYjYtMDQ3NjdhNDNhZDY2LmNvbnN1bDAKBggqhkjOPQQD\nAgNIADBFAiB5njM7NXdX+/GTe/taAjvtE55JZ0E7VL9wsdx1UxEtQAIhAJ18p7BB\nmUN66M4gqYm00P6MDhqBZdLv/r5VS0AxWtxc\n-----END CERTIFICATE-----\n",
            "-----BEGIN CERTIFICATE-----\nMIICLDCCAdKgAwIBAgIUFy7MEUQIuddLzkH6vshmvGhQQWQwCgYIKoZIzj0EAwIw\nLzEtMCsGA1UEAxMkcHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3Vs\nMB4XDTIwMDgxMzA3MzgzMFoXDTIwMDkxNDA3MzkwMFowLzEtMCsGA1UEAxMkcHJp\nLXdtamliZmUudmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMFkwEwYHKoZIzj0CAQYI\nKoZIzj0DAQcDQgAEgN73dI4JV+74ov52XuyXDFdOX3hcWzDbRx0Cxb2MKQShTgw1\no1Aj6zLkRIUClEWVeWeYQ8L6a5vWOMTdqMWIWqOByzCByDAOBgNVHQ8BAf8EBAMC\nAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUFKaZdLolDLE4OeqFIr/E79xE\nkrQwHwYDVR0jBBgwFoAU4J398T8zt+T2pBTzMuVyEaMqV8kwZQYDVR0RBF4wXIIk\ncHJpLXdtamliZmUudmF1bHQuY2EuY2E0MWJmNGQuY29uc3VshjRzcGlmZmU6Ly9j\nYTQxYmY0ZC0wMWYzLTM0MjEtNDJiNi0wNDc2N2E0M2FkNjYuY29uc3VsMAoGCCqG\nSM49BAMCA0gAMEUCIHmOZBR8cNRdsqnFtUilVkj4ypReXC1rhoGlLrf7CIRFAiEA\nooC58B56lyr6YAgkW3WYeHDadhP5TepAEC2pmIZYDS0=\n-----END CERTIFICATE-----"
          ],
          "Active": true,
          "PrivateKeyType": "ec",
          "PrivateKeyBits": 256,
          "CreateIndex": 192967067,
          "ModifyIndex": 193005824
        }
      ]
    }
  4. All my workloads were seeing the old CA and could not validate the new certificate signed by the new CA.
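
The mismatch described in items 3 and 4 can be checked mechanically. The following is a minimal Python sketch, not part of Consul or its client libraries. It assumes the `/v1/connect/ca/roots` response has already been parsed into a dict, and it only relies on the `ActiveRootID`, `ID`, `Name`, `NotBefore` and `NotAfter` fields visible in the response above. The helper names `parse_ts` and `check_active_root` are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse the UTC timestamps used in the response, e.g. 2020-09-14T07:37:30Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def check_active_root(roots_doc, now):
    """Return a list of problems found with the active root at time `now`."""
    active_id = roots_doc.get("ActiveRootID")
    matches = [r for r in roots_doc.get("Roots", []) if r.get("ID") == active_id]
    if not matches:
        return [f"ActiveRootID {active_id} is not present in Roots"]

    root = matches[0]
    problems = []
    if now < parse_ts(root["NotBefore"]):
        problems.append(f"active root {root['Name']!r} is not valid until {root['NotBefore']}")
    if now > parse_ts(root["NotAfter"]):
        problems.append(f"active root {root['Name']!r} expired at {root['NotAfter']}")
    return problems


# Trimmed copy of the response quoted above (certificate bodies omitted).
roots_doc = {
    "ActiveRootID": "cf:15:4d:4e:ee:06:da:fc:c0:7c:a3:53:75:c4:cc:ae:0e:a3:aa:e1",
    "Roots": [
        {
            "ID": "cf:15:4d:4e:ee:06:da:fc:c0:7c:a3:53:75:c4:cc:ae:0e:a3:aa:e1",
            "Name": "Vault CA Root Cert",
            "NotBefore": "2020-08-13T07:37:00Z",
            "NotAfter": "2020-09-14T07:37:30Z",
            "Active": True,
        }
    ],
}

# The time this issue was posted, as stated above.
posted = datetime(2020, 9, 14, 23, 50, tzinfo=timezone.utc)
for problem in check_active_root(roots_doc, posted):
    print(problem)
```

With the values from the response above, this reports that the only advertised root had already expired by the time the issue was posted, which is consistent with workloads rejecting certificates signed by the newer CA.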

Example Certificate issued by the new Vault CA

```
-----BEGIN CERTIFICATE-----
MIICujCCAmCgAwIBAgIUGxGEkMbtHG46/f7jANEZGcvLNt8wCgYIKoZIzj0EAwIwLzEtMCsGA1UEAxMkcHJpLTFsMHNvNW4udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMB4XDTIwMDkxNDIxNDUxMloXDTIwMDkxNzIxNDU0MlowQDE+MDwGA1UEAxM1Z2tlc3RhZ2luZ2NvcmVjb3JlZTI3YmM1NTFmNnZrOWEuYWdudC5kdW1teS50ci5jb25zdWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATN8SCaBCUr06xW5vfC0LwHWs+hemjFVGMC2tGEbjKGRVaq5aHNl51Vpyp1s83HpRWKF+se7QFQqBSlM09VL2cwo4IBRzCCAUMwDgYDVR0PAQH/BAQDAgOoMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEFBQcDAjAdBgNVHQ4EFgQUK67RdRZkLvr4GZkD8M7GPEEkkDcwHwYDVR0jBBgwFoAUmCbYHWTDSkd7EZ9DjfimaaPYaA4wgdEGA1UdEQSByTCBxoI1Z2tlc3RhZ2luZ2NvcmVjb3JlZTI3YmM1NTFmNnZrOWEuYWdudC5kdW1teS50ci5jb25zdWyCCWxvY2FsaG9zdIcEfwAAAYcQAAAAAAAAAAAAAAAAAAAAAYcECgEBF4Zkc3BpZmZlOi8vZHVtbXkudHJ1c3Rkb21haW4vYWdlbnQvY2xpZW50L2RjL2FzaWEtc291dGhlYXN0MS9pZC9na2Utc3RhZ2luZy1jb3JlLWNvcmUtZTItN2JjNTUxZjYtdms5YTAKBggqhkjOPQQDAgNIADBFAiEAyQ8iCCyWaXLizhTFsL6ZHzylsCfiE1LNziDGE3OfhNMCIF1rLp5cz3MwM0yNb3PVyCBxQrooZBWw5/WL40BpR72r
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIICLDCCAdKgAwIBAgIULmbuc0rpfetw+1U1/l5wgRZLVtMwCgYIKoZIzj0EAwIwLzEtMCsGA1UEAxMkcHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMB4XDTIwMDkwMzIwMjU0MVoXDTIwMTAwNTIwMjYxMVowLzEtMCsGA1UEAxMkcHJpLTFsMHNvNW4udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEb1EVEBEq54AY6kqWRV4K0dyDUdz5hpZDkTBeKjm572go2/TbHxSsyO7XXkNatQulP9HDKsT+ssPqCfgnLj1wsqOByzCByDAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUmCbYHWTDSkd7EZ9DjfimaaPYaA4wHwYDVR0jBBgwFoAU4J398T8zt+T2pBTzMuVyEaMqV8kwZQYDVR0RBF4wXIIkcHJpLTFsMHNvNW4udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VshjRzcGlmZmU6Ly9jYTQxYmY0ZC0wMWYzLTM0MjEtNDJiNi0wNDc2N2E0M2FkNjYuY29uc3VsMAoGCCqGSM49BAMCA0gAMEUCIQCyywIOd03fP9iTAHF5qDGO7zvbsNO+EnTJ2F0yVyxFhgIgEEZDVPf+bFfuDZ2LfYpADze7GLB2EXbfMyklB4s6iI4=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIICVzCCAf2gAwIBAgIEBGEMEjAKBggqhkjOPQQDAjAdMRswGQYDVQQDExJDb25zdWwgQ0EgNzM0Njg5NDQwHhcNMjAwODEzMDczODAwWhcNMjAwODIwMDczODAwWjAvMS0wKwYDVQQDEyRwcmktZTFxMmp6bS52YXVsdC5jYS5jYTQxYmY0ZC5jb25zdWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQaBeZvnpTnnOYCZAg1nmxocV0PJBBobvlQdoEuWHm37brxQTLhzp01502y5Ing4f2CTHieCTzj4t8orCxCvtXmo4IBFzCCARMwDgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFOCd/fE/M7fk9qQU8zLlchGjKlfJMGoGA1UdIwRjMGGAXzdjOmZhOjVhOjhiOjI3OmMxOmY3OmM5OjE0OmI4OjlhOjI2OmQ2OjE2OmUyOjFjOmZhOjI4Ojg2OmU0OjkyOjEwOmRmOjM3OjljOjBmOjAzOmQ5OmY0OmM4OmZlOjNlMGUGA1UdEQReMFyCJHByaS1lMXEyanptLnZhdWx0LmNhLmNhNDFiZjRkLmNvbnN1bIY0c3BpZmZlOi8vY2E0MWJmNGQtMDFmMy0zNDIxLTQyYjYtMDQ3NjdhNDNhZDY2LmNvbnN1bDAKBggqhkjOPQQDAgNIADBFAiB5njM7NXdX+/GTe/taAjvtE55JZ0E7VL9wsdx1UxEtQAIhAJ18p7BBmUN66M4gqYm00P6MDhqBZdLv/r5VS0AxWtxc
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIICLDCCAdKgAwIBAgIUFy7MEUQIuddLzkH6vshmvGhQQWQwCgYIKoZIzj0EAwIwLzEtMCsGA1UEAxMkcHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMB4XDTIwMDgxMzA3MzgzMFoXDTIwMDkxNDA3MzkwMFowLzEtMCsGA1UEAxMkcHJpLXdtamliZmUudmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEgN73dI4JV+74ov52XuyXDFdOX3hcWzDbRx0Cxb2MKQShTgw1o1Aj6zLkRIUClEWVeWeYQ8L6a5vWOMTdqMWIWqOByzCByDAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUFKaZdLolDLE4OeqFIr/E79xEkrQwHwYDVR0jBBgwFoAU4J398T8zt+T2pBTzMuVyEaMqV8kwZQYDVR0RBF4wXIIkcHJpLXdtamliZmUudmF1bHQuY2EuY2E0MWJmNGQuY29uc3VshjRzcGlmZmU6Ly9jYTQxYmY0ZC0wMWYzLTM0MjEtNDJiNi0wNDc2N2E0M2FkNjYuY29uc3VsMAoGCCqGSM49BAMCA0gAMEUCIHmOZBR8cNRdsqnFtUilVkj4ypReXC1rhoGlLrf7CIRFAiEAooC58B56lyr6YAgkW3WYeHDadhP5TepAEC2pmIZYDS0=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIICLDCCAdKgAwIBAgIULmbuc0rpfetw+1U1/l5wgRZLVtMwCgYIKoZIzj0EAwIwLzEtMCsGA1UEAxMkcHJpLWUxcTJqem0udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMB4XDTIwMDkwMzIwMjU0MVoXDTIwMTAwNTIwMjYxMVowLzEtMCsGA1UEAxMkcHJpLTFsMHNvNW4udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VsMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEb1EVEBEq54AY6kqWRV4K0dyDUdz5hpZDkTBeKjm572go2/TbHxSsyO7XXkNatQulP9HDKsT+ssPqCfgnLj1wsqOByzCByDAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUmCbYHWTDSkd7EZ9DjfimaaPYaA4wHwYDVR0jBBgwFoAU4J398T8zt+T2pBTzMuVyEaMqV8kwZQYDVR0RBF4wXIIkcHJpLTFsMHNvNW4udmF1bHQuY2EuY2E0MWJmNGQuY29uc3VshjRzcGlmZmU6Ly9jYTQxYmY0ZC0wMWYzLTM0MjEtNDJiNi0wNDc2N2E0M2FkNjYuY29uc3VsMAoGCCqGSM49BAMCA0gAMEUCIQCyywIOd03fP9iTAHF5qDGO7zvbsNO+EnTJ2F0yVyxFhgIgEEZDVPf+bFfuDZ2LfYpADze7GLB2EXbfMyklB4s6iI4=
-----END CERTIFICATE-----
```

I have since migrated back to Connect internal CA and the certificates look fine now. The information requested and gathered below might not be useful. I did not have time to stop and collect information while everything was on fire.

Reproduction Steps

Steps to reproduce this issue:

  1. Enable Consul Connect
  2. Migrate to Vault CA
  3. Wait for Vault CA to be close to expiry for Connect to rotate the CA
  4. Check /v1/connect/ca/roots
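
Step 4 above can be automated by polling the endpoint and watching for a change in `ActiveRootID`. The following is a rough Python sketch under stated assumptions: it targets a local agent serving the plain-HTTP API at `http://127.0.0.1:8500`, whereas a TLS-enabled cluster like the one in this report would need an `https` URL and a suitable SSL context. The names `fetch_roots`, `describe_change` and `watch` are hypothetical helpers, not part of any Consul client.

```python
import json
import time
import urllib.request

# Assumption: local agent, plain HTTP, default API port.
ROOTS_URL = "http://127.0.0.1:8500/v1/connect/ca/roots"


def fetch_roots(url=ROOTS_URL, timeout=5.0):
    """Fetch and decode the CA roots document from the agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def describe_change(previous, current):
    """Return a message if ActiveRootID differs between two snapshots, else None."""
    old_id = previous.get("ActiveRootID")
    new_id = current.get("ActiveRootID")
    if old_id == new_id:
        return None
    return f"active root changed: {old_id} -> {new_id}"


def watch(interval_seconds=60, fetch=fetch_roots):
    """Poll until interrupted, printing a line whenever the active root changes."""
    previous = fetch()
    while True:
        time.sleep(interval_seconds)
        current = fetch()
        message = describe_change(previous, current)
        if message:
            print(message)
        previous = current
```

Calling `watch()` across the expected rotation window should print one line when the active root changes. In the situation described in this issue, the watcher would presumably have stayed silent even after Consul started issuing certificates from the new Vault CA.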

Consul info for both Client and Server

Client info

```
agent:
	check_monitors = 0
	check_ttls = 1
	checks = 1
	services = 1
build:
	prerelease =
	revision = a9322b9c
	version = 1.8.3
consul:
	acl = disabled
	known_servers = 5
	server = false
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 162
	max_procs = 4
	os = linux
	version = go1.14.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 237
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 112943
	members = 20
	query_queue = 0
	query_time = 2
```

Server info

```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = a9322b9c
	version = 1.8.3
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 172.19.7.22:8300
	server = true
raft:
	applied_index = 207189814
	commit_index = 207189814
	fsm_pending = 0
	last_contact = 78.699669ms
	last_log_index = 207189814
	last_log_term = 1292
	last_snapshot_index = 207177581
	last_snapshot_term = 1284
	latest_configuration = [{Suffrage:Voter ID:20c563f8-1f25-df37-f33a-4fc3691f1542 Address:172.19.7.22:8300} {Suffrage:Voter ID:a1c5efa1-bfe1-370d-d698-43ac7d6f3601 Address:172.19.11.7:8300} {Suffrage:Voter ID:cdc55cb7-d517-15f1-d960-29933fb18568 Address:172.19.4.69:8300} {Suffrage:Voter ID:fe7364e2-1c64-0b3c-b99c-2b6c06b85048 Address:172.19.0.9:8300} {Suffrage:Voter ID:c316f35f-65e5-4007-4f34-7398456c0da9 Address:172.19.9.100:8300}]
	latest_configuration_index = 0
	num_peers = 4
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 1292
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 244
	max_procs = 4
	os = linux
	version = go1.14.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 237
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 112947
	members = 20
	query_queue = 0
	query_time = 2
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2239
	members = 5
	query_queue = 0
	query_time = 1
```

Operating system and Environment details

mariusehr1 commented 3 years ago

Hello,

Can you confirm that the issue didn't occur again after you swapped back to the built-in CA?

Thanks,

Marius.

lawliet89 commented 3 years ago

@mariusehr1 Yes I can confirm that.

I'm not likely to switch back to Vault CA because the friction and potential for issues seem to be too high for my liking.

mariusehr1 commented 3 years ago

Ok, good to know.

I am experiencing the same issue at the moment. I'm switching back to the built-in CA, hoping for a change.

Just out of curiosity: are you using Consul in production? And if so, for how long has it been working?

Thanks, Marius