
BAD CERTIFICATE with consul connect #16617

Open fred-gb opened 1 year ago

fred-gb commented 1 year ago

Hi 👋🏻

Overview of the Issue

I'm trying to build a Hashistack with ACLs and TLS enabled, on a single node for now. When I launch a test job that uses Consul Connect, it doesn't work.

I posted the issue here after this thread on Discuss: Nomad discuss, where I was advised to bring it to Consul.

Consul 1.15.1, Nomad 1.5

I see in the changelog what is needed for it to work, but I don't know how to apply that when deploying a Nomad job.

I'm not sure whether the issue comes from Consul or from Nomad.


Reproduction Steps

Create a single-node Hashistack with Consul 1.15.1, Vault 1.13 (Consul storage backend), and Nomad 1.5.

Launch a first job with a sidecar:

job "mosquitto" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"

  group "mosquitto" {

    count = 1

    restart {
      attempts = 10
      interval = "5m"
      delay = "10s"
      mode = "delay"
    }

    network {
      mode = "bridge"

        port "mqtt" {
        to = 1883
        static = 1883
      }
    }

    service {
      name = "mqtt"
      port = "1883"

      connect {
        sidecar_service {}

        sidecar_task {
          resources {
            cpu    = 64
            memory = 64
          }
        }
      }
    }

    task "mosquitto" {
      driver = "docker"

      config {
        image = "eclipse-mosquitto:latest"

        mount {
          type = "bind"
          target = "/mosquitto/config/mosquitto.conf"
          source = "local/mosquitto.conf"
          readonly = false
          bind_options {
            propagation = "rshared"
          }
        }

        ports = ["mqtt"]
      }

      template {
        data = <<EOH
listener 1883
allow_anonymous true
EOH
        destination = "local/mosquitto.conf"
      }

      template {
        data = <<EOH
ANSIBLE_FORCE_COLOR=TRUE

EOH
        destination = "secrets/file.env"
        env         = true
      }

      resources {
        cpu    = 1024
        memory = 1024
      }
    }
  }
}

and a second job to connect to the sidecar:

job "tester" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"

  group "tester" {

    count = 1

    restart {
      attempts = 10
      interval = "5m"
      delay = "10s"
      mode = "delay"
    }

    network {
      mode = "bridge"
    }

    service {
      name = "mesh"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "mqtt"
              local_bind_port  = 1883
            }
          }
        }
        sidecar_task {
          resources {
            cpu    = 16
            memory = 16
          }
        }
      }
    }

    task "tester" {
      driver = "docker"

      config {
        image = "alpine:latest"
        entrypoint = ["/bin/sleep", "3600"]
      }

      resources {
        cpu    = 128
        memory = 128
      }
    }
  }
}
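
(For context: Connect binds each upstream on loopback inside the group's bridge network namespace, so once the mesh is healthy the tester task should reach mqtt at 127.0.0.1:1883. A hypothetical way to hand that to the task, inside the tester task stanza:)

      env {
        # Hypothetical variable name; the upstream listens on loopback at
        # local_bind_port inside the group's network namespace.
        MQTT_ADDR = "127.0.0.1:1883"
      }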

Consul info for both Client and Server

Consul Info:

agent:
    check_monitors = 0
    check_ttls = 1
    checks = 9
    services = 10
build:
    prerelease = 
    revision = 7c04b6a0
    version = 1.15.1
    version_metadata = 
consul:
    acl = enabled
    bootstrap = true
    known_datacenters = 1
    leader = true
    leader_addr = 192.168.64.69:8300
    server = true
raft:
    applied_index = 18146
    commit_index = 18146
    fsm_pending = 0
    last_contact = 0
    last_log_index = 18146
    last_log_term = 10
    last_snapshot_index = 16384
    last_snapshot_term = 10
    latest_configuration = [{Suffrage:Voter ID:8f79ed11-c2e9-aefe-796c-a216b9c08055 Address:192.168.64.69:8300}]
    latest_configuration_index = 0
    num_peers = 0
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 10
runtime:
    arch = arm64
    cpu_count = 2
    goroutines = 243
    max_procs = 2
    os = linux
    version = go1.20.1
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 1
    event_time = 10
    failed = 0
    health_score = 0
    intent_queue = 1
    left = 0
    member_time = 10
    members = 1
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1

Client and Server config (single node, JSON)

{
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enable_token_persistence": true,
        "enabled": true,
        "token_ttl": "30s",
        "tokens": {
            "initial_management": "dcdac9a4-e224-5b59-b9dc-2f6bfb55362e",
            "replication": "cfbb5111-31ff-5954-8ec7-8f561bab8c67"
        }
    },
    "addresses": {
        "dns": "0.0.0.0",
        "grpc_tls": "0.0.0.0",
        "http": "0.0.0.0",
        "https": "0.0.0.0"
    },
    "advertise_addr": "192.168.64.69",
    "advertise_addr_wan": "192.168.64.69",
    "auto_encrypt": {},
    "autopilot": {
        "cleanup_dead_servers": false,
        "last_contact_threshold": "200ms",
        "max_trailing_logs": 250,
        "server_stabilization_time": "10s"
    },
    "bind_addr": "192.168.64.69",
    "bootstrap": false,
    "bootstrap_expect": 1,
    "client_addr": "127.0.0.1",
    "connect": {
        "enabled": true
    },
    "data_dir": "/opt/consul",
    "datacenter": "dc1",
    "disable_update_check": false,
    "domain": "consul",
    "enable_local_script_checks": false,
    "enable_script_checks": false,
    "encrypt": "wfuQxs/nL0zNgFtJ54JxK+V+k3aTGBGO9G0PPsVPPDY=",
    "encrypt_verify_incoming": true,
    "encrypt_verify_outgoing": true,
    "log_file": "/var/log/consul/consul.log",
    "log_level": "INFO",
    "log_rotate_bytes": 0,
    "log_rotate_duration": "24h",
    "log_rotate_max_files": 0,
    "performance": {
        "leave_drain_time": "5s",
        "raft_multiplier": 1,
        "rpc_hold_timeout": "7s"
    },
    "ports": {
        "dns": 8600,
    "grpc": 8502,
        "grpc_tls": 8503,
        "http": -1,
        "https": 8501,
        "serf_lan": 8301,
        "serf_wan": 8302,
        "server": 8300
    },
    "primary_datacenter": "dc1",
    "raft_protocol": 3,
    "retry_interval": "30s",
    "retry_interval_wan": "30s",
    "retry_join": [
        "192.168.64.69"
    ],
    "retry_max": 0,
    "retry_max_wan": 0,
    "server": true,
    "tls": {
        "defaults": {
            "ca_file": "/etc/ssl/hashistack/hashistack-ca.pem",
            "cert_file": "/etc/ssl/hashistack/dc1-server-consul.pem",
            "key_file": "/etc/ssl/hashistack/dc1-server-consul.key",
            "tls_min_version": "TLSv1_2",
            "verify_incoming": true,
            "verify_outgoing": true
        },
        "https": {
            "verify_incoming": false
        },
        "internal_rpc": {
            "verify_incoming": true,
            "verify_server_hostname": true
        }
    },
    "translate_wan_addrs": false,
    "ui_config": {
        "enabled": true
    }
}
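
For what it's worth, my reading of Consul's tls stanza is that the per-protocol blocks (https, internal_rpc) override defaults, so with no grpc block the gRPC TLS listener on 8503 inherits verify_incoming = true. As far as I can tell, Envoy does not present a client certificate on its xDS connection to the local agent, which would match the bad-certificate alert below. A sketch of how the stanza above resolves per listener (HCL form; my interpretation, not verified against the agent):

tls {
  # Inherited by grpc_tls (8503), the port Envoy dials for xDS config:
  defaults { verify_incoming = true }

  # Explicit overrides from the config above:
  https        { verify_incoming = false }
  internal_rpc { verify_incoming = true }  # plus verify_server_hostname

  # No "grpc" block, so the gRPC listener demands a client certificate
  # that (assumption) the Envoy sidecars never present.
}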

According to these docs: Nomad Consul Connect integration

Nomad conf:

consul {
    # The address to the Consul agent.
    address      = "127.0.0.1:8501"
    grpc_address = "127.0.0.1:8503"
    ssl = true
    grpc_ca_file = "/etc/ssl/hashistack/hashistack-ca.pem"
    ca_file = "/etc/ssl/hashistack/hashistack-ca.pem"
    cert_file = "/etc/ssl/hashistack/dc1-server-consul.pem"
    key_file = "/etc/ssl/hashistack/dc1-server-consul.key"
    token = "ebfb82e3-1d84-95d3-22d0-269b427136fb"
    # The service name to register the server and client with Consul.
    server_service_name = "nomad-servers"
    client_service_name = "nomad-clients"
    tags = {}

    # Enables automatically registering the services.
    auto_advertise = true

    # Enabling the server and client to bootstrap using Consul.
    server_auto_join = true
    client_auto_join = true
}
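
As I read the Nomad docs, grpc_ca_file (new in Nomad 1.5) is the CA bundle Envoy is handed for its xDS connection, so it has to match whatever certificate the agent actually serves on the grpc_tls port. The pairing that matters here, as a sketch:

consul {
  # Envoy dials this address for xDS...
  grpc_address = "127.0.0.1:8503"
  # ...and validates the agent's certificate against this CA, so it must
  # be the CA that signed the cert served on ports.grpc_tls (8503).
  grpc_ca_file = "/etc/ssl/hashistack/hashistack-ca.pem"
}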

Operating system and Environment details

Ubuntu 22.04 (in VM for testing)

Log Fragments

In the Nomad UI (Envoy sidecar logs):

[2023-03-11 09:41:49.044][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 376s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:42:02.143][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 389s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:42:15.919][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 403s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:42:32.197][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 419s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:42:59.585][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 447s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:43:21.912][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 469s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:43:48.853][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 496s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[2023-03-11 09:44:06.962][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 514s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE

I created the certs with OpenSSL and Ansible, and they work: before launching the job, there are no errors in the communication between the Hashistack components.

Help! 🆘

fred-gb commented 1 year ago

Found a workaround, thanks to this topic: Discuss Hashicorp

In the Consul config:

"tls": {
    "grpc": {
        "verify_incoming": false
    },
    [...]
}
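
(Equivalent HCL form, for reference; this relaxes client-certificate checks only on the gRPC listener, while HTTPS and internal RPC keep the settings shown earlier:)

tls {
  grpc {
    # Only the gRPC (xDS) listener stops requiring client certificates;
    # server-to-server RPC stays mutually authenticated.
    verify_incoming = false
  }
}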

But I don't really understand whether a proper solution exists for Consul 1.15+.

Thanks

fred-gb commented 1 year ago

After many tries, it's no longer functional.

I tried creating a separate CA and cert for the gRPC listener:

tls {
  defaults {
    ca_file         = "/etc/ssl/hashistack-ca.pem"
    cert_file       = "/etc/ssl/dc1-server-consul.pem"
    key_file        = "/etc/ssl/dc1-server-consul.key"
    tls_min_version = "TLSv1_2"
    verify_incoming = true
    verify_outgoing = true
  }

  grpc {
    ca_file   = "/etc/ssl/envoy-ca.pem"
    cert_file = "/etc/ssl/dc1-server-envoy.pem"
    key_file  = "/etc/ssl/dc1-server-envoy.key"
  }

  https {
    verify_incoming = false
  }

  internal_rpc {
    verify_incoming        = true
    verify_server_hostname = true
  }
}

The error message changed! 🥳

Now I have:

[2023-03-17 07:54:56.493][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 371s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

😢
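
A closing observation, as a hypothesis rather than a confirmed diagnosis: CERTIFICATE_VERIFY_FAILED is the client side rejecting the server's chain, so Envoy may now be validating the agent's new dc1-server-envoy.pem against the old CA. The Nomad stanza earlier in this issue still hands Envoy hashistack-ca.pem via grpc_ca_file; if that is what Envoy uses for xDS, it would need to follow the gRPC listener's CA. A sketch of the corresponding Nomad-side change:

consul {
  # Assumption: grpc_ca_file must track the CA that signed the cert now
  # served on the agent's grpc_tls listener (envoy-ca.pem above).
  grpc_ca_file = "/etc/ssl/envoy-ca.pem"
}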