Open mirzmaster opened 2 years ago
Hi @mirzmaster, could you share your Helm values files for both K8s clusters? Also, I would make sure that you have met the following prerequisite:
Either the Helm release name for each Kubernetes cluster must be unique, or global.name for each Kubernetes cluster must be unique to prevent collisions of ACL resources with the same prefix.
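As a minimal sketch of that prerequisite, the two values files could differ only in `global.name`. The names `consul` and `aks01` are taken from later in this thread; any pair of distinct names should work.

```yaml
# Primary cluster values.yaml (fragment)
global:
  name: consul      # ACL resources in this cluster get the "consul" prefix
  datacenter: mgmt
---
# Secondary cluster values.yaml (fragment)
global:
  name: aks01       # a different prefix, so ACL resources do not collide
  datacenter: mgmt  # same Consul datacenter as the primary
```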
@david-yu Ah, I forgot to include these in the report.
Primary cluster values.yaml:
global:
  name: consul
  datacenter: mgmt
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true
  gossipEncryption:
    autoGenerate: true
connectInject:
  enabled: true
controller:
  enabled: true
ui:
  enabled: true
  service:
    type: LoadBalancer
    annotations: |
      service.beta.kubernetes.io/azure-dns-label-name: <DNS_NAME>
Secondary cluster values.yaml:
global:
  enabled: false
  name: consul
  datacenter: aks01
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-bootstrap-acl-token
      secretKey: token
  gossipEncryption:
    secretName: consul-gossip-encryption-key
    secretKey: key
connectInject:
  enabled: true
externalServers:
  enabled: true
  # This should be any node IP of the first k8s cluster
  hosts: ["10.121.16.4"]
  # The node port of the UI's NodePort service
  httpsPort: 31274
  # This should reflect the datacenter's name as that is what will be in the TLS cert
  tlsServerName: server.mgmt.consul
  # The address of the kube API server of this Kubernetes cluster
  k8sAuthMethodHost: <API_SERVER_ENDPOINT>
client:
  enabled: true
  # Requires a kubeconfig for the datacenter cluster
  # Permissions in the kubeconfig should be limited to just read pods (possibly limiting to just the 'consul' namespace)
  join: ["provider=k8s kubeconfig=/consul/userconfig/mgmt-kubeconfig/kubeconfig label_selector=\"app=consul,component=server\""]
  extraVolumes:
    - type: secret
      name: mgmt-kubeconfig
      load: false
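The comment above about limiting the kubeconfig to reading pods could be sketched as a namespaced Role and RoleBinding in the primary cluster. This is only an illustration: the `consul` namespace, the ServiceAccount name `consul-auto-join`, and the Role name `consul-server-pod-reader` are assumptions, not values from this deployment.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: consul-server-pod-reader   # hypothetical name
  namespace: consul                # assumed namespace of the server pods
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]         # auto-join only needs to read pods
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: consul-server-pod-reader
  namespace: consul
subjects:
  - kind: ServiceAccount
    name: consul-auto-join         # hypothetical identity behind the kubeconfig
    namespace: consul
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: consul-server-pod-reader
```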
I will try redeploying the client in the secondary cluster without global.name defined.
I corrected global.datacenter in the secondary values file to specify the DC name for the management cluster (i.e. mgmt) and removed global.name from the secondary values file, neither of which resolved the errors being seen.
Server log:
consul-server-1:2022-09-21T15:47:14.606Z [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {mgmt-10.121.16.105:8300 consul-server-0.mgmt <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.121.16.105:8300: operation was canceled". Reconnecting...
consul-server-2:2022-09-21T15:45:34.858Z [ERROR] agent.http: Request error: method=POST url=/v1/acl/login from=10.121.16.4:33972 error="rpc error making call: ACL not found: auth method "consul-consul-k8s-component-auth-method" not found"
consul-server-2:2022-09-21T15:45:35.952Z [ERROR] agent.http: Request error: method=POST url=/v1/acl/login from=10.121.16.4:33972 error="rpc error making call: ACL not found: auth method "consul-consul-k8s-component-auth-method" not found"
consul-server-2:2022-09-21T15:45:36.350Z [ERROR] agent.http: Request error: method=PUT url=/v1/acl/policy from=10.121.16.4:47495 error="rpc error making call: Invalid Policy: A Policy with Name "client-policy" already exists"
consul-server-2:2022-09-21T15:45:36.482Z [ERROR] agent.http: Request error: method=PUT url=/v1/acl/policy from=10.121.16.4:47495 error="rpc error making call: Invalid Policy: A Policy with Name "connect-inject-policy" already exists"
consul-server-2:2022-09-21T15:45:37.072Z [ERROR] agent.http: Request error: method=GET url=/v1/acl/token/self?stale= from=10.121.16.4:33972 error="ACL not found"
consul-server-2:2022-09-21T15:48:09.135Z [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {mgmt-10.121.16.62:8300 consul-server-1.mgmt <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.121.16.62:8300: operation was canceled". Reconnecting...
consul-server-2:2022-09-21T15:49:15.101Z [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {mgmt-10.121.16.105:8300 consul-server-0.mgmt <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.121.16.105:8300: operation was canceled". Reconnecting...
Client log:
client-acl-init 2022-09-21T15:45:34.948Z [ERROR] unable to login: error="Unexpected response code: 403 (rpc error making call: ACL not found: auth method "consul-consul-k8s-component-auth-method" not found)"
client-acl-init 2022-09-21T15:45:35.953Z [ERROR] unable to login: error="Unexpected response code: 403 (rpc error making call: ACL not found: auth method "consul-consul-k8s-component-auth-method" not found)"
client-acl-init 2022-09-21T15:45:37.071Z [INFO] Consul login complete
client-acl-init 2022-09-21T15:45:37.071Z [INFO] Checking that the ACL token exists when reading it in the stale consistency mode
client-acl-init 2022-09-21T15:45:37.072Z [ERROR] Unable to read ACL token; retrying: err="Unexpected response code: 403 (ACL not found)"
client-acl-init 2022-09-21T15:45:37.174Z [INFO] Successfully read ACL token from the server
client-acl-init 2022-09-21T15:45:37.174Z [INFO] Successfully read ACL token from the server
consul ==> Starting Consul agent...
consul Version: '1.13.1'
consul Build Date: '2022-08-11 19:07:00 +0000 UTC'
consul Node ID: 'c4b18e2b-c66c-ca69-ce19-51c64d3eb3ce'
consul Node name: 'aks-agentpool1-22038821-vmss000001'
consul Datacenter: 'mgmt' (Segment: '')
consul Server: false (Bootstrap: false)
consul Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 8600)
consul Cluster Addr: 10.121.12.37 (LAN: 8301, WAN: 8302)
consul Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: true
consul
consul ==> Log data will now stream in as it occurs:
consul
consul 2022-09-21T15:45:40.857Z [WARN] agent: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
consul 2022-09-21T15:45:40.857Z [WARN] agent: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
consul 2022-09-21T15:45:41.050Z [WARN] agent.auto_config: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
consul 2022-09-21T15:45:41.050Z [WARN] agent.auto_config: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
consul 2022-09-21T15:45:41.361Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:45:41.454Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:45:42.555Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:45:44.646Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:45:49.166Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:45:58.978Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:46:17.636Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:46:56.042Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:48:05.814Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:50:34.356Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul 2022-09-21T15:55:29.263Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
I was able to work around the auto-encrypt issue by disabling global.tls.enableAutoEncrypt, which required that I also export the CA key Secret from the primary cluster and import it into the client cluster.
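A sketch of what the TLS section of the client values might look like for this workaround. The `tls.key` key matches the `/consul/tls/ca/key/tls.key` path in the client-tls-init log below; the Secret name `consul-ca-key` is an assumption and should be whatever name the CA key was imported under.

```yaml
global:
  tls:
    enabled: true
    enableAutoEncrypt: false       # clients get certs from client-tls-init instead
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
    caKey:
      secretName: consul-ca-key    # assumed name of the imported CA key Secret
      secretKey: tls.key
```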
This then revealed that it's the cloud auto-join which has been failing to discover the server pods.
Switching from k8s auto-join to specifying one of the server pods' IPs works, though obviously we want auto-join to discover the server pod IPs rather than having to hard-code them in the client values file.
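One thing that may be worth checking for the empty discovery result: as far as I know, the go-discover k8s provider searches the `default` namespace unless a `namespace=` parameter is given. If the server pods run in another namespace, a join string along these lines might help. The `consul` namespace here is an assumption about where the servers were installed.

```yaml
client:
  enabled: true
  join:
    # namespace=consul is an assumption; set it to the namespace of the server pods
    - 'provider=k8s kubeconfig=/consul/userconfig/mgmt-kubeconfig/kubeconfig namespace=consul label_selector="app=consul,component=server"'
  extraVolumes:
    - type: secret
      name: mgmt-kubeconfig
      load: false
```

The single-quoted YAML string avoids having to escape the inner double quotes around the label selector.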
Client log, auto-encrypt enabled, k8s auto-join enabled (FAIL):
[pod/aks01-client-9wm26/client-acl-init] 2022-09-22T20:59:35.624Z [INFO] Consul login complete
[pod/aks01-client-9wm26/client-acl-init] 2022-09-22T20:59:35.624Z [INFO] Checking that the ACL token exists when reading it in the stale consistency mode
[pod/aks01-client-9wm26/client-acl-init] 2022-09-22T20:59:35.648Z [ERROR] Unable to read ACL token; retrying: err="Unexpected response code: 403 (ACL not found)"
[pod/aks01-client-9wm26/client-acl-init] 2022-09-22T20:59:35.750Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-9wm26/client-acl-init] 2022-09-22T20:59:35.750Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-9wm26/consul] ==> Starting Consul agent...
[pod/aks01-client-9wm26/consul] Version: '1.13.1'
[pod/aks01-client-9wm26/consul] Build Date: '2022-08-11 19:07:00 +0000 UTC'
[pod/aks01-client-9wm26/consul] Node ID: '08a8a368-5105-4be6-8472-4524aff49f3c'
[pod/aks01-client-9wm26/consul] Node name: 'aks-agentpool1-22038821-vmss000001'
[pod/aks01-client-9wm26/consul] Datacenter: 'mgmt' (Segment: '')
[pod/aks01-client-9wm26/consul] Server: false (Bootstrap: false)
[pod/aks01-client-9wm26/consul] Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 8600)
[pod/aks01-client-9wm26/consul] Cluster Addr: 10.121.12.65 (LAN: 8301, WAN: 8302)
[pod/aks01-client-9wm26/consul] Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: true
[pod/aks01-client-9wm26/consul]
[pod/aks01-client-9wm26/consul] ==> Log data will now stream in as it occurs:
[pod/aks01-client-9wm26/consul]
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:39.450Z [WARN] agent: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:39.450Z [WARN] agent: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:39.558Z [WARN] agent.auto_config: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:39.559Z [WARN] agent.auto_config: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:39.559Z [DEBUG] agent.auto_config: discover: Using provider "k8s"
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:40.050Z [DEBUG] agent.auto_config: discovered auto-config servers: servers=[]
[pod/aks01-client-9wm26/consul] 2022-09-22T20:59:40.050Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
Client log, auto-encrypt disabled, k8s auto-join enabled (FAIL):
[pod/aks01-client-76h8n/client-acl-init] 2022-09-22T21:23:03.348Z [INFO] Consul login complete
[pod/aks01-client-76h8n/client-acl-init] 2022-09-22T21:23:03.348Z [INFO] Checking that the ACL token exists when reading it in the stale consistency mode
[pod/aks01-client-76h8n/client-acl-init] 2022-09-22T21:23:03.350Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-76h8n/client-acl-init] 2022-09-22T21:23:03.350Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-76h8n/client-tls-init] ==> Using /consul/tls/ca/cert/tls.crt and /consul/tls/ca/key/tls.key
[pod/aks01-client-76h8n/client-tls-init] ==> Saved mgmt-client-consul-0.pem
[pod/aks01-client-76h8n/client-tls-init] ==> Saved mgmt-client-consul-0-key.pem
[pod/aks01-client-76h8n/consul] ==> Starting Consul agent...
[pod/aks01-client-76h8n/consul] Version: '1.13.1'
[pod/aks01-client-76h8n/consul] Build Date: '2022-08-11 19:07:00 +0000 UTC'
[pod/aks01-client-76h8n/consul] Node ID: 'ca4ac722-3a11-01c6-9570-111573850b33'
[pod/aks01-client-76h8n/consul] Node name: 'aks-agentpool1-22038821-vmss000001'
[pod/aks01-client-76h8n/consul] Datacenter: 'mgmt' (Segment: '')
[pod/aks01-client-76h8n/consul] Server: false (Bootstrap: false)
[pod/aks01-client-76h8n/consul] Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 8600)
[pod/aks01-client-76h8n/consul] Cluster Addr: 10.121.12.52 (LAN: 8301, WAN: 8302)
[pod/aks01-client-76h8n/consul] Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true, Auto-Encrypt-TLS: false
[pod/aks01-client-76h8n/consul]
[pod/aks01-client-76h8n/consul] ==> Log data will now stream in as it occurs:
[pod/aks01-client-76h8n/consul]
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'cert_file' field is deprecated. Use the 'tls.defaults.cert_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'key_file' field is deprecated. Use the 'tls.defaults.key_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'verify_incoming_rpc' field is deprecated. Use the 'tls.internal_rpc.verify_incoming' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.454Z [WARN] agent: The 'verify_server_hostname' field is deprecated. Use the 'tls.internal_rpc.verify_server_hostname' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'cert_file' field is deprecated. Use the 'tls.defaults.cert_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'key_file' field is deprecated. Use the 'tls.defaults.key_file' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'verify_incoming_rpc' field is deprecated. Use the 'tls.internal_rpc.verify_incoming' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.554Z [WARN] agent.auto_config: The 'verify_server_hostname' field is deprecated. Use the 'tls.internal_rpc.verify_server_hostname' field instead.
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.557Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool1-22038821-vmss000001 10.121.12.52
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.557Z [INFO] agent.router: Initializing LAN area manager
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.557Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.557Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.648Z [INFO] agent: Starting server: address=[::]:8501 network=tcp protocol=https
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.652Z [INFO] agent: Started gRPC server: address=[::]:8502 network=tcp
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.652Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.652Z [INFO] agent: Joining cluster...: cluster=LAN
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.652Z [DEBUG] agent: discover: Using provider "k8s": cluster=LAN
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.748Z [INFO] agent: started state syncer
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.748Z [INFO] agent: Consul agent running!
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.748Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.748Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.953Z [INFO] agent: Discovered servers: cluster=LAN cluster=LAN servers=
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:09.953Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error="No servers to join"
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:10.265Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:10.265Z [ERROR] agent.http: Request error: method=POST url=/v1/acl/login from=10.121.12.35:57338 error="No known Consul servers"
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:10.265Z [DEBUG] agent.http: Request finished: method=POST url=/v1/acl/login from=10.121.12.35:57338 latency=125.702µs
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:11.267Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:11.267Z [ERROR] agent.http: Request error: method=POST url=/v1/acl/login from=10.121.12.35:57338 error="No known Consul servers"
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:11.268Z [DEBUG] agent.http: Request finished: method=POST url=/v1/acl/login from=10.121.12.35:57338 latency=354.406µs
[pod/aks01-client-76h8n/consul] 2022-09-22T21:23:11.448Z [WARN] agent.router.manager: No servers available
Client log, auto-encrypt enabled, k8s auto-join disabled (SUCCESS):
[pod/aks01-client-vxb7z/client-acl-init] 2022-09-22T21:06:17.743Z [INFO] Consul login complete
[pod/aks01-client-vxb7z/client-acl-init] 2022-09-22T21:06:17.743Z [INFO] Checking that the ACL token exists when reading it in the stale consistency mode
[pod/aks01-client-vxb7z/client-acl-init] 2022-09-22T21:06:17.744Z [ERROR] Unable to read ACL token; retrying: err="Unexpected response code: 403 (ACL not found)"
[pod/aks01-client-vxb7z/client-acl-init] 2022-09-22T21:06:17.846Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-vxb7z/client-acl-init] 2022-09-22T21:06:17.846Z [INFO] Successfully read ACL token from the server
[pod/aks01-client-vxb7z/consul] ==> Starting Consul agent...
[pod/aks01-client-vxb7z/consul] Version: '1.13.1'
[pod/aks01-client-vxb7z/consul] Build Date: '2022-08-11 19:07:00 +0000 UTC'
[pod/aks01-client-vxb7z/consul] Node ID: 'a9ade670-6322-d9c7-895f-b3e5d17a2dd2'
[pod/aks01-client-vxb7z/consul] Node name: 'aks-agentpool1-22038821-vmss000001'
[pod/aks01-client-vxb7z/consul] Datacenter: 'mgmt' (Segment: '')
[pod/aks01-client-vxb7z/consul] Server: false (Bootstrap: false)
[pod/aks01-client-vxb7z/consul] Client Addr: [0.0.0.0] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 8600)
[pod/aks01-client-vxb7z/consul] Cluster Addr: 10.121.12.48 (LAN: 8301, WAN: 8302)
[pod/aks01-client-vxb7z/consul] Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: true
[pod/aks01-client-vxb7z/consul]
[pod/aks01-client-vxb7z/consul] ==> Log data will now stream in as it occurs:
[pod/aks01-client-vxb7z/consul]
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.351Z [WARN] agent: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.351Z [WARN] agent: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.450Z [WARN] agent.auto_config: The 'ca_file' field is deprecated. Use the 'tls.defaults.ca_file' field instead.
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.450Z [WARN] agent.auto_config: The 'verify_outgoing' field is deprecated. Use the 'tls.defaults.verify_outgoing' field instead.
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.450Z [DEBUG] agent.auto_config: making AutoEncrypt.Sign RPC: addr=10.121.16.24:8300
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.483Z [INFO] agent.auto_config: automatically upgraded to TLS
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.549Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool1-22038821-vmss000001 10.121.12.48
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.549Z [INFO] agent.router: Initializing LAN area manager
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.549Z [INFO] agent.auto_config: auto-config started
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [DEBUG] agent.auto_config: handling a cache update event: correlation_id=roots
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.550Z [DEBUG] agent.auto_config: roots watch fired - updating CA certificates
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [DEBUG] agent.auto_config: handling a cache update event: correlation_id=leaf
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [DEBUG] agent.auto_config: leaf certificate watch fired - updating TLS certificate
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.551Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="No known Consul servers" index=14
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.552Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.553Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.553Z [INFO] agent: Starting server: address=[::]:8501 network=tcp protocol=https
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.555Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.555Z [INFO] agent: Joining cluster...: cluster=LAN
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.555Z [INFO] agent: (LAN) joining: lan_addresses=[10.121.16.24]
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.555Z [INFO] agent: Started gRPC server: address=[::]:8502 network=tcp
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.648Z [INFO] agent: started state syncer
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.648Z [INFO] agent: Consul agent running!
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.648Z [DEBUG] agent.client.memberlist.lan: memberlist: Initiating push/pull sync with: 10.121.16.24:8301
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.648Z [WARN] agent.router.manager: No servers available
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.648Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: consul-server-0 10.121.16.24
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool1-17518502-vmss000005 10.121.16.113
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client: adding server: server="consul-server-0 (Addr: tcp/10.121.16.24:8300) (DC: mgmt)"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool1-17518502-vmss000000 10.121.16.27
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool1-17518502-vmss000003 10.121.16.51
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [WARN] agent.client.memberlist.lan: memberlist: Refuting a dead message (from: aks-agentpool1-22038821-vmss000001)
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: consul-server-1 10.121.16.117
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: consul-server-2 10.121.16.64
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [DEBUG] agent.client.serf.lan: serf: Refuting an older leave intent
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client: adding server: server="consul-server-1 (Addr: tcp/10.121.16.117:8300) (DC: mgmt)"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent: (LAN) joined: number_of_nodes=1
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [DEBUG] agent: systemd notify failed: error="No socket"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [INFO] agent.client: adding server: server="consul-server-2 (Addr: tcp/10.121.16.64:8300) (DC: mgmt)"
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.650Z [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {mgmt-10.121.16.24:8300 consul-server-0 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.121.16.24:8300: operation was canceled". Reconnecting...
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.705Z [INFO] agent: Synced node info
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.705Z [DEBUG] agent: Node info in sync
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.872Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.892Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:21.909Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:22.060Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:22.067Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:22.260Z [DEBUG] agent.client.serf.lan: serf: messageJoinType: aks-agentpool1-22038821-vmss000001
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:23.552Z [DEBUG] agent: Skipping remote check since it is managed automatically: check=serfHealth
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:23.552Z [DEBUG] agent: Node info in sync
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:24.048Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:34756 latency=92.562901ms
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:24.088Z [DEBUG] agent.http: Request finished: method=GET url=/v1/agent/services?filter=Meta%5B%22k8s-service-name%22%5D+%3D%3D+%22aks01-connect-injector%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22consul%22+and+Meta%5B%22managed-by%22%5D+%3D%3D+%22consul-k8s-endpoints-controller%22 from=10.121.12.37:35766 latency=3.03284ms
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:27.521Z [INFO] agent: Newer Consul version available: new_version=1.13.2 current_version=1.13.1
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:33.953Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:34864 latency=840.311µs
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:34.305Z [DEBUG] agent.client.memberlist.lan: memberlist: Stream connection from=10.121.16.24:34484
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:44.048Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:34972 latency=91.396936ms
[pod/aks01-client-vxb7z/consul] 2022-09-22T21:06:54.051Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:35074 latency=818.611µs
Working Helm values
Server values.yaml:
global:
  name: consul
  datacenter: mgmt
  tls:
    enabled: true
    enableAutoEncrypt: true
  acls:
    manageSystemACLs: true
  gossipEncryption:
    autoGenerate: true
connectInject:
  enabled: true
controller:
  enabled: true
ui:
  enabled: true
  service:
    type: LoadBalancer
    nodePort:
      https: 30076 # pinned node port across deploys
    annotations: |
      service.beta.kubernetes.io/azure-dns-label-name: <DNS_NAME>
Client values.yaml
global:
  enabled: false
  name: aks01
  datacenter: mgmt
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: consul-ca-cert
      secretKey: tls.crt
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-bootstrap-acl-token
      secretKey: token
  gossipEncryption:
    secretName: consul-gossip-encryption-key
    secretKey: key
connectInject:
  enabled: true
externalServers:
  enabled: true
  # This should be any node IP of the first k8s cluster
  hosts: ["10.121.16.4"]
  # The node port of the UI's NodePort service
  httpsPort: 30076
  # This should reflect the datacenter's name as that is what will be in the TLS cert
  tlsServerName: server.mgmt.consul
  # The address of the kube API server of this Kubernetes cluster
  k8sAuthMethodHost: <API_SERVER_ENDPOINT>
client:
  enabled: true
  # Requires a kubeconfig for the datacenter cluster
  # Permissions in the kubeconfig should be limited to just read pods (possibly limiting to just the 'consul' namespace)
  # join: ["provider=k8s kubeconfig=/consul/userconfig/mgmt-kubeconfig/kubeconfig label_selector=\"app=consul,component=server\""]
  # Join directly to Consul server pod IP
  join: ["10.121.16.24"]
  extraVolumes:
    - type: secret
      name: mgmt-kubeconfig
      load: false
  extraConfig: |
    {
      "log_level": "DEBUG"
    }
So the auto-encrypt mention in the error message is a red herring. The real problem is that the k8s provider is not finding the server pods.
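One way to check this outside of Consul is to run the same pod lookup by hand. The following is a minimal sketch, assuming `kubectl` is installed and that the kubeconfig passed in is the same one mounted into the client pods. The function name, the file path, and the namespace in the example call are illustrative and not taken from the chart.

```shell
#!/usr/bin/env bash
# Sketch: list the pods a label-selector lookup would match, using a
# specific kubeconfig.
# Arguments (all illustrative):
#   $1  path to the kubeconfig file
#   $2  namespace to search
#   $3  label selector
list_consul_servers() {
  kubectl --kubeconfig "$1" \
    --namespace "$2" \
    get pods \
    --selector "$3" \
    --output wide
}

# Example call; the path and namespace are assumptions, adjust to your cluster.
# list_consul_servers ./mgmt-kubeconfig consul "app=consul,component=server"
```

If the server pods only show up for a namespace other than `default`, it may be worth comparing that against the join string: as far as I know, go-discover's k8s provider also accepts a `namespace` option and searches `default` when it is omitted. Note too that a pod IP used directly in `join` can change when the server pod is rescheduled.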
I encountered a similar issue today while using GKE with the same guide. It produced a slightly different error, but it does seem that the k8s provider is broken in one or more ways.
My issue was resolved in the same way: by specifying the IP of a Consul server pod explicitly.
[ERROR] failed to resolve go-discover auto-config servers: configuration="provider=k8s kubeconfig=/consul/userconfig/cluster1-kubeconfig/kubeconfig label_selector="app=consul,component=server"" err="discover-k8s: error listing pods: Get "https://<redacted>/api/v1/namespaces/default/pods?labelSelector=app%3Dconsul%2Ccomponent%3Dserver": getting credentials: exec: fork/exec /opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin/gke-gcloud-auth-plugin: no such file or directory"
[ERROR] agent.auto_config: no auto-encrypt server addresses available for use
Edit: It turns out this is because gcloud uses an auth plugin instead of including static credentials in the kubeconfig. Something to keep in mind, but not a bug in the k8s provider AFAICT.
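A quick way to spot this case before mounting a kubeconfig into the client pods is to look for an `exec` entry under the user credentials. The following is a rough sketch, assuming a YAML kubeconfig; the function name and the path in the example call are hypothetical, and a plain `grep` is only a heuristic, not a full YAML parse.

```shell
#!/usr/bin/env bash
# Sketch: report whether a kubeconfig delegates authentication to an
# external exec plugin (for example gke-gcloud-auth-plugin), which would
# not be present inside the Consul container.
# Argument (illustrative): $1 path to the kubeconfig file.
uses_exec_plugin() {
  # Matches a YAML "exec:" key at any indentation; exit status 0 if found.
  grep -Eq '^[[:space:]]*exec:' "$1"
}

# Example call; the path is an assumption.
# if uses_exec_plugin ./cluster1-kubeconfig; then
#   echo "kubeconfig relies on an exec auth plugin"
# fi
```

If the check matches, the kubeconfig depends on a binary from the machine that generated it, so a kubeconfig with static credentials (such as a service account token) is likely needed instead.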
Overview of the Issue
I am following the guide at https://www.consul.io/docs/k8s/deployment-configurations/single-dc-multi-k8s to configure a single Consul DC that will have clients join from multiple K8s clusters. The guide covers how to deploy Consul server nodes into one K8s cluster, then have clients deployed into a 2nd K8s cluster connect to the servers via cloud auto-join.
When the client starts up in the secondary cluster, it shows the following output:
Complete log fragment is included below.
This error message could be due to auto-encrypt, which the guide suggests enabling. The message originates in `auto_encrypt.go` (https://github.com/hashicorp/consul/blob/v1.13.1/agent/auto-config/auto_encrypt.go#L85-L106). My reading of that code is that the error is raised if no `start_join` addresses are provided. The Helm chart passes the `-retry-join` parameter to the Consul agent rather than `-start-join`.
The following is the exec statement for the Consul agent container:
Consul info for both Client and Server
Client info
```
/ $ consul info
Error querying agent: Get "https://localhost:8501/v1/agent/self": dial tcp [::1]:8501: connect: connection refused
```
Server info
```
/ $ consul info -token
```
Operating system and Environment details
Kubernetes flavour: AKS
Kubernetes version: 1.22.11
Deployment method: Helm
Chart version: 0.47.1
Consul version: 1.13.1
Log Fragments
Error on the server:
Error on the client: