I am trying to use mesh gateways to connect two Consul clusters: one runs on VMs as the primary datacenter, and the other runs in Kubernetes as the secondary datacenter.
The problem is on the k8s side: the pod statuses all look fine, but there are errors in the consul-server pod logs:
kubectl logs -f consul-server-0 --tail=1
2021-12-09T10:23:19.959Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50346 latency=34.011µs
2021-12-09T10:23:21.620Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:21.621Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:22.318Z [DEBUG] agent.server: federation states are not enabled in the primary dc
2021-12-09T10:23:22.957Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50388 latency=38.025µs
2021-12-09T10:23:24.364Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:25.328Z [WARN] agent.router: Non-server in server-only area: non_server=cn-hangzhou.10.111.223.220 area=lan
2021-12-09T10:23:25.328Z [WARN] agent.router: Non-server in server-only area: non_server=cn-hangzhou.10.111.223.219 area=lan
2021-12-09T10:23:25.328Z [WARN] agent.router: Non-server in server-only area: non_server=cn-hangzhou.10.111.223.218 area=lan
2021-12-09T10:23:25.827Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:25.960Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50428 latency=92.662µs
2021-12-09T10:23:26.300Z [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: cn-hangzhou.10.111.223.220 10.111.223.220:8301
2021-12-09T10:23:26.641Z [INFO] agent: (WAN) joining: wan_addresses=[*.dev-consul-connect/192.0.2.2]
2021-12-09T10:23:26.642Z [DEBUG] agent.server.memberlist.wan: memberlist: Failed to join 192.0.2.2: read tcp 10.200.0.204:43490->19.112.1.138:10010: read: connection reset by peer
2021-12-09T10:23:26.642Z [WARN] agent: (WAN) couldn't join: number_of_nodes=0 error="1 error occurred:
* Failed to join 192.0.2.2: read tcp 10.200.0.204:43490->19.112.1.138:10010: read: connection reset by peer
"
2021-12-09T10:23:26.642Z [WARN] agent: Join cluster failed, will retry: cluster=WAN retry_interval=30s error=<nil>
2021-12-09T10:23:27.096Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:28.318Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:28.319Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:28.957Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50476 latency=31.193µs
2021-12-09T10:23:29.836Z [DEBUG] agent.server: federation states are not enabled in the primary dc
2021-12-09T10:23:30.452Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:30.838Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.111.223.220:29081
2021-12-09T10:23:31.258Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:31.962Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50504 latency=42.942µs
2021-12-09T10:23:33.645Z [DEBUG] agent.server: federation states are not enabled in the primary dc
2021-12-09T10:23:34.905Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:34.957Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50554 latency=34.466µs
2021-12-09T10:23:35.544Z [DEBUG] agent.server: federation states are not enabled in the primary dc
2021-12-09T10:23:37.130Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:37.959Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50590 latency=28.079µs
2021-12-09T10:23:38.058Z [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=10.200.0.37:38926
2021-12-09T10:23:38.656Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:39.661Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:40.959Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50636 latency=38.093µs
2021-12-09T10:23:41.812Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
2021-12-09T10:23:42.427Z [DEBUG] agent.server: federation states are not enabled in the primary dc
2021-12-09T10:23:43.957Z [DEBUG] agent.http: Request finished: method=GET url=/v1/status/leader from=127.0.0.1:50672 latency=36.955µs
2021-12-09T10:23:44.068Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=dev-consul-connect method=Health.ServiceNodes
I have no idea how to solve this and would be grateful for any suggestions. Here is my consul-helm values.yaml:
global:
# The main enabled/disabled setting. If true, servers,
# clients, Consul DNS and the Consul UI will be enabled. Each component can override
# this default via its component-specific "enabled" config. If false, no components
# will be installed by default and per-component opt-in is required, such as by
# setting `server.enabled` to true.
enabled: true
# Set the prefix used for all resources in the Helm chart. If not set,
# the prefix will be `<helm release name>-consul`.
# @type: string
name: null
# The domain Consul will answer DNS queries for
# (see `-domain` (https://consul.io/docs/agent/options#_domain)) and the domain services synced from
# Consul into Kubernetes will have, e.g. `service-name.service.consul`.
domain: consul
# The name (and tag) of the Consul Docker image for clients and servers.
# This can be overridden per component. This should be pinned to a specific
# version tag, otherwise you may inadvertently upgrade your Consul version.
#
# Examples:
#
# ```yaml
# # Consul 1.5.0
# image: "consul:1.5.0"
# # Consul Enterprise 1.5.0
# image: "hashicorp/consul-enterprise:1.5.0-ent"
# ```
image: "consul:1.8.4"
# Array of objects containing image pull secret names that will be applied to each service account.
# This can be used to reference image pull secrets if using a custom consul or consul-k8s Docker image.
# See https://kubernetes.io/docs/concepts/containers/images/#using-a-private-registry for reference.
#
# Example:
#
# ```yaml
# imagePullSecrets:
# - name: pull-secret-name
# - name: pull-secret-name-2
# ```
# @type: array<map>
imagePullSecrets: []
# The name (and tag) of the consul-k8s (https://github.com/hashicorp/consul-k8s)
# Docker image that is used for functionality such the catalog sync.
# This can be overridden per component.
imageK8S: "hashicorp/consul-k8s:0.22.0"
# The name of the datacenter that the agents should
# register as. This can't be changed once the Consul cluster is up and running
# since Consul doesn't support an automatic way to change this value currently:
# https://github.com/hashicorp/consul/issues/1858.
datacenter: dev-consul-connect-dc2
# Controls whether pod security policies are created for the Consul components
# created by this chart. See https://kubernetes.io/docs/concepts/policy/pod-security-policy/.
enablePodSecurityPolicies: false
# Configures which Kubernetes secret to retrieve Consul's
# gossip encryption key from (see `-encrypt` (https://consul.io/docs/agent/options#_encrypt)). If secretName or
# secretKey are not set, gossip encryption will not be enabled. The secret must
# be in the same namespace that Consul is installed into.
#
# The secret can be created by running:
#
# ```shell
# $ kubectl create secret generic consul-gossip-encryption-key --from-literal=key=$(consul keygen)
# ```
#
# To reference, use:
#
# ```yaml
# global:
# gossipEncryption:
# secretName: consul-gossip-encryption-key
# secretKey: key
# ```
gossipEncryption:
# secretName is the name of the Kubernetes secret that holds the gossip
# encryption key. The secret must be in the same namespace that Consul is installed into.
secretName: ""
# secretKey is the key within the Kubernetes secret that holds the gossip
# encryption key.
secretKey: ""
# Enables TLS (https://learn.hashicorp.com/tutorials/consul/tls-encryption-secure)
# across the cluster to verify authenticity of the Consul servers and clients.
# Requires Consul v1.4.1+ and consul-k8s v0.16.2+
tls:
# If true, the Helm chart will enable TLS for Consul
# servers and clients and all consul-k8s components, as well as generate certificate
# authority (optional) and server and client certificates.
enabled: true
# If true, turns on the auto-encrypt feature on clients and servers.
# It also switches consul-k8s components to retrieve the CA from the servers
# via the API. Requires Consul 1.7.1+ and consul-k8s 0.13.0
enableAutoEncrypt: false
# A list of additional DNS names to set as Subject Alternative Names (SANs)
# in the server certificate. This is useful when you need to access the
# Consul server(s) externally, for example, if you're using the UI.
# @type: array<string>
serverAdditionalDNSSANs: []
# A list of additional IP addresses to set as Subject Alternative Names (SANs)
# in the server certificate. This is useful when you need to access the
# Consul server(s) externally, for example, if you're using the UI.
# @type: array<string>
serverAdditionalIPSANs: []
# If true, `verify_outgoing`, `verify_server_hostname`,
# and `verify_incoming_rpc` will be set to `true` for Consul servers and clients.
# Set this to false to incrementally roll out TLS on an existing Consul cluster.
# Please see https://consul.io/docs/k8s/operations/tls-on-existing-cluster
# for more details.
verify: false
# If true, the Helm chart will configure Consul to disable the HTTP port on
# both clients and servers and to only accept HTTPS connections.
httpsOnly: false
# A Kubernetes secret containing the certificate of the CA to use for
# TLS communication within the Consul cluster. If you have generated the CA yourself
# with the consul CLI, you could use the following command to create the secret
# in Kubernetes:
#
# ```bash
# kubectl create secret generic consul-ca-cert \
# --from-file='tls.crt=./consul-agent-ca.pem'
# ```
caCert:
# The name of the Kubernetes secret.
secretName: consul-federation
# The key of the Kubernetes secret.
secretKey: caCert
# A Kubernetes secret containing the private key of the CA to use for
# TLS communication within the Consul cluster. If you have generated the CA yourself
# with the consul CLI, you could use the following command to create the secret
# in Kubernetes:
#
# ```bash
# kubectl create secret generic consul-ca-key \
# --from-file='tls.key=./consul-agent-ca-key.pem'
# ```
#
# Note that we need the CA key so that we can generate server and client certificates.
# It is particularly important for the client certificates since they need to have host IPs
# as Subject Alternative Names. In the future, we may support bringing your own server
# certificates.
caKey:
# The name of the Kubernetes secret.
secretName: consul-federation
# The key of the Kubernetes secret.
secretKey: caKey
# [Enterprise Only] `enableConsulNamespaces` indicates that you are running
# Consul Enterprise v1.7+ with a valid Consul Enterprise license and would
# like to make use of configuration beyond registering everything into
# the `default` Consul namespace. Requires consul-k8s v0.12+. Additional configuration
# options are found in the `consulNamespaces` section of both the catalog sync
# and connect injector.
enableConsulNamespaces: false
# Configure ACLs.
acls:
manageSystemACLs: false
bootstrapToken:
# The name of the Kubernetes secret.
secretName: null
# The key of the Kubernetes secret.
secretKey: null
createReplicationToken: false
replicationToken:
# The name of the Kubernetes secret.
secretName: null
# The key of the Kubernetes secret.
secretKey: null
federation:
enabled: true
createFederationSecret: false
lifecycleSidecarContainer:
resources:
requests:
memory: "25Mi"
cpu: "20m"
limits:
memory: "50Mi"
cpu: "20m"
imageEnvoy: "envoyproxy/envoy-alpine:v1.14.7"
# Configuration for running this Helm chart on the Red Hat OpenShift platform.
# This Helm chart currently supports OpenShift v4.x+.
openshift:
# If true, the Helm chart will create necessary configuration for running
# its components on OpenShift.
enabled: false
server:
enabled: true
image: "consul:1.8.4"
replicas: 2
bootstrapExpect: 2
enterpriseLicense:
secretKey: null
exposeGossipAndRPCPorts: false
# Configures ports for the consul servers.
ports:
serflan:
port: 8301
storage: 20Gi
storageClass: alicloud-disk-ssd
connect: true
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "100m"
securityContext:
runAsNonRoot: true
runAsGroup: 1000
runAsUser: 100
fsGroup: 1000
updatePartition: 0
disruptionBudget:
enabled: true
maxUnavailable: null
extraConfig: |
{
"log_level":"DEBUG",
"primary_datacenter":"dev-consul-connect",
"primary_gateways":["19.112.1.138:10010","53.26.1.119:10010"]
}
extraVolumes: []
affinity: |
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: {{ template "consul.name" . }}
release: "{{ .Release.Name }}"
component: server
topologyKey: kubernetes.io/hostname
tolerations: ""
nodeSelector: null
priorityClassName: ""
extraLabels: null
annotations: null
service:
annotations: null
extraEnvironmentVars: {}
client:
enabled: true
image: null
join: null
dataDirectoryHostPath: null
grpc: true
exposeGossipPorts: false
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "100m"
securityContext:
runAsNonRoot: true
runAsGroup: 1000
runAsUser: 100
fsGroup: 1000
extraConfig: |
{}
extraVolumes: []
tolerations: ""
nodeSelector: null
affinity: {}
priorityClassName: ""
annotations: null
extraEnvironmentVars: {}
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
updateStrategy: null
snapshotAgent:
# If true, the chart will install resources necessary to run the snapshot agent.
enabled: false
# The number of snapshot agents to run.
replicas: 2
configSecret:
# The name of the Kubernetes secret.
secretName: null
# The key of the Kubernetes secret.
secretKey: null
# Resource settings for snapshot agent pods.
resources:
requests:
memory: "50Mi"
cpu: "50m"
limits:
memory: "50Mi"
cpu: "50m"
caCert: null
dns:
# @type: boolean
enabled: false
type: ClusterIP
clusterIP: null
# Extra annotations to attach to the dns service
# This should be a multi-line string of
# annotations to apply to the dns Service
# @type: string
annotations: null
# Additional ServiceSpec values
# This should be a multi-line string mapping directly to a Kubernetes
# ServiceSpec object.
# @type: string
additionalSpec: null
syncCatalog:
# True if you want to enable the catalog sync. Set to "-" to inherit from
# global.enabled.
enabled: false
# The name of the Docker image (including any tag) for consul-k8s
# to run the sync program.
# @type: string
image: null
# If true, all valid services in K8S are
# synced by default. If false, the service must be annotated
# (https://consul.io/docs/k8s/service-sync#sync-enable-disable) properly to sync.
# In either case an annotation can override the default.
default: true
# Optional priorityClassName.
priorityClassName: ""
toConsul: true
toK8S: true
k8sPrefix: ""
k8sAllowNamespaces: ["test"]
k8sDenyNamespaces: ["kube-system", "kube-public"]
k8sSourceNamespace: null
consulNamespaces:
consulDestinationNamespace: "default"
mirroringK8S: true
mirroringK8SPrefix: "myconsul-"
addK8SNamespaceSuffix: true
consulPrefix: "fromk8s-"
k8sTag: null
consulNodeName: "k8s-sync"
syncClusterIPServices: true
nodePortSyncType: ExternalFirst
aclSyncToken:
# The name of the Kubernetes secret.
secretName: null
# The key of the Kubernetes secret.
secretKey: null
nodeSelector: null
# Affinity Settings
# This should be a multi-line string matching the affinity object
# @type: string
affinity: null
# Toleration Settings
# This should be a multi-line string matching the Toleration array
# in a PodSpec.
# @type: string
tolerations: null
# Resource settings for sync catalog pods.
resources:
requests:
memory: "50Mi"
cpu: "50m"
limits:
memory: "50Mi"
cpu: "50m"
# Log verbosity level. One of "trace", "debug", "info", "warn", or "error".
logLevel: debug
# Override the default interval to perform syncing operations creating Consul services.
# @type: string
consulWriteInterval: null
connectInject:
# True if you want to enable connect injection. Set to "-" to inherit from
# global.enabled.
enabled: true
# Image for consul-k8s that contains the injector
# @type: string
image: null
default: true
healthChecks:
enabled: true
# If `healthChecks.enabled` is set to `true`, `reconcilePeriod` defines how often a full state
# reconcile is done after the initial reconcile at startup is completed.
reconcilePeriod: "1m"
envoyExtraArgs: "-- -l off --component-log-level upstream:trace,http:trace,router:trace,config:debug "
# Optional priorityClassName.
priorityClassName: ""
# The Docker image for Consul to use when performing Connect injection.
# Defaults to global.image.
# @type: string
imageConsul: null
# Log verbosity level. One of "debug", "info", "warn", or "error".
logLevel: info
# Resource settings for connect inject pods.
resources:
requests:
memory: "50Mi"
cpu: "50m"
limits:
memory: "50Mi"
cpu: "50m"
namespaceSelector: null
k8sAllowNamespaces: ["test"]
k8sDenyNamespaces: ["mock"]
# [Enterprise Only] These settings manage the connect injector's interaction with
# Consul namespaces (requires consul-ent v1.7+ and consul-k8s v0.12+).
# Also, `global.enableConsulNamespaces` must be true.
consulNamespaces:
# Name of the Consul namespace to register all
# k8s pods into. If the Consul namespace does not already exist,
# it will be created. This will be ignored if `mirroringK8S` is true.
consulDestinationNamespace: "default"
# Causes k8s pods to be registered into a Consul namespace
# of the same name as their k8s namespace, optionally prefixed if
# `mirroringK8SPrefix` is set below. If the Consul namespace does not
# already exist, it will be created. Turning this on overrides the
# `consulDestinationNamespace` setting.
mirroringK8S: false
mirroringK8SPrefix: ""
certs:
secretName: null
caBundle: ""
# Name of the file within the secret for
# the TLS cert.
certName: tls.crt
# Name of the file within the secret for
# the private TLS key.
keyName: tls.key
nodeSelector: null
affinity: null
tolerations: null
aclBindingRuleSelector: "serviceaccount.name!=default"
overrideAuthMethodName: ""
aclInjectToken:
secretName: null
secretKey: null
# Requires Consul >= v1.5 and consul-k8s >= v0.8.1.
centralConfig:
enabled: true
defaultProtocol: http
proxyDefaults: |
{}
sidecarProxy:
resources:
requests:
# Recommended default: 100Mi
# @type: string
memory: null
# Recommended default: 100m
# @type: string
cpu: null
limits:
# Recommended default: 100Mi
# @type: string
memory: null
# Recommended default: 100m
# @type: string
cpu: null
# Resource settings for the Connect injected init container.
initContainer:
resources:
requests:
memory: "25Mi"
cpu: "50m"
limits:
memory: "150Mi"
cpu: "50m"
# Controller handles config entry custom resources.
# Requires consul >= 1.8.4.
# ServiceIntentions require consul 1.9+.
controller:
enabled: true
replicas: 1
# Log verbosity level. One of "debug", "info", "warn", or "error".
logLevel: debug
# Resource settings for controller pods.
resources:
limits:
cpu: 100m
memory: 50Mi
requests:
cpu: 100m
memory: 50Mi
# Optional YAML string to specify a nodeSelector config.
# @type: string
nodeSelector: null
# Optional YAML string to specify tolerations.
# @type: string
tolerations: null
# Affinity Settings
# This should be a multi-line string matching the affinity object
# @type: string
affinity: null
# Optional priorityClassName.
priorityClassName: ""
# Mesh Gateways enable Consul Connect to work across Consul datacenters.
meshGateway:
enabled: true
globalMode: local
# Number of replicas for the Deployment.
replicas: 1
# What gets registered as WAN address for the gateway.
wanAddress:
source: "NodeIP"
port: 443
static: ""
service:
enabled: true
type: LoadBalancer
port: 443
nodePort: null
annotations: null
additionalSpec: null
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
consulServiceName: "mesh-gateway"
containerPort: 8443
hostPort: null
# Resource settings for mesh gateway pods.
# NOTE: The use of a YAML string is deprecated. Instead, set directly as a
# YAML map.
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "100m"
initCopyConsulContainer:
resources:
requests:
memory: "25Mi"
cpu: "50m"
limits:
memory: "150Mi"
cpu: "50m"
affinity: |
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: {{ template "consul.name" . }}
release: "{{ .Release.Name }}"
component: mesh-gateway
topologyKey: kubernetes.io/hostname
tolerations: null
nodeSelector: null
priorityClassName: ""
annotations: null
There are a few things you could check to help debug this issue:
Do you have network connectivity between the mesh gateway in the VM datacenter and the mesh gateway on k8s? It looks like you need to allow traffic between the node IP of the mesh gateway on k8s and the address of the primary gateway (which looks like a public address).
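As a quick sanity check (assuming `nc` and `openssl` are available on the node or in a debug pod), you could test reachability of the primary gateway addresses from your `primary_gateways` setting:

```shell
# From the node running the k8s mesh gateway, check raw TCP reachability to
# each address configured in primary_gateways.
nc -zv 19.112.1.138 10010
nc -zv 53.26.1.119 10010

# WAN federation traffic is TLS, so also confirm a TLS handshake succeeds;
# a "connection reset by peer" here would match the error in the server logs.
openssl s_client -connect 19.112.1.138:10010 </dev/null
```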
Does your primary mesh gateway also have its mode set to local?
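For reference, a minimal sketch of how the local mode could be set on the VM (primary) side via a proxy-defaults config entry, which is roughly what `meshGateway.globalMode: local` does in the Helm chart; the `proxy-defaults.hcl` file name is just an example:

```shell
# On a server/agent in the primary datacenter, write a proxy-defaults config
# entry so proxies and gateways default to routing cross-DC traffic through
# the local mesh gateway.
cat > proxy-defaults.hcl <<'EOF'
Kind = "proxy-defaults"
Name = "global"
MeshGateway {
  Mode = "local"
}
EOF
consul config write proxy-defaults.hcl
```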
CA cert and key have to be the same for both datacenters.
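One way to verify that, assuming you created the `consul-federation` secret (the names and keys below match your values.yaml) from the CA files on the VM side; adjust the file paths to wherever your primary's CA actually lives:

```shell
# Compare the CA cert stored in the Kubernetes secret with the CA file used
# by the primary (VM) servers. No diff output means they match.
kubectl get secret consul-federation -o jsonpath='{.data.caCert}' | base64 -d \
  | diff - consul-agent-ca.pem

# And the same for the CA private key.
kubectl get secret consul-federation -o jsonpath='{.data.caKey}' | base64 -d \
  | diff - consul-agent-ca-key.pem
```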
Environment details
consul-helm: 0.28.0; consul: 1.8.4; envoy: 1.14.7; consul-k8s: 0.22