hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0
668 stars 321 forks

Kubernetes services external access via HAproxy and Consul #4380

Open allidoiswin10 opened 1 week ago

allidoiswin10 commented 1 week ago

Hi All,

I've been investigating Consul for service discovery. We want to use it for services deployed in Kubernetes (on-prem clusters deployed via kubespray and kubeadm) as well as for services that live on bare-metal VMs. I'll detail our cluster setup and what I've configured so far.

TL;DR - The HAProxy LB points to HAProxy ingress controller nodes on multiple clusters. Traffic is routed via host headers, with Ingress objects using path prefixes. We want to use Consul purely for service discovery, configured with consul-template to loop through services and map them to the respective ingress controller nodes.

Traffic flows into our cluster via an external load balancer (LB), HAProxy in our case. We have Polaris GSLB as an authoritative DNS server for the subdomain .dev.company.com. The top-level domain .company.com is configured in AD DNS and handled by another tech department. Polaris has records for all the clusters (prod-cluster-1.dev.company.com, prod-cluster-2.dev.company.com, etc.) and some independent services (app.dev.company.com, app2.dev.company.com, etc.) that all just point back to the external HAProxy load balancer. Once traffic gets to the load balancer, we have config that maps host headers to backends.
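For context, the host-header-to-backend mapping on the external LB looks roughly like this (a minimal sketch; the frontend/backend names and certificate path are hypothetical, not taken from our actual config):

```haproxy
frontend fe_https
    bind :443 ssl crt /etc/haproxy/certs/   # hypothetical cert path
    mode http
    # route on the Host header to a per-service backend
    use_backend b_app.dev.company.com  if { req.hdr(host) -i app.dev.company.com }
    use_backend b_app2.dev.company.com if { req.hdr(host) -i app2.dev.company.com }
    default_backend b_prod-cluster-1.dev.company.com
```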

With the introduction of Consul, I've deployed a Consul server on a Linux VM with the following configuration:

server = true
bootstrap_expect = 1
bind_addr = "<IP>"
client_addr = "<IP>"
ui_config {
  enabled = true
}
ports {
  grpc = 8502
  grpc_tls = -1
}

The consul.hcl is also very standard:

datacenter = "dc1"
data_dir = "/opt/consul"
encrypt = "<KEY>"
tls {
   defaults {
      ca_file = "/etc/consul.d/certs/consul-agent-ca.pem"
      cert_file = "/etc/consul.d/certs/dc1-server-consul-0.pem"
      key_file = "/etc/consul.d/certs/dc1-server-consul-0-key.pem"

      verify_incoming = false
      verify_outgoing = true
   }
   internal_rpc {
      verify_server_hostname = false
   }
}
retry_join = ["<IP>"]

For consul-k8s, I've deployed the catalog sync service (currently syncing all services):

global:
  enabled: false
  gossipEncryption:
    autoGenerate: false
    secretName: consul-gossip-encryption-key
    secretKey: key
  tls:
    caCert:
      secretName: consul-ca
      secretKey: tls.crt

server:
  enabled: false

externalServers:
  enabled: true
  hosts: [<EXTERNAL CONSUL SERVER>]
  httpsPort: 8500

syncCatalog:
  enabled: true
  toK8S: false
  k8sTag: <k8s cluster name>
  consulNodeName: <k8s cluster name>
  ingress:
    enabled: true

connectInject:
  enabled: false

Once the catalog sync in consul-k8s starts syncing services, I use consul-template on the HAProxy node to map the services to the ingress NodePort services that have the same cluster tag:

{{range services -}}{{$servicename := .Name}}
backend b_{{$servicename}}.{{ .Tags | join "," }}.dev.example.com
  mode http
  {{range service "haproxy-ingress-haproxy-ingress"}}
  # HAProxy's "verify" keyword requires an argument (none|required);
  # use "verify required" with a ca-file once certs are sorted
  server {{ .Node }} {{ .Address }}:{{ .Port }} ssl verify none check check-ssl
  {{end}}
{{- end}}
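For completeness, the consul-template process itself is wired up with a config that renders the template and reloads HAProxy on change (a sketch; the paths and the reload command are assumptions, and the server address placeholder matches the config above):

```hcl
consul {
  address = "<EXTERNAL CONSUL SERVER>:8500"
  ssl {
    enabled = true
    verify  = true
    ca_cert = "/etc/consul.d/certs/consul-agent-ca.pem"
  }
}

template {
  source      = "/etc/haproxy/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command     = "systemctl reload haproxy"
}
```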

So all of this gives us a list of services we want discoverable in Consul, and the HAProxy LB picks up all the services, mapping the ingress controller nodes and ports against them.
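As a sanity check that catalog sync and the template see the same thing, Consul's catalog HTTP API (`GET /v1/catalog/services`) returns the same name-to-tags mapping that consul-template iterates over. A minimal Python sketch (the address placeholder follows the config above, and `parse_services` is a hypothetical helper name; a custom SSL context would be needed for the self-signed CA):

```python
import json
import urllib.request

# placeholder matching the externalServers config above
CONSUL_ADDR = "https://<EXTERNAL CONSUL SERVER>:8500"

def parse_services(payload: bytes) -> dict:
    """Map service name -> sorted list of tags from a /v1/catalog/services response."""
    return {name: sorted(tags) for name, tags in json.loads(payload).items()}

def list_services(addr: str = CONSUL_ADDR) -> dict:
    # /v1/catalog/services returns {"service-name": ["tag1", ...], ...}
    with urllib.request.urlopen(f"{addr}/v1/catalog/services") as resp:
        return parse_services(resp.read())

if __name__ == "__main__":
    for name, tags in list_services().items():
        print(name, tags)
```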

Enabling the ingress option on consul-k8s is great, but I've noticed it only exposes one of the hostnames of an Ingress object. Ideally, with a multi-cluster setup, we would want services accessible via friendly names like app.dev.company.com but also via app.dc1.dev.company.com. Most of the chatter online seems to be about using Consul DNS and the .consul domain for queries. I don't particularly like this approach; I don't want to introduce another arbitrary domain into our setup.

I've yet to see many others use Consul and Kubernetes in this way. Is what we're doing wrong or somehow incorrect? How are others using Consul to expose services, and what other tooling is used to get traffic to those services on on-prem clusters?

Please let me know if I've missed any details.