hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0
669 stars 323 forks source link

`-auth-method-host` logic in `server-acl-init` job spec fails under specific circumstances #1183

Open thomashashi opened 2 years ago

thomashashi commented 2 years ago

https://github.com/hashicorp/consul-k8s/blob/62bd9b4204591ddc2fb52ae5c2d96ccb091d64ce/charts/consul/templates/server-acl-init-job.yaml#L191-L200

Here's the scenario: there is a "server" Kubernetes cluster, in which the Consul Helm chart has been deployed, with the option to run the Consul server cluster in there for this datacenter. In the same datacenter is a "client" Kubernetes cluster, which has the Consul Helm chart deployed, but configured to be serverless and instead to connect an external Consul cluster, in this case, the one deployed in the "server" Kubernetes cluster.

If you have a Helm values yaml which looks like this:

global:
  name: consul-cluster-2
  enabled: false
client:
  enabled: true
  exposeGossipPorts: true
  join: 
    - "<address of Consul server LB>:9301"
externalServers:
  enabled: true
  httpsPort: 8501
  hosts: 
    - "<address of Consul server LB>"
  k8sAuthMethodHost: "<this Kubernetes cluster public K8s API server>

The Helm chart still installs the consul-cluster-2-client DaemonSet, and those pods all run the client-acl-init Init container, which runs /bin/sh -ec consul-k8s-control-plane acl-init \ -component-name=client \ -acl-auth-method="consul-cluster-2-k8s-component-auth-method" ...

However: when the server-acl-init job runs, Helm does not set the -auth-method-host flag, so on the Consul server cluster the consul-cluster-2-k8s-component-auth-method has in its configuration "Host": https://kubernetes.default.svc, the default value, which is not the Kubernetes API server which can actually validate K8s tokens from the "client" Kubernetes cluster. So when the client DaemonSet in the "client" Kubernetes clusters start up and run their client-acl-init init container, they get an error back from the Consul server cluster.

nabadger commented 11 months ago

Pretty sure i'm hitting htis issue now too. Any workaround?