pedrohdz opened this issue 3 years ago
Hi, do you have the Helm yamls for each DC? I think there is indeed likely to be an issue here.
@lkysow I should be able to get the Helm value files some time next week. Thanks…
@lkysow, the Helm values files are in the following Gists:
Recreation instructions:
Create DC1 and extract configuration data:
kubectl --context=dc1 create namespace consul
kubectl --context=dc1 --namespace=consul \
create secret generic consul-gossip-encryption-key \
--from-literal=key=$(consul keygen)
wget 'https://gist.githubusercontent.com/pedrohdz/3e869f5b6dbfc3b49900c244ae67824e/raw/hashicorp+consul-k8s+issues+582+dc1-helm-values.yaml'
helm --kube-context=dc1 --namespace=consul \
install --values='hashicorp+consul-k8s+issues+582+dc1-helm-values.yaml' \
consul hashicorp/consul --version="0.33.0" --wait
kubectl --context=dc1 --namespace=consul \
get secret consul-federation -o yaml > consul-federation-secret.yaml
DEMO_CONSUL_BOOTSTRAP_TOKEN=$(kubectl \
--context=dc1 --namespace=consul get secrets \
consul-bootstrap-acl-token -o jsonpath='{.data.token}' | base64 -d)
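As a quick sanity check (not part of the Gists), you can confirm the exported federation secret contains what the other clusters will need; depending on your values it should include the CA material plus a gossip key and a replication token:

# Lists the secret's data keys and sizes without printing the values.
kubectl --context=dc1 --namespace=consul describe secret consul-federation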
Create DC2:
kubectl --context=dc2 create namespace consul
kubectl \
--context=dc2 --namespace=consul \
apply --filename=consul-federation-secret.yaml
wget 'https://gist.githubusercontent.com/pedrohdz/df324f75315a789eed53d0f331cf1d44/raw/hashicorp+consul-k8s+issues+582+dc2-helm-values.yaml'
helm --kube-context=dc2 --namespace=consul \
install --values=hashicorp+consul-k8s+issues+582+dc2-helm-values.yaml \
consul hashicorp/consul --version="0.33.0" --wait
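To confirm that ACL replication reached dc2 and that the per-DC client policy exists there, you can list the policies from a dc2 server (a sketch; it assumes the chart's default pod naming and that the dc1 bootstrap token is still in DEMO_CONSUL_BOOTSTRAP_TOKEN):

# With federation working, the output should include both client-token
# (from the primary) and client-token-dc2.
kubectl --context=dc2 --namespace=consul exec consul-server-0 -- \
  consul acl policy list -token="$DEMO_CONSUL_BOOTSTRAP_TOKEN"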
Verify that everything is working:
kubectl --context=dc1 --namespace=consul \
exec statefulset/consul-server -- consul members -wan
You will need the IP address of the DC2 server pod, assuming it has a network-reachable IP address:
DEMO_CONSUL_DC2_IP=$(kubectl --context=dc2 \
--namespace=consul get pods consul-server-0 -o jsonpath='{.status.podIP}')
echo $DEMO_CONSUL_DC2_IP
Set up the DC2 client-only cluster:
kubectl --context=dc2-client-only create namespace consul
kubectl --context=dc2-client-only --namespace=consul \
apply --filename=consul-federation-secret.yaml
kubectl --context=dc2-client-only --namespace=consul \
create secret generic copied-bootstrap-token \
--from-literal=token="$DEMO_CONSUL_BOOTSTRAP_TOKEN"
wget 'https://gist.githubusercontent.com/pedrohdz/a5f59554464e7e8d60bcf2a34bab40d9/raw/hashicorp+consul-k8s+issues+582+dc2-client-only-helm-values.yaml'
helm --kube-context=dc2-client-only --namespace=consul \
install --values=hashicorp+consul-k8s+issues+582+dc2-client-only-helm-values.yaml \
--set="client.join[0]=$DEMO_CONSUL_DC2_IP" \
--set="externalServers.hosts[0]=$DEMO_CONSUL_DC2_IP" \
consul hashicorp/consul --version "0.33.0" --wait
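The ACL wiring for this cluster is done by the server-acl-init job, so its logs are worth capturing too (job name assumed from the default release naming; they show which policy was attached to the client token):

kubectl --context=dc2-client-only --namespace=consul \
  logs job/consul-server-acl-init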
View the client errors:
kubectl --context=dc2-client-only --namespace=consul logs -l 'app=consul,component=client'
Note that the error seems to have changed, but the root cause appears to be the same:
2021-08-25T07:09:08.401Z [WARN] agent: Node info update blocked by ACLs: node=YYYYYY-YYYYYYY-YYYYYYY accessorID=XXXXX-XXXXXXX-XXXXXXXX
2021-08-25T07:09:22.535Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.31.246.59:8300 error="rpc error making call: Permission denied"
2021-08-25T07:09:22.535Z [WARN] agent: Coordinate update blocked by ACLs: accessorID=XXXXX-XXXXXXX-XXXXXXXX
2021-08-25T07:09:46.500Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.31.246.59:8300 error="rpc error making call: Permission denied"
2021-08-25T07:09:46.504Z [WARN] agent: Coordinate update blocked by ACLs: accessorID=XXXXX-XXXXXXX-XXXXXXXX
2021-08-25T07:10:03.211Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.31.246.59:8300 error="rpc error making call: Permission denied"
2021-08-25T07:10:03.211Z [WARN] agent: Coordinate update blocked by ACLs: accessorID=XXXXX-XXXXXXX-XXXXXXXX
If you add the client-token-dc2 policy to the client-token Token on dc2, the errors go away and the nodes appear to finish registering properly:
2021-08-25T09:23:01.502Z [INFO] agent: Synced node info
2021-08-25T09:23:09.140Z [INFO] agent: Synced node info
2021-08-25T09:23:36.600Z [INFO] agent: Synced node info
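For reference, a rough sketch of that manual workaround using the Consul CLI (the accessor ID is a placeholder, and it assumes the dc1 bootstrap token is still in DEMO_CONSUL_BOOTSTRAP_TOKEN):

# Find the accessor ID of the client token in dc2.
kubectl --context=dc2 --namespace=consul exec consul-server-0 -- \
  consul acl token list -token="$DEMO_CONSUL_BOOTSTRAP_TOKEN"

# Attach the dc2-scoped policy to it while keeping its existing policies.
kubectl --context=dc2 --namespace=consul exec consul-server-0 -- \
  consul acl token update -token="$DEMO_CONSUL_BOOTSTRAP_TOKEN" \
    -id=XXXXX-XXXXXXX-XXXXXXXX \
    -policy-name=client-token-dc2 \
    -merge-policies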
Okay so I think this is covered in this doc: https://www.consul.io/docs/k8s/installation/deployment-configurations/single-dc-multi-k8s
Note: The Helm release name must be unique for each Kubernetes cluster. That is because the Helm chart will use the Helm release name as a prefix for the ACL resources that it creates, such as tokens and auth methods. If the names of the Helm releases are the same, the Helm installation in subsequent clusters will clobber existing ACL resources.
I think if you use a different prefix in your hashicorp+consul-k8s+issues+582+dc2-client-only-helm-values.yaml, e.g. global.name: consul-clientonly, then it will work.
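For reference, applying that suggestion without editing the values file would look something like this (a sketch based on the original install command above):

helm --kube-context=dc2-client-only --namespace=consul \
  upgrade --install \
  --values=hashicorp+consul-k8s+issues+582+dc2-client-only-helm-values.yaml \
  --set="client.join[0]=$DEMO_CONSUL_DC2_IP" \
  --set="externalServers.hosts[0]=$DEMO_CONSUL_DC2_IP" \
  --set="global.name=consul-clientonly" \
  consul hashicorp/consul --version "0.33.0" --wait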
@lkysow, no dice, unfortunately.
The consul-k8s-control-plane server-acl-init call on the Consul client cluster is still defaulting to creating a token with the description client-token Token and associating it with the client-token policy, which is set for dc1 only, not dc2.
It looks like the name of the token is hard-coded here: https://github.com/hashicorp/consul-k8s/blob/01d22a21cfc03960b29d97191ba1acebed5ede60/control-plane/subcommand/server-acl-init/command.go#L444-L448
Then this part fails to append the DC, which would associate the token with the client-token-dc2 policy:
https://github.com/hashicorp/consul-k8s/blob/01d22a21cfc03960b29d97191ba1acebed5ede60/control-plane/subcommand/server-acl-init/create_or_update.go#L34-L38
I guess one option might be to utilize the -resource-prefix instead of hard-coding the name client, though I'm not sure if that would break existing deployments. Another option is appending the DC to the policy name, which would likely be a better solution since it would minimize the number of policies being auto-created. The other option is that I'm totally missing something else. 🤓
I updated the Gists BTW.
@lkysow,
Question for you: should global.name be set to different values for the two DCs (servers)? I currently have them set to the same value in the YAML files I provided.
Thanks!
Ahhh, I see what's happening. Yeah, the client-only dc2 cluster thinks it's not in federation mode because global.federation.enabled == false. Can you try setting that to true in the client-only dc2 values?
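For example, re-running the client-only install with just that one extra flag (same values and flags as before):

helm --kube-context=dc2-client-only --namespace=consul \
  upgrade --install \
  --values=hashicorp+consul-k8s+issues+582+dc2-client-only-helm-values.yaml \
  --set="client.join[0]=$DEMO_CONSUL_DC2_IP" \
  --set="externalServers.hosts[0]=$DEMO_CONSUL_DC2_IP" \
  --set="global.federation.enabled=true" \
  consul hashicorp/consul --version "0.33.0" --wait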
Question for you.. Should the global.name be set to different values for the two DCs (servers)? I currently have them set the same in the YAML files I provided.
No, those can be the same name. The restriction for different names is only when you're sharing a Consul DC across two kube clusters.
Huh... That seemed to do the trick (Gist updated), although the following language is a little confusing in this case:
One would imply that meshGateway.enabled must be true, which it is not. Or maybe I should be turning it on? I thought mesh gateways were only utilized when communicating between DC servers. I tried a release with meshGateway.enabled set to true and it seems to come up just fine.
Hi, yes I agree it's confusing. Really what you're indicating by setting that in your client-only install values is that you're in a secondary DC. I think we might be able to get away without doing that check and then it should just work. I'll create a PR.
Overview of the Issue
A new Consul client-only Kubernetes cluster is failing to join a secondary DC. We are utilizing consul-helm with .Values.global.acls.manageSystemACLs enabled to deploy. The cause appears to be that the client-token ACL token is being associated with the client-token ACL policy of the primary DC, not the secondary.
The Consul client logs (kubectl logs consul-k7xm9) are showing the "Permission denied" ACL errors quoted above.
The client-token ACL policy (associated with the primary DC) is being utilized when creating a client token for a new Kubernetes client cluster that is configured to use the secondary DC as its server.
Partial output from kubectl logs consul-server-acl-init-5dhnn shows that the client-token policy is being utilized, not client-token-REDACTED-dc2.
Listing the ACL policies on REDACTED-dc2 shows that client-token-REDACTED-dc2 exists. REDACTED-dc2 is only being utilized by the Consul client running on the dc2 Kubernetes cluster itself.
Taking a guess here, it seems like the issue is with the following, where the DC name is not being appended to the policy name:
https://github.com/hashicorp/consul-k8s/blob/4a50fda5ab50fb2d6c99603b00a538ef432a6eed/subcommand/server-acl-init/create_or_update.go#L34-L38
Reproduction Steps
The general idea is:
Create two federated DCs using the consul-helm Helm chart, similar to the Secure Service Mesh Communication Across Kubernetes Clusters tutorial, with manageSystemACLs and TLS enableAutoEncrypt enabled.
Create a client-only Kubernetes cluster with server.enabled: false, pointing it to the secondary DC.
, pointing it to the secondary DC.Logs
Logs provided above.
Expected behavior
consul-k8s server-acl-init -create-client-token should be associating the client-token ACL token with the client-token-REDACTED-dc2 ACL policy for the secondary DC, not the one for the primary DC.
Environment details
consul-k8s version: hashicorp/consul-k8s:0.26.0
consul-helm version: consul-0.32.1