hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

API Gateway controller ACL init is broken in v0.48.0 #1479

Closed manobi closed 2 years ago

manobi commented 2 years ago

Overview of the Issue

v0.48.0 uses k8s-auth in secondary datacenters ([GH-1462], by @nathancoleman), but after this upgrade the API Gateway controller's acl-init never finishes.

As mentioned in the original issue, the consul-api-gateway-controller service account does not seem to have enough permissions to authenticate:

2022-09-02T15:51:21.019Z [ERROR] unable to login: error="Unexpected response code: 403 (rpc error making call: rpc error making call: rpc error making call: Permission denied)"

I've managed to run the following command in the controller-acl-init container, but not in the api-gateway-controller-acl-init container:

consul-k8s-control-plane acl-init \
  -component-name=api-gateway-controller \
  -acl-auth-method=consul-consul-k8s-component-auth-method-REDACTED \
  -primary-datacenter=REDACTED \
  -consul-api-timeout=1m \
  -log-level=info \
  -log-json=false

I have also been able to complete the initContainer by using the "consul-controller" service account instead of "consul-api-gateway-controller".
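
For anyone trying to reproduce this, a minimal sketch for pulling the failing init container's logs might look like the following. The namespace default and the Deployment name are assumptions (they depend on the Helm release name), and `acl_init_logs` is a hypothetical helper; only the init container name comes from this report.

```shell
# Sketch only: prints the logs of the API Gateway controller's acl-init container.
# Assumptions: namespace "consul" and Deployment "consul-api-gateway-controller";
# adjust both to match your Helm release.
acl_init_logs() {
  namespace="${1:-consul}"
  deployment="${2:-consul-api-gateway-controller}"
  # The init container name is the one quoted in this issue.
  kubectl logs --namespace "$namespace" "deployment/$deployment" \
    --container api-gateway-controller-acl-init
}
```

Calling `acl_init_logs consul` should surface the "unable to login" error shown above if the same problem is present.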

As it stands, the Helm chart is broken for this setup, and I have to keep the API gateway disabled in order to keep using the chart.
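
The workaround of keeping the API gateway disabled could be sketched as below. The release name, namespace, and the `hashicorp/consul` chart reference are assumptions about the installation, and `disable_api_gateway` is a hypothetical helper; `apiGateway.enabled` is the chart value that turns the API Gateway controller on or off.

```shell
# Sketch only: re-applies the existing release with the API gateway turned off.
# Assumptions: release "consul" in namespace "consul", installed from the
# hashicorp/consul chart; adjust to match your installation.
disable_api_gateway() {
  release="${1:-consul}"
  namespace="${2:-consul}"
  helm upgrade "$release" hashicorp/consul --namespace "$namespace" \
    --reuse-values --set apiGateway.enabled=false
}
```

Running `disable_api_gateway` with no arguments uses the assumed defaults; pass the release name and namespace explicitly if yours differ.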

Reproduction Steps

Logs

Expected behavior

The consul-api-gateway-controller service account should be authorized to run the API gateway acl-init.

Environment details

Additional Context

nathancoleman commented 2 years ago

Hi @manobi, looking into this.

manobi commented 2 years ago

@nathancoleman if there is something I can do by editing the Helm release, just tell me and I can try to help you debug. Thank you.

nathancoleman commented 2 years ago

@manobi I'm working on validating the change in https://github.com/hashicorp/consul-k8s/pull/1481, which I believe should fix this issue.

nathancoleman commented 2 years ago

@manobi The fix that I linked above allows the acl-init job to complete for the API Gateway controller successfully when following the Federation Between Kubernetes Clusters guide; however, there are other issues beyond that one which prevent the controller-per-cluster setup described in https://github.com/hashicorp/consul-api-gateway/issues/300 from working. Does the setup described there match what you're wanting to do?

manobi commented 2 years ago

@nathancoleman My setup is based on the Federation Between Kubernetes Clusters guide.

Having a single API gateway for all clusters is not a requirement for me. I only need the API gateway working in the secondary cluster, routing requests for services running in the secondary cluster (unlike https://github.com/hashicorp/consul-api-gateway/issues/300).

codex70 commented 2 years ago

@nathancoleman, whilst having a single API gateway would be very useful for me, it's not a definite requirement. At the moment I cannot get either option to work.

Ideally I'd like to be able to expose each service on one API gateway, but also on separate API gateways, depending on what the gateway is used for (for example, client visibility).

Also, a single datacenter doesn't really work for me, because it requires communication between pods in different clusters. It is important that the networks are kept separate.

Please keep me updated; currently I don't have a good alternative solution.

nathancoleman commented 2 years ago

@codex70 please see https://github.com/hashicorp/consul-k8s/issues/1344#issuecomment-1246987277