hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Anonymous token loses policies during big upgrades #3007

Open bnycohoe opened 1 year ago

bnycohoe commented 1 year ago


Overview of the Issue

An upgrade of consul-k8s from 0.26.0 to 1.1.6 in my primary datacenter caused the anonymous token to lose the custom policy we had linked to it (in our case called Anonymous). After the upgrade, the only policy linked to the anonymous token was the anonymous-token-policy created by the server-acl-init process. This caused an outage for some of our customers because our tooling relies on anonymous KV read privileges that we had granted to the anonymous token via our policy.

As of consul-k8s 1.1.4, upgrades to a deployment that already contains an anonymous-token-policy skip altering the token's policies, thanks to an existence check added in https://github.com/hashicorp/consul-k8s/pull/2790. Based on the code in https://github.com/hashicorp/consul-k8s/blob/2feff9f2cb36f4ee818874c4d657ede2acbc074a/control-plane/subcommand/server-acl-init/anonymous_token.go#L49, if the managed policy does not already exist, any policies linked to the anonymous token are not persisted through the upgrade; they are replaced with only the managed policy.
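To make the two cases concrete, here is a minimal sketch of the behavior as I understand it. It is a simplified model, not the actual consul-k8s code: the `PolicyLink` type and the `linksAfterUpgrade` function are hypothetical names invented for illustration, and the real logic talks to the Consul ACL API rather than working on in-memory slices.

```go
package main

import "fmt"

// PolicyLink is a hypothetical stand-in for a token-to-policy link.
type PolicyLink struct {
	Name string
}

// managedPolicy is the policy name created by server-acl-init.
const managedPolicy = "anonymous-token-policy"

// linksAfterUpgrade models (as an assumption, not the real implementation)
// what ends up linked to the anonymous token after an upgrade.
func linksAfterUpgrade(current []PolicyLink, managedPolicyExists bool) []PolicyLink {
	if managedPolicyExists {
		// Existence check: the token's links are left untouched.
		return current
	}
	// Managed policy is new: the token's links are replaced with only
	// the managed policy, dropping any user-defined links.
	return []PolicyLink{{Name: managedPolicy}}
}

func main() {
	custom := []PolicyLink{{Name: "Anonymous"}}

	fmt.Println(linksAfterUpgrade(custom, true))  // prints [{Anonymous}]
	fmt.Println(linksAfterUpgrade(custom, false)) // prints [{anonymous-token-policy}]
}
```

The second call corresponds to my upgrade: the managed policy did not exist yet, so the custom Anonymous link was dropped.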

I believe this is undesirable behavior because user configuration data (the linked policies configured prior to the upgrade) is thrown away. Note that the policies themselves still exist, and re-linking them is trivial, but it requires manual intervention.

Reproduction Steps

Upgrade consul-k8s from a very old version (something <~0.49.0) to a new version (such as >=1.1.6) in a primary datacenter.

Logs

Logs from the run of my server-acl-init are not available.

Expected behavior

Any user-defined token policies that are linked to well-known tokens (specifically the anonymous token) should remain linked through an upgrade.
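One way to get this behavior would be to merge the managed policy into the token's existing links instead of overwriting them. The following is only a sketch under that assumption: `PolicyLink` and `mergeManagedPolicy` are hypothetical names, and the code works on plain slices rather than on the Consul ACL API.

```go
package main

import "fmt"

// PolicyLink is a hypothetical stand-in for a token-to-policy link.
type PolicyLink struct {
	Name string
}

// mergeManagedPolicy returns the existing links plus the managed policy,
// adding the managed policy only if it is not already linked.
// User-defined links are always preserved, in their original order.
func mergeManagedPolicy(current []PolicyLink, managed string) []PolicyLink {
	merged := make([]PolicyLink, 0, len(current)+1)
	found := false
	for _, link := range current {
		merged = append(merged, link)
		if link.Name == managed {
			found = true
		}
	}
	if !found {
		merged = append(merged, PolicyLink{Name: managed})
	}
	return merged
}

func main() {
	existing := []PolicyLink{{Name: "Anonymous"}}
	merged := mergeManagedPolicy(existing, "anonymous-token-policy")
	fmt.Println(merged) // prints [{Anonymous} {anonymous-token-policy}]
}
```

Because the merge is idempotent, running it on every upgrade would keep the managed policy linked without ever removing a user-defined link.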

Environment details

Old Consul-K8s version: 0.26.0
New Consul-K8s version: 1.1.6
Kubernetes version: 1.27.3
Consul Server version: 1.15.2-ent
Relevant values:

global:
  acls:
    manageSystemACLs: true

Additional Context

In the 1.2.1 breaking changes there is mention that all policies managed by consul-k8s will now be updated on upgrade. This is not true after the implementation of https://github.com/hashicorp/consul-k8s/pull/2790. An existing anonymous-token-policy will not be updated on upgrade. Notes in the documentation should reflect this.

Also, the GH-2790 improvement notes in the changelog do not appear under any 1.1.x release. They exist only for 1.0.10 and 1.2.1, neither of which is in a release line that would have applied to me. 1.1.4 appears to be the first version with the backported change. Knowing that the behavior changed in 1.1.4+ would likely have led to a quicker resolution of my original problem.

natemollica-nm commented 1 year ago

Can confirm this issue is reproducible. Reproduced it with the following upgrade path (consul | consul-k8s): v1.13.2+ent | v0.49.0 -> v1.13.9+ent | v0.49.8 -> v1.14.7 | v1.0.5.

The server-acl-init logs show the policy being updated:

# server-acl-init logs
2023-09-25T20:38:04.308Z [INFO]  Policy "anonymous-token-policy" already exists, updating
2023-09-25T20:38:04.401Z [INFO]  Success: creating anonymous token policy - PUT /v1/acl/policy
2023-09-25T20:38:04.403Z [INFO]  Success: updating anonymous token with policy