projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Missing RBAC permission in charts/calico to create default Tiers causing typha to not start #9442

Closed sfudeus closed 1 week ago

sfudeus commented 1 week ago

3.29.0 introduced the new policy tiers. According to the docs, the default tier and the AdminNetworkPolicy tier are supposed to be auto-created. calico-kube-controllers even has the permissions to do so and seemingly has created these.

I'm aware that using https://github.com/projectcalico/calico/tree/master/charts/calico is not a supported way to install and maintain Calico, but the missing permission leaves the manifests incomplete, even if they only serve as documentation.

Expected Behavior

The new tiers are auto-created by a component which has the permissions to do so, the others just consume them.

Current Behavior

calico-kube-controllers does seem to create the new tier objects. But Typha also tries to create them, and fails with permission errors. This seems to be because existence is checked by attempting creation and reacting to an ErrorResourceAlreadyExists error (see client.go#408 and 429). As a result, Typha fails to start as long as it lacks create permission.

2024-11-05 10:07:45.616 [ERROR][1] typha/daemon.go 318: Failed to initialize datastore error=connection is unauthorized: tiers.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "tiers" in API group "crd.projectcalico.org" at the cluster scope
2024-11-05 10:07:46.623 [INFO][1] typha/client.go 253: Unable to initialize default Tier error=connection is unauthorized: tiers.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "tiers" in API group "crd.projectcalico.org" at the cluster scope
2024-11-05 10:07:46.625 [INFO][1] typha/client.go 258: Unable to initialize adminnetworkpolicy Tier error=connection is unauthorized: tiers.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "tiers" in API group "crd.projectcalico.org" at the cluster scope

In charts/calico, Typha runs with serviceAccountName: calico-node. calico-node is only granted read, list and watch permissions for tiers (https://github.com/projectcalico/calico/blob/master/charts/calico/templates/calico-node-rbac.yaml#L113). This likely makes sense, so that calico-node cannot manipulate or create tiers in normal operation. But if the components have logic to recreate tiers on demand or on migration, they need the permissions to do so, or the migration logic must be adapted to cope with not having permission.

Possible Solution

Either

Steps to Reproduce

  1. set up calico with rbac as done in charts/calico (or as described in the pre-rendered manifests)
  2. deploy calico-kube-controllers, which will create the default tiers
  3. deploy typha - which will not start up, complaining about not being authorized to create default tiers

Your Environment

mazdakn commented 1 week ago

@sfudeus Thanks for reporting the issue. The initialisation logic is here: https://github.com/projectcalico/calico/blob/07ad564f962be48c14c38abfaa159319770bda6b/libcalico-go/lib/clientv3/client.go#L236 The method is supposed to be best effort, and is called from multiple places. As such, if a cluster is already initialised, subsequent calls must not lead to an error. It seems that not giving Typha permission to create tiers prevents it from starting up, which is a bug.

I believe your 3rd suggestion is the best approach. This will be fixed in the next patch release.

mazdakn commented 1 week ago

This PR should fix the issue: https://github.com/projectcalico/calico/pull/9446

mazdakn commented 1 week ago

Closing since the fix and its backport are now merged. The issue should be fixed in the v3.29.1 patch release.