gtsystem / lightkube

Modern lightweight kubernetes module for python
https://lightkube.readthedocs.io
MIT License

`apply`ing to an aggregated clusterrole fails with 409 conflict on `.rules` #32

Closed ca-scribner closed 2 years ago

ca-scribner commented 2 years ago

Is there a way in lightkube to apply an aggregate cluster role that includes an empty `rules: []` without forcing it? Applying to an aggregate clusterrole with anything in `rules` results in a 409 conflict, because the control plane maintains the `rules` list. I'd rather avoid using `force` so I don't suppress other errors, but I can't think of anything else apart from adding some custom logic before calling `.apply` to remove the rules entirely.
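For illustration, the kind of custom pre-apply logic I mean would look roughly like this (a minimal sketch; the manifest filename is hypothetical, and it assumes lightkube omits attributes set to None when serializing the apply request):

import lightkube
from lightkube.codecs import load_all_yaml

client = lightkube.Client(field_manager="myself")

with open("aggregate_clusterrole.yaml") as f:  # hypothetical manifest file
    objs = load_all_yaml(f)

for obj in objs:
    # Hypothetical pre-apply step: drop the controller-managed .rules field from
    # any aggregated ClusterRole so the apply never claims ownership of it.
    if getattr(obj, "aggregationRule", None) is not None:
        obj.rules = None  # assumed to be omitted when the patch is serialized
    client.apply(obj)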

This python snippet demonstrates the issue, generating a 409 conflict on rules when a change is applied without force=True:

import time

import lightkube
from lightkube.codecs import load_all_yaml

# Note: Running this will leave two clusterroles (aggregated-clusterrole and aggregate-clusterrole)
# in your cluster.  Running it a second time will fail earlier (on the first apply of
# aggregate-clusterrole, because that clusterrole will already exist with controller-managed rules).

c = lightkube.Client(field_manager="myself")

aggregated_clusterrole_yaml = """
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregated-clusterrole
  labels:
    test.com/aggregate-to-view: "true"
rules: 
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
"""

aggregated_clusterrole = load_all_yaml(aggregated_clusterrole_yaml)[0]
c.apply(aggregated_clusterrole)

aggregate_clusterrole_yaml = """
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregate-clusterrole
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        test.com/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.
"""

aggregate_clusterrole = load_all_yaml(aggregate_clusterrole_yaml)[0]

# Creates aggregate_clusterrole
c.apply(aggregate_clusterrole)
# Let the aggregation controller catch up.  Without this, the applies below can
# sometimes run before the aggregation controller has populated .rules.
time.sleep(1)

# Force apply onto existing aggregate_clusterrole (works, as it ignores the 409 conflict error)
c.apply(aggregate_clusterrole, force=True)

# Applies onto existing aggregate_clusterrole (and fails due to a 409 conflict on .rules)
c.apply(aggregate_clusterrole)
ca-scribner commented 2 years ago

This sort of situation is handled by tools like kapp, but the scope of kapp is different from what lightkube is trying to do, so I'm not sure if it maps well here.

gtsystem commented 2 years ago

Just tested removing `rules: []` (`rules` is optional anyway) from aggregate-clusterrole, and it works fine. This makes sense: if you send an empty array, k8s assumes you want to manage the content of this attribute and fails with a conflict.
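For example, this sketch of the same aggregate ClusterRole without the rules key applies repeatedly without conflicts (same names and labels as in your snippet):

import lightkube
from lightkube.codecs import load_all_yaml

# Same aggregate ClusterRole as above, just without `rules:`; the apply request
# then never references .rules and cannot conflict with
# clusterrole-aggregation-controller.
manifest = """
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregate-clusterrole
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        test.com/aggregate-to-view: "true"
"""

client = lightkube.Client(field_manager="myself")
role = load_all_yaml(manifest)[0]
client.apply(role)  # first apply
client.apply(role)  # re-apply succeeds; no 409 on .rules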

gtsystem commented 2 years ago

Closing, as it seems there is a workaround.

ca-scribner commented 2 years ago

Sorry, I thought I had responded to this.

Removing `rules` does avoid the issue, but it feels like an awkward workaround. `rules` is optional but not prohibited, so plenty of valid yaml manifests for aggregated roles define this attribute. If lightkube does not handle this implicitly, applying those manifests becomes unstable (for example, `.apply()` might work at first on a clean cluster, but reconciling later with a second `.apply()` raises the conflict).
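To illustrate, this is roughly the defensive wrapper a caller gets pushed towards if the manifest keeps `rules: []` (a sketch; catching lightkube's ApiError and checking the HTTP status code is my assumption about how one would detect the conflict):

from lightkube import ApiError

def reconcile(client, obj):
    """Sketch: swallow the expected .rules conflict by re-applying with force,
    which also suppresses conflicts we might genuinely care about."""
    try:
        client.apply(obj)
    except ApiError as exc:
        # A 409 here may be the benign .rules conflict or a real ownership
        # problem -- we can't easily tell them apart.
        if exc.response.status_code == 409:
            client.apply(obj, force=True)
        else:
            raise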

This also feels like a departure from the behaviour of other similar tools. For example, kubectl handles a similar case fine:

cat << EOF > aggregator_role.yaml 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregator-clusterrole
aggregationRule:
  clusterRoleSelectors:
    - matchExpressions:
        - {key: test.com/aggregate-to-view, operator: Exists}
rules: []
EOF

# First time works
kubectl apply -f aggregator_role.yaml

# Second time also works
kubectl apply -f aggregator_role.yaml
gtsystem commented 2 years ago

lightkube uses server-side apply, and you can verify yourself that its behaviour is in line with kubectl when using server-side apply:

$ kubectl apply -f cr1.yaml --server-side
clusterrole.rbac.authorization.k8s.io/aggregated-clusterrole serverside-applied
clusterrole.rbac.authorization.k8s.io/aggregate-clusterrole serverside-applied

$ kubectl apply -f cr1.yaml --server-side
clusterrole.rbac.authorization.k8s.io/aggregated-clusterrole serverside-applied
error: Apply failed with 1 conflict: conflict with "clusterrole-aggregation-controller": .rules
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
  command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
  manifest to remove references to the fields that should keep their
  current managers.
* You may co-own fields by updating your manifest to match the existing
  value; in this case, you'll become the manager if the other manager(s)
  stop managing the field (remove it from their configuration).
See https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts

See also the error text "If you do not intend to manage all of the fields, please edit your manifest to remove references to the fields that should keep their current managers.".

To me this is the intended behaviour. I'm not sure why client-side apply behaves differently; anyway, I suggest you open an issue in the kubernetes repository, and if they change server-side apply, lightkube will automatically take advantage of that.