vmware-tanzu-labs / educates-training-platform

A platform for hosting interactive workshop environments in Kubernetes, or on top of a local container runtime.
https://docs.educates.dev
Apache License 2.0

Registry secret not being injected into default service account of session namespace. #392

Open GrahamDumpleton opened 1 month ago

GrahamDumpleton commented 1 month ago

Describe the bug

When a registry pull secret (or any other secret) is being injected into a service account using SecretInjector, injection can fail if the service account is still in the process of being configured, because the service account may have changed between being read and being updated.

2024-05-31T01:32:09.782490507Z ERROR:educates:Service account default in namespace educates-cli-w04-s003-hub couldn't be updated.
2024-05-31T01:32:09.782541415Z Traceback (most recent call last):
2024-05-31T01:32:09.782551638Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/http.py", line 437, in raise_for_status
2024-05-31T01:32:09.782555835Z     resp.raise_for_status()
2024-05-31T01:32:09.782558066Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
2024-05-31T01:32:09.782559929Z     raise HTTPError(http_error_msg, response=self)
2024-05-31T01:32:09.782562280Z requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://10.96.0.1:443/api/v1/namespaces/educates-cli-w04-s003-hub/serviceaccounts/default
2024-05-31T01:32:09.782564077Z 
2024-05-31T01:32:09.782566452Z During handling of the above exception, another exception occurred:
2024-05-31T01:32:09.782568147Z 
2024-05-31T01:32:09.782569931Z Traceback (most recent call last):
2024-05-31T01:32:09.782571998Z   File "/opt/app-root/src/handlers/secretinjector_funcs.py", line 403, in inject_secret
2024-05-31T01:32:09.782573693Z     service_account_item.update()
2024-05-31T01:32:09.782575452Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/objects.py", line 165, in update
2024-05-31T01:32:09.782587022Z     self.patch(self.obj, subresource=subresource)
2024-05-31T01:32:09.782588725Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/objects.py", line 157, in patch
2024-05-31T01:32:09.782590048Z     self.api.raise_for_status(r)
2024-05-31T01:32:09.782591462Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/http.py", line 444, in raise_for_status
2024-05-31T01:32:09.782596799Z     raise HTTPError(resp.status_code, payload["message"])
2024-05-31T01:32:09.782600186Z pykube.exceptions.HTTPError: Operation cannot be fulfilled on serviceaccounts "default": the object has been modified; please apply your changes to the latest version and try again
2024-05-31T01:32:09.821340158Z INFO:educates:Injected educates-registry-credentials into imagePullSecrets of service account default in namespace educates-cli-w04-s003-cluster-1.
2024-05-31T01:32:09.827676084Z ERROR:educates:Service account default in namespace educates-cli-w04-s003-cluster-1 couldn't be updated.
2024-05-31T01:32:09.827743641Z Traceback (most recent call last):
2024-05-31T01:32:09.827753078Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/http.py", line 437, in raise_for_status
2024-05-31T01:32:09.827756941Z     resp.raise_for_status()
2024-05-31T01:32:09.827759218Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
2024-05-31T01:32:09.827762148Z     raise HTTPError(http_error_msg, response=self)
2024-05-31T01:32:09.827765218Z requests.exceptions.HTTPError: 409 Client Error: Conflict for url: https://10.96.0.1:443/api/v1/namespaces/educates-cli-w04-s003-cluster-1/serviceaccounts/default
2024-05-31T01:32:09.827767332Z 
2024-05-31T01:32:09.827770473Z During handling of the above exception, another exception occurred:
2024-05-31T01:32:09.827772669Z 
2024-05-31T01:32:09.827774847Z Traceback (most recent call last):
2024-05-31T01:32:09.827777497Z   File "/opt/app-root/src/handlers/secretinjector_funcs.py", line 403, in inject_secret
2024-05-31T01:32:09.827779829Z     service_account_item.update()
2024-05-31T01:32:09.827782804Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/objects.py", line 165, in update
2024-05-31T01:32:09.827785141Z     self.patch(self.obj, subresource=subresource)
2024-05-31T01:32:09.827787392Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/objects.py", line 157, in patch
2024-05-31T01:32:09.827789566Z     self.api.raise_for_status(r)
2024-05-31T01:32:09.827791627Z   File "/opt/app-root/venv/lib64/python3.12/site-packages/pykube/http.py", line 444, in raise_for_status
2024-05-31T01:32:09.827793663Z     raise HTTPError(resp.status_code, payload["message"])
2024-05-31T01:32:09.827796444Z pykube.exceptions.HTTPError: Operation cannot be fulfilled on serviceaccounts "default": the object has been modified; please apply your changes to the latest version and try again

A subsequent reconciliation triggered by updates to the service account should still result in the secret being injected, but I need to verify that this is occurring.

Additional information

No response

GrahamDumpleton commented 1 month ago

This is actually a race condition in the operator between two rule applications triggered by different resources. For example:

DEBUG:educates:Triggering secretcopier reconcilation for secret educates-registry-credentials in namespace educates-cli-w04-s010-hub.
DEBUG:educates:Triggering secretinjector reconcilation for secret educates-registry-credentials in namespace educates-cli-w04-s010-hub.
DEBUG:educates:Triggering secretinjector reconcilation for service account default in namespace educates-cli-w04-s010.
DEBUG:educates:Triggering secretcopier reconcilation for namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Triggering secretinjector reconcilation for service account default in namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Triggering secretcopier reconcilation for secret educates-registry-credentials in namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Triggering secretinjector reconcilation for secret educates-registry-credentials in namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Processing rule 1 from secretinjector/educates-registry-credentials against namespace educates-cli-w04-s010-hub.
DEBUG:educates:Processing rule 1 from secretinjector/educates-registry-credentials against namespace educates-cli-w04-s010-hub.
DEBUG:educates:Processing rule 1 from secretinjector/educates-registry-credentials against namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Processing rule 1 from secretinjector/educates-registry-credentials against namespace educates-cli-w04-s010-cluster-1.
DEBUG:educates:Processing rule 1 from secretinjector/educates-registry-credentials against namespace educates-cli-w04-s010.

In other words, secretinjector rules were applied in response to changes in both the service account and the secret at the same time. Because these are different resource types, their handlers run in parallel, which you can see from the duplicated rule applications later in the log.
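
To make the race concrete, here is a minimal in-memory sketch of the optimistic concurrency check behind the 409. It is not operator code: `FakeStore`, `Conflict`, and `simulate_race` are hypothetical names, and the sketch assumes each handler sends back the resourceVersion it originally read, so the second write is rejected as stale.

```python
class Conflict(Exception):
    """Stand-in for the 409 the API server returns on a stale write."""

    code = 409


class FakeStore:
    """In-memory stand-in for a single service account on the API server."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "imagePullSecrets": []}

    def read(self):
        # Return a copy so each caller holds its own (possibly stale) view.
        return {
            "resourceVersion": self.obj["resourceVersion"],
            "imagePullSecrets": list(self.obj["imagePullSecrets"]),
        }

    def write(self, candidate):
        # Reject any write that is based on an outdated resourceVersion.
        if candidate["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.obj = {
            "resourceVersion": candidate["resourceVersion"] + 1,
            "imagePullSecrets": list(candidate["imagePullSecrets"]),
        }


def simulate_race(secret_name="educates-registry-credentials"):
    """Two handlers read the same version, then both try to write."""
    store = FakeStore()
    view_a = store.read()  # e.g. triggered by the secret
    view_b = store.read()  # e.g. triggered by the service account
    outcomes = []
    for view in (view_a, view_b):
        view["imagePullSecrets"].append({"name": secret_name})
        try:
            store.write(view)
            outcomes.append("updated")
        except Conflict:
            outcomes.append("conflict")
    return outcomes, store.obj
```

In this sketch the first writer succeeds and the second gets the conflict, yet the stored object still ends up with the pull secret, which matches the "injected okay, but noisy" behaviour seen in the logs.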

So the secret is being injected okay; the reconciler function that loses the race just adds noise to the logs by emitting the full exception.

I'm not sure right now what the best course of action is: either check the error status for 409 and ignore it, or at least not log the exception and instead log just a warning that the resource was updated.
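
A rough sketch of the second option, downgrading a 409 to a warning while still logging other failures in full, might look like the following. The `HTTPError` class here is a local stand-in that mirrors the `HTTPError(resp.status_code, payload["message"])` call visible in the traceback, and it assumes the status is exposed as a `code` attribute. `update_service_account` is a hypothetical helper, not the existing code in `secretinjector_funcs.py`.

```python
import logging

logger = logging.getLogger("educates")


class HTTPError(Exception):
    """Local stand-in for the client's HTTPError(code, message)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def update_service_account(service_account_item, name, namespace):
    """Return True if the update was applied, False if it lost a race."""
    try:
        service_account_item.update()
    except HTTPError as exc:
        if exc.code == 409:
            # Another handler changed the object first; a later
            # reconciliation is expected to pick up the new state.
            logger.warning(
                "Service account %s in namespace %s was modified "
                "concurrently, skipping update.",
                name,
                namespace,
            )
            return False
        # Anything other than a conflict is still logged with traceback.
        logger.exception(
            "Service account %s in namespace %s couldn't be updated.",
            name,
            namespace,
        )
        raise
    return True
```

Returning a boolean instead of raising on conflict keeps the handler quiet in the expected race case, while any other status code still surfaces as an error.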