After working with the operator for a while, I realized the following:
Using an admission webhook to update the ECR registry secret whenever a pod is created or updated is unnecessary and can be dangerous. If the MutatingWebhookConfiguration does not use a failurePolicy of Ignore and updating the ECR secret fails, the pod creation or update is blocked, which can be very inconvenient. A failurePolicy of Ignore guards against pods being blocked at admission, but if the operator fails while updating the ECR secret, the pod will still fail to start, because the kubelet cannot pull the image with expired credentials. I therefore believe that removing the webhook and relying on the Kubernetes controller requeue mechanism would simplify the overall process and decouple secret updates from the pod life cycle. Since the AWS API for ECR returns the expiry of the credentials as part of the response, this expiry can be used to schedule the next reconciliation: result.RequeueAfter = time.Until(*expiresAt)
The validating webhook for secret deletion can also be removed. Instead, each reconcile can check whether the secret exists and recreate it if it is missing or has been deleted.
The AWSECRCredential CRD can be extended to carry information about the AWS access. At the moment, the operator user needs to create a secret before creating an AWSECRCredential. To simplify things even further, the AWS access key ID and AWS secret access key can be added under spec.awsAccess. After creation, base64 encoding can be applied to them (as Kubernetes does for secrets).
To make troubleshooting easier, the operator needs to follow the Kubernetes API conventions, such as using the status subresource and emitting Kubernetes events.