AWS Adapter Behavior: Objectives

Objective 1: When the results of AWS cloud config are accidentally deleted or lost, the user must be able to recover that information without uninstalling and reinstalling the helm chart.

Objective 2: Users commonly supply incorrect settings when installing the adapter. When an error in the adapter logs shows that fetching the cloud-config information failed, the user can fix the cause, for example by correcting typos in values.yaml or by completing a prerequisite step that the adapter needs. Once the errors are fixed, the adapter should reconcile and produce the output. Here too, the user should not have to uninstall and reinstall the helm chart.

Objective 1

Ideally, generating the status would need no config parameters at all. However, an EKS cluster does not store its name and region in any standard place within the cluster, so those two parameters have to be supplied.

If the user loses the status downloaded earlier, then the following options can be considered.

If the status fields are combined with the config fields in the same CR, then deleting the CR deletes the config along with the status. This case reduces to Objective 2, described later.
Split the CR into separate config and status CRs. If the user deletes the status CR, it can easily be recreated from the config (assuming the config was correct earlier). Of course, the user could delete the config CR too, but that is covered by Objective 2.
Prevent deletion of the status CR in the first place. This can be done with a webhook, which would need to be written.
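
As an illustration of the webhook option, here is a minimal sketch of the decision a validating admission webhook could make for DELETE requests. It only builds the AdmissionReview response; serving it over HTTPS and registering a ValidatingWebhookConfiguration are left out. The kind name AWSAdapterStatus and the helper review_delete are hypothetical, chosen for the example and not taken from the adapter.

```python
# Sketch only: the decision logic of a validating admission webhook.
# PROTECTED_KINDS is an assumption; the adapter's real CR kind may differ.
PROTECTED_KINDS = {"AWSAdapterStatus"}  # hypothetical kind name


def review_delete(admission_review: dict) -> dict:
    """Build an AdmissionReview response that denies deletion of protected kinds."""
    request = admission_review["request"]
    kind = request["kind"]["kind"]
    is_protected_delete = (
        request["operation"] == "DELETE" and kind in PROTECTED_KINDS
    )

    # The response must echo the request uid and carry an allowed flag.
    response = {"uid": request["uid"], "allowed": not is_protected_delete}
    if is_protected_delete:
        response["status"] = {
            "code": 403,
            "message": f"deleting {kind} is blocked by the adapter webhook",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A blanket deny also blocks deliberate deletions, so a real webhook would probably need an escape hatch, which this sketch does not attempt.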

The case where the config itself is deleted, either accidentally or deliberately to correct errors, is covered under Objective 2 below.

Objective 2

If the AWS adapter config itself is deleted, we can try to recreate it automatically from the previously known config parameters (cluster name and region). This can be achieved by storing the earlier parameters somewhere, say in an internal configmap. There are several options for reinstating them.

Reinstate the first set of config parameters provided at helm install time, if any.
Reinstate the last available config parameters, after the user has tried changing them multiple times.
Reinstate the last known config parameters that returned a successful result.
Try several of the above in some sequence and reinstate the first one that works.
Use helm upgrade instead of a helm delete and install; it will recreate the deleted CR.
Prevent deletion of the config CR in the first place (via webhooks).
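
The "try several in sequence" option can be sketched as follows. This is a minimal illustration, not the adapter's implementation: the function reinstate_config, the fetch_works callback, and the shape of a stored config (a dict with name and region) are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

# Assumed shape of a stored config, e.g. {"name": "...", "region": "..."}.
Config = dict


def reinstate_config(
    candidates: Iterable[Optional[Config]],
    fetch_works: Callable[[Config], bool],
) -> Optional[Config]:
    """Return the first stored config for which a fetch succeeds, else None.

    candidates is ordered by preference; None marks a slot with nothing stored.
    fetch_works is a hypothetical callback that attempts the AWS fetch.
    """
    for candidate in candidates:
        if candidate is None:  # nothing was saved for this slot
            continue
        if fetch_works(candidate):
            return candidate
    return None
```

One possible ordering is last successful, then last available, then the helm install values, so that a config already known to work is preferred over one that was never validated.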

Another line of reasoning

It is likely that the user ended up deleting the combined Status+Config CR because changing the config neither triggered a fetch of new AWS parameters nor gave a clear indication that a fetch had been reattempted and had failed again. This is what made the two objectives above so important.

The late refresh problem was present in an earlier version, where the user had to wait until the next sync period for a fetch attempt. So we provided a workaround in which the deployment could be scaled down to 0 and back up to trigger the status fetch immediately. Had it not been for this behaviour, the probability of a user deleting CRs (accidentally or to force a refetch with new params) would perhaps have been significantly lower. In that case, it would be acceptable to document how to recreate the CR via helm upgrade or by manual creation, say in a troubleshooting section of the documentation.
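
To make the late refresh problem concrete, the sketch below shows one way a reconciler could decide to fetch immediately when the config changes instead of waiting for the sync period. It assumes the CR follows the common Kubernetes convention of comparing metadata.generation with a status.observedGeneration field; whether the adapter records such a field is an assumption, and should_fetch_now is a hypothetical helper.

```python
def should_fetch_now(
    generation: int,
    observed_generation: int,
    seconds_since_last_fetch: float,
    sync_period_seconds: float,
) -> bool:
    """Decide whether to attempt a fetch in this reconcile pass."""
    # A config edit bumps the generation; fetch right away so the user
    # sees the effect (or the new error) without waiting.
    if generation != observed_generation:
        return True
    # Otherwise fall back to the periodic sync.
    return seconds_since_last_fetch >= sync_period_seconds
```

Recording the outcome of each attempt in the status, including failures, would also give the user the missing indication that a fetch was retried.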
