kyma-project / cloud-manager

Apache License 2.0

Zombie GCP Filestore Backups #531

Closed abalaie closed 1 month ago

abalaie commented 1 month ago

Description

During extensive manual tests, I noticed that sometimes multiple backups are created for the same CR. The root cause was that the status was not updated because of an error, so in the next reconciler loop the backup creation was attempted again. As we don't have a reference to these backups, they remain in place, never get cleaned up, and incur additional cost without being used. In addition, if this can happen once, it can happen multiple times, and eventually we will end up with many unused resources. This can also happen to all other resources that rely solely on the id of the resource being updated in the status.

Expected result

I expect only one backup resource to get created.

Actual result

Two backups were created in GCP, of which only the later one (by a few seconds) was referenced by the GcpNfsVolumeBackup CR.

Steps to reproduce

Run a test manually again and again. Please note that this was discovered during one of my manual tests and could be related to the test conditions I had. Theoretically, though, it can happen as long as we have one line that requests the backup creation and another one that updates the status with its id, which is the only way to reference the backup from cloud-manager.
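
The ordering described above can be reproduced in isolation. The following is a minimal sketch, not the cloud-manager code: `cloud`, `backupCR`, `flakySave`, `sequentialIDs`, `reconcileCreateFirst`, and the `cm-` name prefix are all hypothetical stand-ins, and it assumes a fresh id is generated on every reconcile attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// cloud is a hypothetical in-memory stand-in for the GCP Filestore backup API.
type cloud struct {
	backups []string // names of backups that exist on the provider side
}

// backupCR is a stripped-down stand-in for the GcpNfsVolumeBackup CR.
type backupCR struct {
	StatusID string // the only reference from the CR to the cloud backup
}

// flakySave returns a status writer that fails its first `failures` calls,
// simulating a status update that errors out.
func flakySave(failures int) func(cr *backupCR, id string) error {
	return func(cr *backupCR, id string) error {
		if failures > 0 {
			failures--
			return errors.New("simulated status update error")
		}
		cr.StatusID = id
		return nil
	}
}

// sequentialIDs returns a generator yielding "id-1", "id-2", ...
func sequentialIDs() func() string {
	n := 0
	return func() string {
		n++
		return fmt.Sprintf("id-%d", n)
	}
}

// reconcileCreateFirst requests the backup first and records its id afterwards.
func reconcileCreateFirst(c *cloud, cr *backupCR, newID func() string,
	saveStatus func(cr *backupCR, id string) error) error {
	if cr.StatusID != "" {
		return nil // a backup is already referenced, nothing to do
	}
	id := newID()
	c.backups = append(c.backups, "cm-"+id) // line 1: the backup now exists
	if err := saveStatus(cr, id); err != nil {
		// line 2 failed: the CR holds no reference, so the next loop starts over
		return fmt.Errorf("updating status: %w", err)
	}
	return nil
}
```

With `flakySave(1)`, two consecutive calls to `reconcileCreateFirst` leave two entries in `cloud.backups`, while the CR references only the second one.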

Troubleshooting

abalaie commented 1 month ago

I can think of two ways to solve this:

dushanpantic commented 1 month ago

Setting .status.id first, and then using it for name resolution, is how we implemented other resources. I am voting for the first option.
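
For illustration, here is a minimal sketch of that ordering, under stated assumptions: the types and helpers (`cloud`, `ensureBackup`, `backupCR`, `flakySave`, `sequentialIDs`, `reconcileIDFirst`) and the `cm-` name prefix are hypothetical and not taken from the cloud-manager code base. The id is persisted to the status before anything is requested from the cloud, and the backup name is always derived from the persisted id.

```go
package main

import (
	"errors"
	"fmt"
)

// cloud is a hypothetical in-memory stand-in for the GCP Filestore backup API.
type cloud struct {
	backups map[string]bool // backup names that exist on the provider side
}

// ensureBackup creates the named backup only if it does not exist yet.
func (c *cloud) ensureBackup(name string) {
	if c.backups == nil {
		c.backups = map[string]bool{}
	}
	c.backups[name] = true
}

// backupCR is a stripped-down stand-in for the GcpNfsVolumeBackup CR.
type backupCR struct {
	StatusID string
}

// flakySave returns a status writer that fails its first `failures` calls.
func flakySave(failures int) func(cr *backupCR, id string) error {
	return func(cr *backupCR, id string) error {
		if failures > 0 {
			failures--
			return errors.New("simulated status update error")
		}
		cr.StatusID = id
		return nil
	}
}

// sequentialIDs returns a generator yielding "id-1", "id-2", ...
func sequentialIDs() func() string {
	n := 0
	return func() string {
		n++
		return fmt.Sprintf("id-%d", n)
	}
}

// reconcileIDFirst persists the id first and only then creates the backup,
// resolving the backup name from the persisted status.
func reconcileIDFirst(c *cloud, cr *backupCR, newID func() string,
	saveStatus func(cr *backupCR, id string) error) error {
	if cr.StatusID == "" {
		// Step 1: nothing exists in the cloud yet, so a failure here is harmless.
		if err := saveStatus(cr, newID()); err != nil {
			return fmt.Errorf("updating status: %w", err)
		}
	}
	// Step 2: the name is stable across loops, so a retry targets the same backup.
	c.ensureBackup("cm-" + cr.StatusID)
	return nil
}
```

In this sketch a failed status update leaves nothing behind in the cloud, and a retry after a failed creation reuses the same name, so at most one backup per CR should exist.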