pusher / faros

Faros is a CRD based GitOps controller
Apache License 2.0
99 stars 15 forks source link

Feedback: Added resource finalizers #144

Closed sebastianrosch closed 4 years ago

sebastianrosch commented 5 years ago

@JoelSpeed I would like your feedback on this:

To prevent accidental deletion of resources due to etcd issues and k8s garbage collection, as discussed in #101, I added finalizers to the child resources. This is not the full implementation as discussed, and this behavior is currently not configurable. I would like your feedback on whether this is a good way forward.

We have tested this in different scenarios, most specifically in the etcd leader loss that lead to an outage as described in #143.

Since we still have to delete resources when removed from the GitOps repo, Faros also needs to ability to remove finalizers. But as the GitTrack controller, which knows about the leftover resources, does not know the child resources, it cannot remove the finalizers. Therefore, I moved deleting the GitTrackObject into the GitTrackObject controller so that it can remove the finalizer before being deleted. The GitTrackObject is only marked for deletion via an annotation.

I hope this makes sense. Thanks!

pusher-ci commented 5 years ago

Hi @sebastianroesch. Thanks for your PR.

I'm waiting for a pusher member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
JoelSpeed commented 5 years ago

Hey @sebastianroesch, I've just had a quick skim of this and I think there's probably some misunderstanding on finalizers here.

You shouldn't be setting a finalizer on the child resources (Deployments, services etc). Finalizers are intended for controllers to handle special clean up logic, not to just protect something from being deleted.

With the current implementation, when we delete the GitTrack, we will leave all child resources with a finalizer with no controller to remove them. The GC will see the GTO has been deleted and then mark all children for deletion, but sit in an infinite loop because of the finalizer. This is not a desirable state to be in as resource updates are blocked when they are marked for deletion except for finalizers.

What we should be doing for this is instead:

Please see my descriptions of this design in #101, I went into more detail there

sebastianrosch commented 5 years ago

Thanks for the feedback, that makes sense. I think I was too focused on protecting the child resources from being deleted and didn't trust the owner references anymore after our previous issues.

One thing I'm not sure of is how we would be able to distinguish the deletion of a resource from the GitOps repo and the loss of the GitTrack. In both cases, the deletionTimestamp would be set on the GitTrackObject, so on Reconcile the GitTrackObject wouldn't know whether to keep the child resource (loss of GitTrack case) or remove it (deletion of resource in GitOps repo).

JoelSpeed commented 5 years ago

One thing I'm not sure of is how we would be able to distinguish the deletion of a resource from the GitOps repo and the loss of the GitTrack. In both cases, the deletionTimestamp would be set on the GitTrackObject, so on Reconcile the GitTrackObject wouldn't know whether to keep the child resource (loss of GitTrack case) or remove it (deletion of resource in GitOps repo).

For the first pass, I wouldn't distinguish. We can get a PR ready with this feature that just keeps all resources around on any deletion event. Once we are happy with the "protection" mechanism, we could then create a PR into the feature branch with the should we keep or should we delete logic as described in https://github.com/pusher/faros/issues/101#issuecomment-474290027, does that seem like a reasonable way to tackle this?

pusher-ci commented 5 years ago

@sebastianroesch: PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.