kubernetes-sigs / cli-utils

This repo contains binaries that built from libraries in cli-runtime.
Apache License 2.0
154 stars 77 forks source link

feat: Add dependency filter #555

Closed karlkfi closed 2 years ago

karlkfi commented 2 years ago

Fixes https://github.com/kubernetes-sigs/cli-utils/issues/526 Fixes https://github.com/kubernetes-sigs/cli-utils/issues/528

Replaces https://github.com/kubernetes-sigs/cli-utils/pull/523

karlkfi commented 2 years ago

For https://github.com/kubernetes-sigs/cli-utils/issues/526, it is not clear this is a bug as much as an optional feature. If we are solving this, we are not going to do it a couple days before a release without tests.

IMO we should not launch depends-on in Config Sync without this bug fixed, and it was probably a mistake to launch the feature in kpt with this bug in the first place. Prior to this PR, depends-on was only half implemented (sorted ordering, but no enforcement).

karlkfi commented 2 years ago

https://github.com/kubernetes-sigs/cli-utils/issues/528 is the result of overzealous client-side validation. As I mentioned, we should not be doing client-side validation. The solution is to rollback the overzealous validation.

For CRDs this is probably fine, because there is a service-side error that prevents deletion of CRDs with CRs in use.

For namespaces tho, this is catastrophic, because namespace deletion causes garbage collection of everything else in the namespace.

For all other explicit dependencies there is no server-side dependency enforcement. So the client-side enforcement cannot be rolled back without completely removing the dependency ordering feature.

karlkfi commented 2 years ago

https://github.com/kubernetes-sigs/cli-utils/issues/554 should be a simple fix with a better test to prove the fix.

This is fair. I can extract this to another PR.

karlkfi commented 2 years ago

Pulled out fix for #554: https://github.com/kubernetes-sigs/cli-utils/pull/556

karlkfi commented 2 years ago

https://github.com/kubernetes-sigs/cli-utils/issues/528 is the result of overzealous client-side validation. As I mentioned, we should not be doing client-side validation. The solution is to rollback the overzealous validation.

@seans3 Removing the validation for external dependencies won't fix this bug. It would stop the confusing validation error, but it wouldn't fix the behavior. It wouldn't fail server-side, because there's no server-side impl. And without the DependencyFilter, neither the apply nor the prune would be skipped. So the apply would go through and then its dependency would be deleted. And we would lose the validation error for otherwise legitimately external dependencies.

Combining the graph for both applies and prunes allows for detection of implicit dependencies (CRDs & Namespaces) which would have otherwise been missed. And for explicit dependencies (depends-on & apply-time-mutation), it detects them as "internal" dependencies instead of "external". Then both implicit and explicit deps can be checked by the DependencyFilter. And to keep the behavior otherwise the same, the graph only needs to be sorted once and then filtered down to include either the apply objects or prune objects.

I was planning to add more unit tests, but the e2e tests and applier/destroyer tests already show that the existing behavior is otherwise preserved, and I wanted to make sure the approach was otherwise acceptable before writing more tests.

karlkfi commented 2 years ago

/hold

karlkfi commented 2 years ago

/unhold

Added unit tests:

Discussed approach with @seans3 offline. His primary complaint was that the dependency ordering was never intended to be a guarantee, just an expedient to produce fewer errors. However, while this may have been true when dependencies were first added, there are a number of use cases that have been discovered that are not solved by just ordering, which are better solved by an ordering guarantee. Most of these use cases come down to improving UX such that the error is obvious and once fixed in the local config, a re-apply is sufficient to resolve the problem.

One example worth calling out is that this helps expedite deployment of chains of dependencies that would otherwise be applied all up front and fall into retry-back-off, causing end-to-end reconciliation to take orders of magnitude longer to complete compared to waiting to apply resources until their dependencies are reconciled. This is especially true for infrastructure resources which take a notoriously long time to reconcile (like GKE clusters). Plus, some retry loops (like Jobs) have a finite number of retries configured before failing, after which a reapply of the initial spec is insufficient.

k8s-ci-robot commented 2 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: karlkfi, seans3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/cli-utils/blob/master/OWNERS)~~ [seans3] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
Liujingfang1 commented 2 years ago

Thanks for adding the unit tests.

/lgtm