kcp-dev / contrib-tmc

An experimental add-on re-adding some Kubernetes compute APIs and implementing transparent multi-cluster scheduling
Apache License 2.0

bug: no user feedback when syncer fails to sync to downstream #104

Open ncdc opened 2 years ago

ncdc commented 2 years ago

Describe the bug

If the syncer fails to sync from kcp to a downstream cluster, it may encounter https://github.com/kcp-dev/kcp/blob/2ef2efdcf280707a6054f0381f9e66b7c0cd2e88/pkg/syncer/spec/spec_process.go#L389-L391

This error is only logged in the syncer's log and never surfaced to the user.

Steps To Reproduce

  1. Set up kcp with a SyncTarget and a running syncer
  2. Create a deployment
  3. Remove the deployment's spec (this should not be allowed, but is today):
kubectl patch --patch '{"spec":null}' --type strategic deployments/my-app
  4. Check the syncer logs - you should see something like this:

    E0831 17:54:33.564808       1 spec_process.go:390] Error upserting deployments kcp-<hash>/my-app from upstream root:some-ws|default/my-app: Deployment.apps "my-app" is invalid: [spec.selector: Required value, spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`, spec.template.spec.containers: Required value]
  5. Check the deployment in kcp - no information about this error is presented to the user

Expected Behaviour

The user must have some indication that there is a problem. Ideally we set some condition.

Additional Context

If we encounter an "invalid" error from the server, as in this example, we should consider pausing attempting further syncing of that resource until the resource has been modified in some way (to avoid retrying over and over again).

stevekuznetsov commented 2 years ago

Is this generally something we expect SyncTarget status/conditions to expose?

ncdc commented 2 years ago

I don't know that that's sufficient. I don't believe we can assume users will always have access to the SyncTarget CR.

sttts commented 2 years ago

> Is this generally something we expect SyncTarget status/conditions to expose?

We definitely want this. But as Andy says, there is also the user side. The Placement object would basically be (modulo migration to a new cluster) a per-SyncTarget resource and could hold conditions.

stevekuznetsov commented 2 years ago

Great - that was a leading question to figure out what the implementation might look like here ;)

stevekuznetsov commented 2 years ago

We might need a more generic approach for objects without status/conditions fields (like a ConfigMap)

mamachanko commented 1 year ago

Since I did not find this issue right away, I am providing a bit more colour and SEO.

Say kcp syncs resources into a physical cluster, and in that physical cluster there is a ValidatingWebhookConfiguration/validating webhook for resources of that kind. The current behaviour is that the webhook's rejection is only visible in the syncer's logs; nothing is surfaced on the object in kcp.

Handling this nicely from a user perspective is tricky to get right, I think. One could argue that if the resource is invalid downstream, it should not be possible to apply it in kcp in the first place. On the other hand, that argument is naive, since the resource might only be invalid currently, or only in one of many physical clusters.

Surfacing the problem in an annotation on the upstream object would feel consistent with the existing approach to annotations, but it's also very easy to overlook.

Currently, the application of resources downstream happens here, afaik: https://github.com/kcp-dev/kcp/blob/49d1b8c2ab42e31710542dfc392ab7b3a41cf115/pkg/syncer/spec/spec_process.go#L534-L545

mjudeikis commented 11 months ago

/transfer-issue contrib-tmc