kcp-dev / contrib-tmc

An experimental add-on re-adding some Kubernetes compute APIs and implementing transparent multi-cluster scheduling
Apache License 2.0

bug: no user feedback when syncer fails to sync to downstream #104

Open ncdc opened 2 years ago

ncdc commented 2 years ago

Describe the bug

If the syncer fails to sync from kcp to a downstream cluster, it may encounter https://github.com/kcp-dev/kcp/blob/2ef2efdcf280707a6054f0381f9e66b7c0cd2e88/pkg/syncer/spec/spec_process.go#L389-L391

This error is only logged in the syncer's log and never surfaced to the user.

Steps To Reproduce

  1. Set up kcp with a SyncTarget and a running syncer
  2. Create a deployment
  3. Remove the deployment's spec (this should not be allowed, but is today):
kubectl patch --patch '{"spec":null}' --type strategic deployments/my-app
  4. Check the syncer logs - you should see something like this:

    E0831 17:54:33.564808       1 spec_process.go:390] Error upserting deployments kcp-<hash>/my-app from upstream root:some-ws|default/my-app: Deployment.apps "my-app" is invalid: [spec.selector: Required value, spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`, spec.template.spec.containers: Required value]
  5. Check the deployment in kcp - no information about this error is presented to the user

Expected Behaviour

The user must have some indication that there is a problem. Ideally we set some condition.

Additional Context

If we encounter an "invalid" error from the server, as in this example, we should consider pausing attempting further syncing of that resource until the resource has been modified in some way (to avoid retrying over and over again).

stevekuznetsov commented 2 years ago

Is this generally something we expect SyncTarget status/conditions to expose?

ncdc commented 2 years ago

I don't know that that's sufficient. I don't believe we can assume users will always have access to the SyncTarget CR.

sttts commented 2 years ago

> Is this generally something we expect SyncTarget status/conditions to expose?

We definitely want this. But as Andy says, there is also the user side. The Placement object would basically be (modulo migration to a new cluster) a per-SyncTarget resource and could hold conditions.

stevekuznetsov commented 2 years ago

Great - that was a leading question to figure out what the implementation might look like here ;)

stevekuznetsov commented 2 years ago

We might need a more generic approach for objects without status/conditions fields (like a ConfigMap)

mamachanko commented 1 year ago

Since I did not find this issue right away, I am providing a bit more colour and SEO.

Say kcp syncs resources into a physical cluster, and in that physical cluster there is a ValidatingWebhookConfiguration/validating webhook for resources of that kind. The current behaviour is that the webhook's rejection is only visible in the syncer's logs; nothing is surfaced on the object in kcp.

Handling this nicely from a user perspective is tricky to get right, I think. One could argue that if the resource is invalid downstream, it should not be possible to apply it in kcp in the first place. On the other hand, that argument is naive, since the resource might only be invalid currently, or only in one of many physical clusters.

Surfacing the problem in an annotation on the upstream object would feel consistent with the existing approach to annotations, but it's also very easy to overlook.

Currently, the application of resources downstream happens here, afaik: https://github.com/kcp-dev/kcp/blob/49d1b8c2ab42e31710542dfc392ab7b3a41cf115/pkg/syncer/spec/spec_process.go#L534-L545

mjudeikis commented 11 months ago

/transfer-issue contrib-tmc