rancher / kim

In ur kubernetes, buildin ur imagez
Apache License 2.0

kim built images are asynchronously replicated into k8s.io namespace #84

Open milas opened 2 years ago

milas commented 2 years ago

Current Behavior

The kim agent watches for containerd image events: https://github.com/rancher/kim/blob/e597b9564b47213734787b3e0c540a635b250bbf/pkg/server/agent_linux.go#L82-L105

On image create/update events, the handler copies the new/updated image to the k8s.io namespace so that it's visible to CRI/usable by kubelet.

This all happens asynchronously, in its own goroutine. kim build is unaware of the copy and does not block on it.
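To make the race concrete, here is a minimal, self-contained sketch of the pattern described above. It is not kim's actual code: the `store` type, the `build` and `agent` functions, and the `buildkit` source namespace name are all hypothetical stand-ins, and the in-memory map only imitates containerd's per-namespace image stores.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for per-namespace image stores.
type store struct {
	mu     sync.Mutex
	images map[string]map[string]bool // namespace -> image name -> present
}

func newStore() *store {
	return &store{images: map[string]map[string]bool{}}
}

func (s *store) put(ns, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.images[ns] == nil {
		s.images[ns] = map[string]bool{}
	}
	s.images[ns][name] = true
}

func (s *store) has(ns, name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.images[ns][name]
}

// build records the image in the (assumed) build namespace and emits an
// event. It returns without waiting for anyone to handle that event.
func build(s *store, events chan<- string, name string) {
	s.put("buildkit", name)
	events <- name
}

// agent copies every image named on events into k8s.io and closes done
// once the events channel has been closed and drained.
func agent(s *store, events <-chan string, done chan<- struct{}) {
	for name := range events {
		s.put("k8s.io", name)
	}
	close(done)
}

func main() {
	s := newStore()
	events := make(chan string, 1) // buffered so build never blocks on the agent
	done := make(chan struct{})

	build(s, events, "example:latest")
	// The agent has not handled the event yet, so the image is not
	// visible in k8s.io even though build has already returned.
	fmt.Println("before sync:", s.has("k8s.io", "example:latest"))

	go agent(s, events, done)
	close(events)
	<-done
	fmt.Println("after sync:", s.has("k8s.io", "example:latest"))
}
```

In this sketch the agent is deliberately started after the first check so the window is deterministic; in a real cluster the width of that window depends on scheduling and image size.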

Desired Behavior

It'd be nice to be able to (optionally?) wait for the sync to have finished when calling kim build to guarantee the image is ready for use.

Context

As it stands, it's possible to build an image with kim and attempt to use it in a Deployment before the sync has finished, resulting in errors/retries on the K8s side.

We're seeing this with Tilt, where we have a kim_build extension - Tilt calls kim and then applies the updated YAML to the cluster, resulting in some retries/backoff because the sync might not be done yet.

dweomer commented 2 years ago

I was thinking about this while working on #79 to fix #74. It is one of the reasons I hadn't merged #79 yet: for the edge case I was attempting to fix, this asynchronicity became more pronounced. My idea to fix #79 is to refactor the content copy to happen on a calling context initiated by the client but still mediated by the backend agent, similar to how pull/fetch works. The default client implementation would then block on copy progress (with a reasonable timeout), which would address the problem that you have encountered.
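A rough sketch of what "block on copy progress with a reasonable timeout" could look like on the client side, using only the Go standard library. Everything here is an assumption for illustration: `copyFunc`, `buildAndWait`, `sleepCopy`, and `demo` are hypothetical names, and the simulated copy stands in for whatever agent-mediated call the refactor would actually expose.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// copyFunc is a hypothetical stand-in for the agent-mediated copy of an
// image into the k8s.io namespace.
type copyFunc func(ctx context.Context, image string) error

// buildAndWait runs the copy on a context derived from the caller's and
// blocks until the copy finishes or the timeout expires.
func buildAndWait(parent context.Context, image string, timeout time.Duration, copyImage copyFunc) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	errc := make(chan error, 1) // buffered so the goroutine can exit after a timeout
	go func() { errc <- copyImage(ctx, image) }()

	select {
	case err := <-errc:
		return err
	case <-ctx.Done():
		return fmt.Errorf("waiting for %s to sync: %w", image, ctx.Err())
	}
}

// sleepCopy simulates a copy that takes d and honors cancellation.
func sleepCopy(d time.Duration) copyFunc {
	return func(ctx context.Context, image string) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs one copy that finishes in time and one that exceeds its timeout.
func demo() (fastErr error, slowTimedOut bool) {
	ctx := context.Background()
	fastErr = buildAndWait(ctx, "example:latest", time.Second, sleepCopy(10*time.Millisecond))
	slowErr := buildAndWait(ctx, "example:latest", 10*time.Millisecond, sleepCopy(time.Second))
	return fastErr, errors.Is(slowErr, context.DeadlineExceeded)
}

func main() {
	fastErr, slowTimedOut := demo()
	fmt.Println("fast copy error:", fastErr)
	fmt.Println("slow copy timed out:", slowTimedOut)
}
```

Because the copy runs on the caller's context, cancelling the build (or hitting the timeout) also cancels the copy, which is the property the pull/fetch-style design would give the client.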