Reconciliation process with many images is slow

mbaynton commented 2 years ago

Our use case involves having nearly two thousand distinct images on each kubelet, and different images on different kubelets. We are evaluating kube-fledged as a component of how we can manage our image collections at this scale.

One finding that is not already covered by other issues: when we edit an existing ImageCache CRD of this size, reconciling the desired images against the images actually present takes a few minutes, even if the change only added or removed one image.

This appears to be attributable to this block, which adds all images in the modified CRD to a rate-limited work queue. Whether an image is already present is only determined later, inside the queue consumer. Computing a diff between the image list in the updated CRD and the image list in the node status once, upfront, before pushing to the work queue, might improve responsiveness.

We could be open to working on this issue so that kube-fledged better meets our particular use case, but we wanted to file this issue as a first step to see if there is interest in supporting ImageCaches of this size in principle, and if you foresee any difficulties with the proposal to reconcile the CRD with the node status data upfront before pushing to the work queue.
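The upfront diff proposed above could look roughly like the following. This is a minimal sketch, not kube-fledged code: `diffImages` and its parameter names are hypothetical, and it assumes both image lists are available as plain string slices.

```go
package main

import "fmt"

// diffImages is a hypothetical helper (not part of kube-fledged). It compares
// the images listed in the updated ImageCache spec (desired) with the images
// reported as present on the node (present) and returns only the work needed.
func diffImages(desired, present []string) (toPull, toDelete []string) {
	desiredSet := make(map[string]struct{}, len(desired))
	for _, img := range desired {
		desiredSet[img] = struct{}{}
	}
	presentSet := make(map[string]struct{}, len(present))
	for _, img := range present {
		presentSet[img] = struct{}{}
	}

	// Images that are wanted but missing from the node must be pulled.
	for _, img := range desired {
		if _, ok := presentSet[img]; !ok {
			toPull = append(toPull, img)
		}
	}
	// Images that are on the node but no longer wanted are delete candidates.
	for _, img := range present {
		if _, ok := desiredSet[img]; !ok {
			toDelete = append(toDelete, img)
		}
	}
	return toPull, toDelete
}

func main() {
	desired := []string{"repo/app:1", "repo/app:2", "repo/new:1"}
	present := []string{"repo/app:1", "repo/app:2", "repo/old:1"}

	toPull, toDelete := diffImages(desired, present)
	fmt.Println("pull:", toPull)
	fmt.Println("delete:", toDelete)
}
```

Only the entries in `toPull` and `toDelete` would then be pushed to the work queue, so a one-image edit queues one item instead of the full list. A real implementation would likely need to restrict `toDelete` to images that the ImageCache previously managed, so that unrelated images on the node are left alone.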

omar-rs commented 2 years ago

Here are some additional notes related to the issue above.

Setup:

1676 images in a single imagecache instance on a single node
all images are already pulled on the node (from an ECR registry)
all images are already part of the ImageCache instance definition

Test 1: Remove images from an `imagecache`

edit the imagecache instance to remove 10 images from the node
1666 `Job not created (image-already-present)` outputs in the controller log
first delete job did not start until about 2min 37sec from the start of the image cache edit - all this time is consumed by the checking of existing images (Job not created)
all delete jobs completed within 13sec of start
overall, took about 2min 51sec to complete the imagecache sync and status update

Test 2: Append images to the end of the `imagecache`

edit the imagecache instance to append 10 images to the node (at the end of the list)
1676 `Job not created (image-already-present)` outputs in the controller log
first pull job did not start until about 2min 37sec from the start of the image cache edit - all this time is consumed by the checking of existing images (Job not created)
all pull jobs completed within 21sec of start
overall, took about 3min 1sec to complete the imagecache sync and status update

Test 3: Add images at the top of the `imagecache` list

edit the imagecache instance to add 10 images to the node at the top of the list
10 pull jobs were created in < 1sec
`Job not created (image-already-present)` output started to appear in the controller log
took about 20 sec to complete the last pull job
overall, took about 2min 40sec to complete the imagecache sync and status update
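The delay of about 2min 37sec in Tests 1 and 2 is consistent with a token-bucket rate limiter that releases 10 items per second after an initial burst of 100. Those are the defaults of client-go's `workqueue.DefaultControllerRateLimiter`; whether kube-fledged's queue uses those defaults is an assumption here, as is the assumption that the new pull or delete items are queued after the existing images. The sketch below (with the hypothetical helper `estimateDelay`) works through the arithmetic:

```go
package main

import (
	"fmt"
	"time"
)

// estimateDelay is a hypothetical helper. It approximates how long the item
// at a zero-based queue position waits when all items are enqueued at once
// behind a token bucket with the given burst size and refill rate (items/sec).
func estimateDelay(position, burst, qps int) time.Duration {
	if position < burst {
		// Items within the initial burst are released immediately.
		return 0
	}
	// Each item beyond the burst waits for one more token to be refilled.
	return time.Duration(position-burst+1) * time.Second / time.Duration(qps)
}

func main() {
	const burst, qps = 100, 10 // assumed client-go defaults

	// Test 3: new images at the top of the list fall inside the burst.
	fmt.Println(estimateDelay(0, burst, qps)) // 0s

	// Test 1: first delete queued behind 1666 already-present images.
	fmt.Println(estimateDelay(1666, burst, qps)) // 2m36.7s

	// Test 2: first appended image queued behind 1676 already-present images.
	fmt.Println(estimateDelay(1676, burst, qps)) // 2m37.7s
}
```

Under these assumptions, nearly all of the observed wait comes from draining already-present images through the rate limiter rather than from the presence checks themselves.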

senthilrch commented 2 years ago

@mbaynton @omar-rs : Thanks for reporting this issue and the in-depth analyses you performed with kube-fledged.

I am keen on improving the performance of kube-fledged to meet your particular use case. The scenario of modifying an existing imagecache is not fully optimised for performance: it is treated as reconciling a new imagecache, so you see ALL the image pulls (and deletes) getting queued to the image manager routine.

It makes perfect sense to queue only the image pulls (and deletes) that are required. I'll come up with a proposal for this.
