Open mbaynton opened 2 years ago
Here are some additional notes related to the issue above.
imagecache instance on a single node
imagecache instance to remove 10 images from the node
Job not created (image-already-present) output in the controller log (Job not created)
imagecache sync and status update
imagecache instance to append 10 images to the node (at the end of the list)
Job not created (image-already-present) output in the controller log (Job not created)
imagecache sync and status update
imagecache list
imagecache instance to add 10 images to the node at the top of the list
imagecache sync and status update

@mbaynton @omar-rs: Thanks for reporting this issue and for the in-depth analysis you performed with kube-fledged.
I am keen to improve the performance of kube-fledged to meet your particular use case. Modifying an existing imagecache is not fully optimised for performance: it is treated the same as reconciling a new imagecache, so you see ALL the image pulls (and deletes) getting queued to the image manager routine.
It makes perfect sense to queue only the image pulls (and deletes) that are required. I'll come up with a proposal for this.
Our use case involves having nearly two thousand distinct images on each kubelet, and different images on different kubelets. We are evaluating kube-fledged as a component of how we can manage our image collections at this scale.
One finding that is not already covered by other issues is that when we edit an existing ImageCache CRD of this size, it takes a few minutes to reconcile the desired images with the images actually present, even if the change only added or removed one image.
This appears to be attributable to this block, which adds every image in the modified CRD to a rate-limited work queue. Whether an image is already present is only determined later, inside the queue consumer. Computing a diff between the image list in the updated CRD and the image list in the node status once, up front, before pushing to the work queue, might improve responsiveness.
We could be open to working on this issue so that kube-fledged better meets our particular use case, but we wanted to file this issue as a first step to see if there is interest in supporting ImageCaches of this size in principle, and if you foresee any difficulties with the proposal to reconcile the CRD with the node status data upfront before pushing to the work queue.