Open js-9 opened 2 years ago
This feature is hugely important to our project. Our new nodes need to be able to start caching images immediately on launch.
Jotting down some of my thoughts on implementing this feature:
- The kube-fledged controller already has a Node informer cache. Today it is used only to list nodes; it is not yet used to track node lifecycle.
- How do we detect when a new node is added to the cluster? Is it the kubelet that creates the Node resource? I assume the node's status is NotReady when it is created and is later updated to Ready?
- If a new node is added while an ImageCache operation is in progress (e.g. auto-refresh, update), we need a mechanism to queue the node and process it once that operation has completed.
- If auto-refresh is enabled, should we ignore the addition of a new node or act upon it?
- What should the status/reason/message of the ImageCache resource be when reacting to the addition of a new node?
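To make the detection and queueing points concrete, here is a minimal, stdlib-only Go sketch. It is not kube-fledged code: `nodeState`, `becameReady` and `pendingNodes` are hypothetical names, and `nodeState` is a simplified stand-in for the real Node object. In an actual controller the calls would presumably come from the Node informer's add and update handlers.

```go
package main

import "sync"

// nodeState is a hypothetical, simplified view of a Node: its name and
// whether its Ready condition is currently true.
type nodeState struct {
	Name  string
	Ready bool
}

// becameReady reports whether a node has just transitioned to Ready.
// oldNode is nil when the node was just added to the cluster.
func becameReady(oldNode *nodeState, newNode nodeState) bool {
	wasReady := oldNode != nil && oldNode.Ready
	return newNode.Ready && !wasReady
}

// pendingNodes remembers nodes that became Ready while another ImageCache
// operation (e.g. auto-refresh, update) was still running.
type pendingNodes struct {
	mu     sync.Mutex
	busy   bool
	queued []string
	seen   map[string]bool
}

func newPendingNodes() *pendingNodes {
	return &pendingNodes{seen: make(map[string]bool)}
}

// Begin claims the operation slot. It returns false if an operation is
// already in progress.
func (p *pendingNodes) Begin() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.busy {
		return false
	}
	p.busy = true
	return true
}

// Offer is called when a node becomes Ready. If nothing is running, it claims
// the operation slot and returns true, so the caller can cache images on the
// node right away and must call Finish afterwards. Otherwise the node is
// queued (once) and false is returned.
func (p *pendingNodes) Offer(name string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.busy {
		p.busy = true
		return true
	}
	if !p.seen[name] {
		p.seen[name] = true
		p.queued = append(p.queued, name)
	}
	return false
}

// Finish releases the operation slot and returns the nodes that were queued
// in the meantime, in arrival order.
func (p *pendingNodes) Finish() []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.busy = false
	drained := p.queued
	p.queued = nil
	p.seen = make(map[string]bool)
	return drained
}
```

With this shape, a refresh would call `Begin()` and later `Finish()`, and whatever `Finish()` returns is the list of new nodes to process next. Whether auto-refresh should also pick up those nodes is left open, as in the question above.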
Thinking about my use case, any delay in caching images would be undesirable, so I'd be keen to see a one-off job created as soon as a new node is detected. Once this job has completed, whether it succeeded or failed, maybe add a label to the node so that the controller can include it in regular operations going forward.
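A rough sketch of the label idea, in Go with only the standard library. The label key `example.com/initial-image-cache` and both function names are made up for illustration; kube-fledged does not define them.

```go
package main

// initialCacheLabel is a hypothetical label key used only in this sketch.
const initialCacheLabel = "example.com/initial-image-cache"

// needsInitialCache reports whether a node still needs its one-off caching
// job, i.e. the label has not been set yet.
func needsInitialCache(labels map[string]string) bool {
	_, done := labels[initialCacheLabel]
	return !done
}

// withInitialCacheResult returns a copy of labels with the outcome of the
// one-off job recorded. The label is set on success and on failure, so the
// node joins the regular cycle either way.
func withInitialCacheResult(labels map[string]string, succeeded bool) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	if succeeded {
		out[initialCacheLabel] = "succeeded"
	} else {
		out[initialCacheLabel] = "failed"
	}
	return out
}
```

The controller would check `needsInitialCache` when it sees a new node, run the one-off job, and then patch the node with the labels returned by `withInitialCacheResult`.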
I wouldn't worry about the node being unschedulable as a special case here, as that could happen to any node at any time, so it should be something you cater for at all times anyway.
As for the status message, a quick one-off message should be fine. If users are doing any logging, then hopefully they are tracking and logging all messages on the resource, and not just the latest.
The other approach, which would be a big change in strategy, would be for each node to have its own timer, rather than all the jobs being created at once. That is arguably good from a network bandwidth perspective, but you'd need something like a DaemonSet to have the independence to time each node. It wouldn't answer your question about the status message on the cache resource, though, unless you deliberately listed each node's status independently as a separate line in the message.
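To illustrate the bandwidth argument for per-node timing, here is a small Go sketch that spreads nodes evenly across one refresh interval instead of starting every job at the same moment. `staggeredOffsets` and `exampleInterval` are hypothetical names, and even spacing is just one possible policy.

```go
package main

import (
	"sort"
	"time"
)

// exampleInterval is an assumed refresh interval used only for illustration.
const exampleInterval = 30 * time.Minute

// staggeredOffsets assigns each node a start offset within the interval so
// that image pulls are spread out rather than all starting together. Nodes
// are sorted by name to keep the assignment deterministic.
func staggeredOffsets(nodes []string, interval time.Duration) map[string]time.Duration {
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	offsets := make(map[string]time.Duration, len(sorted))
	for i, name := range sorted {
		offsets[name] = interval * time.Duration(i) / time.Duration(len(sorted))
	}
	return offsets
}
```

For three nodes and a 30 minute interval this gives offsets of 0, 10 and 20 minutes. A node that joins later would shift the spacing on the next recalculation, which is one of the complications of this approach.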
@ChevronTango We implemented a simple prototype for this:
Ideally, I would not need to refresh images periodically at all, since we tag our images. What I do need is for images to be cached on new nodes as they join the cluster. That does not happen when the image refresh time is set to 0s to disable periodic refreshing.