kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0
108.3k stars 38.86k forks source link

Pull main pods images while int container executing #88265

Closed lawrencegripper closed 4 years ago

lawrencegripper commented 4 years ago

What would you like to be added:

The aim here is to optimize the startup process for a pod with an initContainer (or multiple) followed by a set of normal containers. 

Currently the initContainer pulls it's image, runs for a while then after exiting the normal containers images are then pulled (if not present) and the containers started. Note if you have multiple initContainers this flow is repeated with init1 pull-> run then init 2 pull -> run then main workload.

I'd like to investigate pulling images in the background while the initContainer is started/running meaning that less time is spend waiting on IO. Once the initContainer is finished the main workload's image is already pulled and so would be quicker to start or the next initContainer would be pulled and ready to go. 

Why is this needed:

Lets be honest - it's not needed, however, it may speed things up for a few people doing ML stuff.

This is an optimization to improve resource utilization. On an expensive cloud SKU which runs a pod with in initContainer then a main workload, say training ML using GPU, it's advantageous to spend as much time as possible doing the computations. 

Typically some of the ML based images can be rather large making this problem more pronounced.

This is accentuated when a cluster is configured with the Autoscaler as typically the nodes which have just been created don't have any images present so always incur a pull cost, serially - first for the init then the normal containers. 

How?

I think this would require 2 sets of changes to the kubelet_manager.go

  1. When podContainerChanges.NextInitContainerToStart != nil in SyncPod then start an async pull of the pod's images
  2. Change image_manager.go to allow a method for starting an ImagePull but not immediately waiting on the pull result. 

So some very rubbish psuedo version would be: 

    // If first init container is about to be started start pulling other images (should this also start an async pull of other init containers?
    if len(pod.Spec.InitContainers) > 0 && &pod.Spec.InitContainers[0] == podContainerChanges.NextInitContainerToStart {
        for _, container := range pod.Spec.Containers {
            m.imagePuller.BackgroundEnsureImageExists(pod, &container, pullSecrets, podSandboxConfig)
        }
    }

    // Step 6: start the init container.
    if container := podContainerChanges.NextInitContainerToStart; container != nil {
        // Start the next init container.
        if err := start("init container", container); err != nil {
            return
        }

        // Successfully started the container; clear the entry in the failure
        klog.V(4).Infof("Completed init container %q for pod %q", container.Name, format.Pod(pod))
    }

    // Step 7: start containers in podContainerChanges.ContainersToStart.
    for _, idx := range podContainerChanges.ContainersToStart {
        start("container", &pod.Spec.Containers[idx])
    }

Not this assumes that the downstream service running the images pulls (docker or other) would handle the de-duplication of the calls to EnsureImagePresent with the same image.

lawrencegripper commented 4 years ago

/sig Node

tedyu commented 4 years ago

w.r.t. the de-duplication of the calls to EnsureImagePresent, the ImageManager needs to expose the images currently being downloaded so that, if init container runs to completion before the download is finished, we can wait on the image download instead of launching another download.

tedyu commented 4 years ago

We should also consider if this feature would be conditionally enabled. For init container whose image is small and whose execution is fast, the savings in async image download are not much. Maybe we can get statistics from the first few init container runs along with duration for normal container image download so that we can decide whether to enable the async image download.

lawrencegripper commented 4 years ago

Sounds like a good plan, with regard to the statistics is there a source of these I can pull from or is it best to run some test scenarios on a cluster?

For scenarios I'm thinking about testing with x initContainer's which run for for y secs with images of z size. I can then calculate the potential time savings for various values of x/y/z and get some more clear metrics of at which point this becomes useful and how useful it is in terms of time savings.

For de-duplication of EnsureImagePresent I've had a bit of a play locally with the docker daemon and found that it's already going to do this for us without any additional code. In the two recordings below I run a docker pullx2 on the same image at the same time and another where I start one of the pulls 10 seconds after the first. Both appear to result in the daemon only doing one pull with the later call reporting the context of the already in-progress pull. My assumption is the daemon uses a locking mechanism to ensure only one instance of a pull in progress for the same image. I'm a little unclear on what tests would be needed on different container runtimes, any ideas?

dockerdaemonpullsync1

dockerdaemonpullsync2

lawrencegripper commented 4 years ago

I've got a very rough hacky change together today which I can use to do the comparison: https://github.com/lawrencegripper/kubernetes/commit/b546db839c633665926a83482865c6993eace1ba

image

The change aims (need to do more testing) to start an async pull of the main pods images (will also add other initContainers) before the first initContainer is start. I'm hoping to then compare this to without it on the couple of scenarios outlined in the comment above.

tedyu commented 4 years ago

The pullChan is not returned for prepull. How do you check for the completion of prepulling image(s) ?

lawrencegripper commented 4 years ago

Woops, yeah as I wasn't receiving from the pullChan so the hacky version would block the puller. For testing now I'm just using a go routine in the async version to make sure the result it received. Will look at doing something better if the stats show the changes to prepull is worthwhile.

How do you check for the completion of prepulling image(s) ?

As I mentioned above it looks like the docker daemon is handling the de-duplication of the pull requests so I wasn't planning to track the results of the EnsureImageExistsAsync calls as when the normal start func is called for the containers the result of EnsureImageExists will return the progress of the already submitted pull request (need to test more). Sound reasonable? Happy to look at doing something more complex to track them if that is preferred.

lawrencegripper commented 4 years ago

Looking at this again I now agree that I'll need to track the results of the async pulls to make sure behavior like the Image pull backoff works as expected. @tedyu sorry for being slow to catch up on this one. I'll follow your suggestion of:

the ImageManager needs to expose the images currently being downloaded so that...

lawrencegripper commented 4 years ago

Got an update with the async pulls being tracked in the image_manager which seems to work nicely. Plan is to add some tests then use it to run some tests and see what difference this makes to start times.

https://github.com/lawrencegripper/kubernetes/commit/9ee6222b725e5dd0b26259ad925216b9784e6b10 [edit]

image

lawrencegripper commented 4 years ago

I've run some tests here with a mini test harness to demonstrate the improvement the changes have on startup times: https://github.com/lawrencegripper/hack-testingasyncpull

Looking to add more test coverage to the changes and share as a PR shortly unless there are any concerns or thoughts I should address first?

tedyu commented 4 years ago

I think with your second commit, the results are good.

Please open a PR.

Thanks

lawrencegripper commented 4 years ago

PR is now up here: https://github.com/kubernetes/kubernetes/pull/88984

lawrencegripper commented 4 years ago

Closed this one out as didn't have time to complete the work, the PR shows the discussions around pending work if someone else needs/wants to pick it up.

zchenyu commented 2 months ago

Can we re-open this issue? This still seems like a good feature to have