jupyterhub / mybinder.org-deploy

Deployment config files for mybinder.org
https://mybinder-sre.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Measuring and then speeding up launches #909

Open betatim opened 5 years ago

betatim commented 5 years ago

@saulshanabrook voiced some interest in helping us figure out why launches take as long as they do so we can then improve on it.

Right now the hypothesis is that pulling the images onto a node is what takes the vast majority of the time. However, I don't think we have any data to back that up. So maybe the first step would be to instrument things so we can generate some data.

I wanted to open this issue to start the ball rolling. Off the top of my head I don't have a good idea for a concrete, actionable first step beyond "look into how we could instrument things".

There are several ways we could tackle the "image pulls take long" problem. Before we dive into those and get excited, though, we should generate some data.

consideRatio commented 5 years ago

Hmmm, KubeSpawner already listens for events etc. Perhaps we can subclass KubeSpawner, measure how much time is spent in "pulling image" or similar, and then log it somewhere? That is my 10-second intuition idea, we probably want to discard it soon :p
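
A rough sketch of the measurement part of that idea, not tied to any real KubeSpawner hook. The kubelet emits pod events with the reasons `Pulling` and `Pulled`, so the time between them approximates the pull duration. Everything else here is an assumption for illustration: the events are plain dicts with hypothetical `reason` and `timestamp` keys, and the function name `image_pull_seconds` is made up.

```python
from datetime import datetime


def parse_ts(ts):
    # Assumes second-resolution UTC timestamps such as "2019-03-01T12:00:05Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def image_pull_seconds(events):
    """Seconds between the first 'Pulling' and the last 'Pulled' event.

    `events` is a list of dicts with hypothetical keys "reason" and
    "timestamp". Returns None when no pull was observed, for example
    when the image was already present on the node.
    """
    pulling = None
    pulled = None
    for ev in events:
        if ev["reason"] == "Pulling" and pulling is None:
            pulling = parse_ts(ev["timestamp"])
        elif ev["reason"] == "Pulled":
            pulled = parse_ts(ev["timestamp"])
    if pulling is None or pulled is None:
        return None
    return (pulled - pulling).total_seconds()


# Made-up event stream for a single pod launch.
sample_events = [
    {"reason": "Scheduled", "timestamp": "2019-03-01T12:00:00Z"},
    {"reason": "Pulling", "timestamp": "2019-03-01T12:00:02Z"},
    {"reason": "Pulled", "timestamp": "2019-03-01T12:00:47Z"},
    {"reason": "Created", "timestamp": "2019-03-01T12:00:48Z"},
    {"reason": "Started", "timestamp": "2019-03-01T12:00:49Z"},
]

pull_time = image_pull_seconds(sample_events)
total_time = (
    parse_ts(sample_events[-1]["timestamp"]) - parse_ts(sample_events[0]["timestamp"])
).total_seconds()
print(f"pull: {pull_time}s of {total_time}s total")
```

Comparing the pull time against the total time from `Scheduled` to `Started` is what would confirm or refute the hypothesis that pulls dominate launch time.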

betatim commented 5 years ago

We should check if the idea that only one image can be pulled at a time is correct.

betatim commented 5 years ago

My local docker CLI seems to be able to pull the same or different images in parallel. So if there is any locking at all, it is in Kubernetes.

kubelet has --serialize-image-pulls which defaults to true. This makes me think we do serialize pulls.

consideRatio commented 5 years ago

It probably makes sense though, because pulling 1/5 of each of 5 images is still 0 images completed, hmmm....
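
To put rough numbers on that intuition, here is a toy model, not a measurement. It assumes a node with fixed download bandwidth, five equally sized images that each take 60 seconds alone, no shared layers, and perfectly fair bandwidth sharing when pulling in parallel. The function name `completion_times` is made up.

```python
def completion_times(n_images, seconds_per_image, parallel):
    """Time at which each image finishes pulling, under the toy model above."""
    if parallel:
        # Bandwidth is split evenly, so every image finishes at the very end.
        return [n_images * seconds_per_image] * n_images
    # Serial pulls: image i finishes after i + 1 full downloads.
    return [seconds_per_image * (i + 1) for i in range(n_images)]


serial = completion_times(5, 60, parallel=False)
parallel = completion_times(5, 60, parallel=True)

print("serial:  ", serial, "mean", sum(serial) / len(serial))
print("parallel:", parallel, "mean", sum(parallel) / len(parallel))
```

Under these assumptions the last image is ready at the same moment either way, but with serialized pulls the first launch can start after 60 seconds instead of 300, and the mean wait drops from 300 to 180 seconds. Real pulls share layers and are not purely bandwidth bound, so this only illustrates why serializing is not obviously worse.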