jupyterhub / mybinder.org-deploy

Deployment config files for mybinder.org
https://mybinder-sre.readthedocs.io/en/latest/index.html

Docker pulls on OVH vs GKE #1049

Open betatim opened 5 years ago

betatim commented 5 years ago

We should investigate why on OVH we have so many more docker pulls (according to the grafana dashboard):

https://grafana.mybinder.org/d/nDQPwi7mk/node-activity?refresh=1m&panelId=29&fullscreen&orgId=1&var-cluster=OVH%20Prometheus vs https://grafana.mybinder.org/d/nDQPwi7mk/node-activity?refresh=1m&panelId=29&fullscreen&orgId=1&var-cluster=prometheus

choldgraf commented 5 years ago

should we add the OVH folks to a jupyterhub team so that we can ping them for questions like these? Or add to the mybinder.org-operators team?

betatim commented 5 years ago

mybinder.org ops team sounds like a good idea!

betatim commented 5 years ago

I had assumed some of them were watching the repo already.

choldgraf commented 5 years ago

cc @mael-le-gal and @jagwar

mael-le-gal commented 5 years ago

Indeed, the differences between the two dashboards are surprising ... I don't have an explanation for now.

betatim commented 5 years ago

Watching pods and their logs, it seems that we get a "pulling image" event for things like the tc_init container. However, the pull is "super fast", and it seems very unlikely that the node didn't already have this image. It makes me wonder if there is a small difference in k8s versions or something like that where it always emits a "pulling" event even if the image is already present.
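
A quick way to watch for this (a sketch; `ovh` is the namespace used elsewhere in this thread, and Pulling/Pulled are the standard kubelet event reasons):

# Stream pod events and keep only image pull activity; a "Pulling" event
# followed almost immediately by "Pulled" suggests the image was already cached
kubectl get events --namespace=ovh --watch | grep -E 'Pulling|Pulled'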

mael-le-gal commented 5 years ago

This afternoon the pull of the minrk/tc-init:0.0.4 container took so long that the jupyterhub pod failed several times after too many launch retries.

Just a small investigation:

# Executing inside the container
kubectl exec --namespace='ovh' -it ovh-dind-6vtwq -c dind -- sh

# Manually pull the image
docker -H unix:///var/run/dind/docker.sock pull minrk/tc-init:0.0.4

After doing that the image is present on the host :

# List images
docker -H unix:///var/run/dind/docker.sock images

REPOSITORY
minrk/tc-init

After waiting some time and running the same command again, the image seems to have disappeared.

I saw that there are some pods named ovh-image-cleaner-***. Could they be responsible for that deletion?
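
One way to narrow that down (a minimal sketch, run against the same DIND socket as above) is to log when the image vanishes and match the timestamp against the ovh-image-cleaner-*** pod logs:

# Record image presence once a minute; the moment minrk/tc-init disappears
# can then be compared with the image cleaner's log output
while true; do
  date
  docker -H unix:///var/run/dind/docker.sock images minrk/tc-init
  sleep 60
done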

betatim commented 5 years ago

The dockerd inside the ovh-dind-... pods is only used to build new docker images. So the majority of pulls of minrk/tc-init should happen on the node's actual dockerd (the one k8s uses).
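
To see which dockerd actually holds the image, the two can be queried separately (assuming shell access on the node for the first command):

# On the node itself: images in the dockerd that k8s/kubelet uses
docker images minrk/tc-init

# Via the DIND socket: images in the build-only dockerd
docker -H unix:///var/run/dind/docker.sock images minrk/tc-init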

However, I think you found the problem. On GKE we run our nodes with two disks and store all the images for the DIND dockerd on the second disk, because k8s itself does some cleanup (of images created by k8s doing its thing) and we do some cleanup (of images we create in our DIND). When we shared a disk between DIND and k8s, the two garbage collectors would get in each other's way and run constantly, because neither could find anything to delete that would drop usage below the limit.
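
For reference, the k8s side of that cleanup is the kubelet's image GC, which is driven by disk-usage thresholds (--image-gc-high-threshold and --image-gc-low-threshold are real kubelet flags; 85 and 80 are the documented defaults). One way to check what a node's kubelet is running with:

# Print the kubelet's image GC thresholds, if they are set explicitly
# (high threshold: start deleting unused images; low threshold: stop deleting)
ps aux | grep -o -- '--image-gc-[a-z]*-threshold=[0-9]*'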

If you kubectl describe a dind pod on GKE you'll see something like:

Volumes:
  dockerlib-dind:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disks/ssd0/dind
    HostPathType:  DirectoryOrCreate

which is the extra disk we mount. If you describe one of the image-cleaner pods, it mounts the same disk:

  dockerlib-dind:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disks/ssd0/dind
    HostPathType:  DirectoryOrCreate

Both pods mount this to /var/lib/docker. If I describe an ovh-dind-... pod, it uses /var/lib/dind, which I think means that directory is on the same partition as the docker image storage of k8s. And I think both the image cleaner and the k8s image GC do something like df -h to find out how full the disk is and then try to clean up to make space (which fails, because they can't delete images controlled by the other dockerd).
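
One way to test that hypothesis on an OVH node (paths taken from the describe output above; --output needs GNU df):

# If both paths resolve to the same device, the two dockerds share a
# partition and each garbage collector sees the other's usage as its own
df --output=source,target /var/lib/docker /var/lib/dind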

If it is possible, I think the simplest thing to do is to swap the nodes for ones that have a second disk we can use for DIND.

@minrk spent a lot of time poking around image cleaning and races between the two dockerds so he might have some ideas as well.

mael-le-gal commented 5 years ago

Is there a reason for using a second disk, or could it be on the same disk but in another directory to separate the two?

betatim commented 5 years ago

I don't know. Two possible reasons come to mind: performance (the SSD is faster than the boot disk) and simplicity in computing how full the disk is.
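
On the second point: df reports usage for the whole filesystem rather than for a single directory, so with two directories on one disk each cleaner would still see the other's images counted against "its" space:

# On a shared disk these report identical size/used/avail figures, so a
# directory split alone would not give each GC an accurate view of its usage
df -h /var/lib/docker
df -h /var/lib/dind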