jupyterhub / mybinder.org-deploy

Deployment config files for mybinder.org
https://mybinder-sre.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Testing out docker hub as registry #1298

Open betatim opened 4 years ago

betatim commented 4 years ago

In #1295, #1296 and #1297 I changed the OVH deployment from using an onsite Docker registry (which is currently down) to using Docker Hub. The Gesis cluster uses Docker Hub, and according to @bitnik they were thinking about deploying their own registry "eventually", but Docker Hub seems to work so well that it isn't a priority. I wanted to try it out with OVH so we can gain some experience with it as well. The registry there being down seemed like a good opportunity.

I don't think of this as a "let's switch to docker hub forever", just experimenting. If things go well we should keep it in mind though because maintaining a reliable docker registry is hard work. So maybe we can use docker hub instead and reduce maintenance burden.

sgibson91 commented 4 years ago

I just wanted to document a conversation we had on Gitter about using Docker Hub for all federation members.

@sgibson91: Maybe we should think about swapping the whole federation to docker hub? It would solve a lot of the issues around it mattering which cluster users got sent to. What are the downsides to using one docker hub org, besides sharing login? (which I guess could go in secrets/common.yaml?)

@betatim: Using one shared Docker Hub org for all clusters makes me wonder about two clusters trying to push the same image at the same time (no idea if this would happen and if it would cause an error). The other thing is that OVH is a small deployment compared to GKE, which makes me wonder how the load/network/throttling would work out (OVH has a fixed set of nodes that have stuff "in cache"; on GKE we "constantly" add and remove nodes, so their image caches are empty). However, a lot of this is speculation in terms of performance effects, as I don't think we've ever benchmarked it.

@minrk: Two clusters pushing the same image would be a race, but shouldn't be a source of errors. Whoever pushes last would just clobber the first one. Assuming all clusters are in sync for versions, the images will be the same (content-wise; layer hashes presumably won't match, so some space/bandwidth/time will be wasted). The main downside, I think, is that I would expect pulls to be slower from Docker Hub than from a registry internal to a given cluster's hosting service. But that's an assumption; I don't think we've measured pulls from Docker Hub vs. our registries, so the shared registry might be worth it, since registry behavior seems to have been one of the biggest issues with deployments other than GKE.

So an actionable item from this conversation would be to figure out and implement a way to benchmark the docker pulls.
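As a starting point for such a benchmark, here is a minimal sketch that times repeated `docker pull` runs from a cold local cache. It assumes the `docker` CLI is available on the machine being tested, which may not hold on cluster nodes that use a different container runtime. The helper names (`time_command`, `summarize`, `benchmark_pull`) are hypothetical and not part of this repository. Note that `docker rmi` may leave layers behind if another local image shares them, so results are only approximate.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3, setup=None):
    """Run `cmd` `runs` times and return wall-clock durations in seconds.

    `setup`, if given, is a command run (untimed) before each timed run,
    e.g. removing the local image so each pull starts from a cold cache.
    """
    durations = []
    for _ in range(runs):
        if setup is not None:
            # Failures are ignored: the image may not exist locally yet.
            subprocess.run(setup, check=False, capture_output=True)
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few summary statistics."""
    return {
        "runs": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def benchmark_pull(image, runs=3):
    """Time `docker pull IMAGE`, removing the local copy before each run."""
    return summarize(
        time_command(
            ["docker", "pull", image],
            runs=runs,
            setup=["docker", "rmi", "--force", image],
        )
    )


# Hypothetical usage: run on a node of each cluster and compare.
# print(benchmark_pull("some-org/some-image:tag"))
```

Running the same image through `benchmark_pull` once with a Docker Hub name and once with the cluster-internal registry name would give a rough first comparison of the two.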

manics commented 4 years ago

A couple of tools that might be useful in future if Docker Hub is too slow:

yuvipanda commented 3 years ago

I think the DockerHub rate limits now make this difficult? https://docs.docker.com/docker-hub/download-rate-limit/
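For reference, a rough sketch of how a cluster's remaining pull quota could be checked. It assumes the approach described on the linked docs page: request an anonymous token for the `ratelimitpreview/test` repository, then send a `HEAD` request for its manifest and read the `ratelimit-limit` and `ratelimit-remaining` headers, which look like `100;w=21600`. The endpoints and header format are taken from that page as I recall it and may change; the function names are hypothetical.

```python
import json
import urllib.request

# Endpoints as documented by Docker for inspecting rate limits (assumption:
# still valid at the time of reading).
TOKEN_URL = (
    "https://auth.docker.io/token"
    "?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull"
)
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"


def parse_ratelimit(value):
    """Parse a header value such as '100;w=21600' into (count, window_seconds).

    The window is None if the header carries no ';w=' part.
    """
    count, _, window = value.partition(";w=")
    return int(count), (int(window) if window else None)


def fetch_ratelimit():
    """Return the parsed rate-limit headers for an anonymous pull (needs network)."""
    with urllib.request.urlopen(TOKEN_URL) as resp:
        token = json.load(resp)["token"]
    request = urllib.request.Request(
        MANIFEST_URL,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return {
            name: parse_ratelimit(resp.headers[name])
            for name in ("ratelimit-limit", "ratelimit-remaining")
            if resp.headers.get(name)
        }
```

Since anonymous limits are counted per source IP, calling `fetch_ratelimit()` from inside each cluster would show how much headroom that cluster has.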

sgibson91 commented 3 years ago

The Turing deployment uses Docker Hub. So far I haven't heard of or come across any issues, but if the whole federation were consolidated onto it then yes, I think it would be rate limited.