jzf2101 opened this issue 6 years ago (status: Open)
Should we re-title this issue to "Use-cases for changing the base image"? Otherwise this will become a mixture of "how to do GPU support" and "base images for X and Y and Z".
@jzf2101 : Good to see some momentum around this. repo2docker with GPU support is something we have been actively using on crowdAI. We managed to collect about 800 repositories (and numerous tags in each of them), amounting to ~1TB of code, that are compatible with repo2docker (the crowdai fork), for two of the NIPS competitions this year that I was co-organizing. And we have a few more challenges on crowdAI coming up which will aggressively use the same setup.
The way we do it is very hacky and non-standard: we simply change the base image to nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04, which ties all submissions to cuda-9.0 and cudnn7, while in many cases users would want to choose which cuda and cudnn versions they build against.
Unfortunately, given the licensing situation with cuda and cudnn, the only way we can have cuda/cudnn in built images is to build on top of the official base images released by nvidia: https://hub.docker.com/r/nvidia/cuda/tags/ All of them are built on top of Ubuntu 14.04/16.04/18.04.
We have seen some weird behaviour in terms of reproducibility when building docker images with these as the base image: for some reason they abruptly update old tags, sometimes breaking production code or pulling in odd dependencies. I wish they versioned the images better, but this is an important consideration from repo2docker's point of view.
So ideally, in the context of GPU support, the configuration we would need in runtime.txt would be the cuda version and the cudnn version; when present, the buildpack should try to build a GPU compatible image (with an appropriate message, of course).
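Something along these lines in runtime.txt, purely as an illustration of the proposal (this format is not supported by repo2docker today):

cuda-9.0
cudnn-7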
And we would really love to see this released as part of repo2docker. Our fork is already 324 commits behind, and as a small team focused on quite a few things, it might not be possible to keep catching up with repo2docker all the time :D
But as I mentioned, it's great to see some momentum around GPU support here, and I would love to see this shipped with repo2docker soon :D
The Dockerfile shown in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/994#issue-373992464 suggests that we do not need to switch base images to be able to support running on an Nvidia GPU. Is there something missing from the linked Dockerfile (it has only been tested on GKE)?
If that Dockerfile also works on a local machine (maybe with nvidia-docker), then this isn't a use case that motivates adding the ability to switch base images.
@betatim : I quickly skimmed through it, and I think I understand what's happening with it.
GPU support for the k8s service on GKE, which I think is still in beta, mounts the cuda drivers from the host nodes. GKE ensures that the cuda drivers are all accessible at a particular location (usually /usr/local/nvidia) on the host machine, so that the k8s pods can mount them on the go.
But that's something which would be hard for us (and many other repo2docker users) to use in the long run, because it basically pins a whole cluster to a particular cuda version. And this can have many unintended side effects; for instance, the tensorflow-gpu pip package was not supported with cuda-9.1, and users might want to explicitly choose the cuda and cudnn versions they run their code against, which can have a huge impact on the performance of their code.
Hence, in an ideal case, we would want each repository to specify its own cuda/cudnn versions and build off a base image which has the necessary cuda libraries in the image itself.
And that Dockerfile will not work on a local machine unless you ensure that cuda is present at the correct location on the host and is mounted into the container at runtime at the appropriate location(s).
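Concretely, running such a driver-less image locally would mean bind-mounting the host's driver installation into the container yourself; a rough sketch only (the host path is an assumption and depends on how the drivers were installed):

# mount the host's driver/CUDA location into the container at the path the image expects
docker run -it --rm \
    -v /usr/local/nvidia:/usr/local/nvidia:ro \
    <name-of-image-here>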
@betatim suggests we may need to call:
repo2docker --no-run https://myrepo.com
which will build an image from the repo that you can then run with nvidia-docker run <name-of-image-here>.
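A sketch of that workflow (the image name is an arbitrary choice; the --gpus flag requires Docker >= 19.03 with the Nvidia container toolkit installed on the host):

# Build the image without running it
repo2docker --no-run --image-name myrepo-gpu https://myrepo.com

# Run it with the legacy nvidia-docker wrapper...
nvidia-docker run -it --rm myrepo-gpu

# ...or with the --gpus flag on newer Docker
docker run -it --rm --gpus all myrepo-gpu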
@jzf2101 : Ahh! Well, nvidia-docker exposes the basic nvidia drivers matching the GPU on the host machine, but not the full cuda toolkit. The nvidia drivers enable the container to "see" the GPUs on the host machine, but they still expect a CUDA capable container to be able to use the GPUs in any sensible way.
And the licensing of CUDA (and cudnn, etc.) makes it very non-trivial to efficiently package them into prebuilt images, hence most people just start off from the official nvidia/cuda base images. Installing cuda through scripts, etc., would require you to expressly agree to Nvidia's terms of use during the installation, which cannot be abstracted away in automated build processes (at least not in a legal way).
So you're not calling the nvidia runtime when you run crowdai-repo2docker?
@jzf2101 : No, when we use crowdai-repo2docker we do not even need nvidia-docker; the goal there is to build a CUDA capable docker image.
When we run the docker image, then we need nvidia-docker if running locally, or the nvidia-gpu-device-plugin daemonset on a k8s cluster (while requesting nvidia.com/gpu: 1).
what if we did the following:
- Added a new parameter to repo2docker, something like --base-image, that for now would accept something like org/image:tag (example usage sketched below).
- This image, if provided, would replace what's in the buildpack-deps image: https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L13
- We provide a warning if this is provided that says something like: Warning: Alternate base image provided. Reproducibility or compatibility with repo2docker build-packs is not guaranteed. repo2docker will only work with base images running <LIST-OF-REQUIRED-THINGS>.
- Add a documentation page that more explicitly lays out the requirements in the base image.
What do folks think?
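As a concrete illustration of the proposal, usage might look something like this (the --base-image flag is hypothetical and does not exist in repo2docker; the repository URL is a placeholder):

repo2docker --base-image nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04 https://myrepo.com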
FWIW I think we should also provide documentation on how to avoid swapping the image in the case that you DO want CUDA drivers.
To use your GPU there are two things that need doing: install various things inside your container and install various things on the host.
repo2docker can't help you with installing things on the host. In the comment I linked to above they are running on GKE and use some GKE specific way to install things on all the nodes in the cluster. I'd expect users to take care of all that "somehow".
The next question is how do we get stuff into the docker image. This is where a lot of people use the Nvidia image as a base image.
Taking a look at the Dockerfile for nvidia/cuda, it appears to install some apt packages and set some environment variables. What I'd like to confirm is whether the Dockerfile in this comment does the same as the Nvidia Dockerfile. My guess is the answer is yes, but I don't have access to a GPU to check.
# For the latest tag, see: https://hub.docker.com/r/jupyter/datascience-notebook/tags/
FROM jupyter/datascience-notebook:f2889d7ae7d6
# GPU powered ML
# ----------------------------------------
RUN conda install -c conda-forge --yes --quiet \
    tensorflow-gpu \
    cudatoolkit=9.0 && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER
# Allow drivers installed by the nvidia-driver-installer to be located
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64
# Also, utilities like `nvidia-smi` are installed here
ENV PATH=${PATH}:/usr/local/nvidia/bin
Instead of installing apt packages it uses conda-forge packages. This is easier for us to do in repo2docker as we currently do not support adding extra PPAs before installing user specified apt packages.
This would give the user control over which version of CUDA to install inside the image no? It would have to match the drivers installed on the host but that is also the case when you use the Nvidia provided image.
@parente @consideratio can you confirm? @betatim, https://github.com/jupyter/docker-stacks/issues/745 and https://github.com/conda-forge/conda-forge.github.io/issues/63#issuecomment-403079964 discuss a lot of the things you have observed.
I have a VM with a GPU and docker set up if you want me to figure out whether they're equivalent; I just don't know how to test it, @betatim.
Could you build the Dockerfile I gave above and then start it? Then check out step 7 of https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/994#issue-373992464 to see if your container can see the GPU and, if yes, whether the tensorflow example notebooks run.
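A rough sketch of how that test could look on a machine with an Nvidia GPU and the Nvidia container runtime installed (the image name is arbitrary, and the TensorFlow call shown is the TF 1.x API that would match cudatoolkit=9.0):

# Build the Dockerfile above and check GPU visibility from inside the container
docker build -t r2d-gpu-test .
docker run --rm --gpus all r2d-gpu-test nvidia-smi

# Check that TensorFlow can see the GPU
docker run --rm --gpus all r2d-gpu-test python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"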
For the neurips.mybinder.org deployment (Binderhub with GPUs) we managed to deploy several repos without making any change to repo2docker. Instead we installed a few extra libraries via existing mechanisms.
This made me think: can we support the CUDA use-case via a build pack that installs additional packages and sets environment variables? We'd need a way to trigger the build pack.
I think that's a great idea
I think we can use the neurips blog post to structure the stuff that we did in a way that'll make it easier to decide what changes could be made to repo2docker (e.g. a buildpack) to accommodate this. One of the reasons I'd like to get that post ready sooner rather than later is so that we don't forget all the stuff we did to make it work :-)
Hey all, I'm interested in using repo2docker to build a cuda enabled image from an existing github repo. Any updates on this issue?
You can build GPU enabled images with repo2docker by installing the cudatoolkit conda package. Check out the demo at https://github.com/jzf2101/GAN_tutorial/tree/gpu-binder which we used at NeurIPS. No changes to repo2docker were required.
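For reference, a sketch of building and running that demo locally (the --ref option selects the gpu-binder branch; the image name and the --gpus flag assume a local Nvidia runtime):

repo2docker --ref gpu-binder --no-run --image-name gan-tutorial-gpu https://github.com/jzf2101/GAN_tutorial
docker run -it --rm --gpus all gan-tutorial-gpu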
Ah awesome, thanks @betatim
The cudatoolkit-via-conda solution doesn't work for me, and I have no real clue why. When I request a GPU node on my cluster, nvidia-smi does become available, but with the cuda toolkit installed via conda alone, PyTorch does not see any Nvidia driver.
For my pods to work with pytorch, they need the cuda compatibility packages installed from Nvidia's own package repository; I've tested this a lot today, cutting parts out of dockerfiles and rebuilding.
My solutions are:
- Changing the base image of the bionic buildpack in repo2docker to Nvidia's cuda image
- Or somehow adding the additional apt sources (as in the Nvidia cuda docker image, https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/11.2.1/ubuntu18.04-x86_64/base/Dockerfile); a rough sketch follows below
- Or running a postbuild admin script (doesn't exist yet?) in repo2docker
- Or using the crowdai repo2docker image instead of repo2docker (currently doesn't work)
- Or using a Dockerfile and circumventing most of repo2docker (less desirable)
Do you guys have any more ideas?
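For the second option, this is roughly what the linked Nvidia Dockerfile does to make its apt repository and the compatibility packages available. A sketch only: it needs root (so it cannot go in a plain postBuild script), and the exact key, repo path and package names depend on the CUDA version and Ubuntu release:

# Add Nvidia's CUDA apt repository (Ubuntu 18.04 / x86_64 shown)
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add -
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" \
    > /etc/apt/sources.list.d/cuda.list

# Install the CUDA runtime and driver-compatibility packages
apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-2 \
    cuda-compat-11-2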
@betatim I have a GPU enabled binderhub I'm testing on; I could give you access if you would like to play? Running this Dockerfile now:
FROM tensorflow/tensorflow:latest-gpu-jupyter

# --- Jupyter
# install the notebook package
RUN pip install --no-cache --upgrade pip && \
    pip install --no-cache notebook

# create user with a home directory
ARG NB_USER
ARG NB_UID
ENV USER ${NB_USER}
ENV HOME /home/${NB_USER}

RUN adduser --disabled-password \
    --gecos "Default user" \
    --uid ${NB_UID} \
    ${NB_USER}
WORKDIR ${HOME}
USER ${USER}

# RUN conda install pip --yes
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
Also, note that this does work.
@betatim Would you be interested in testing your solution on a GPU enabled binderhub?
I ran into similar questions these days, so I'll try to contribute to this discussion.
> Should we re-title this issue to "Use-cases for changing the base image"? Otherwise this will become a mixture of "how to do GPU support" and "base images for X and Y and Z".
It looks like there are multiple subjects:
The host configuration is what is installed bare metal. Depending on how you get the hardware, you may have to handle that configuration yourself, or you may even be unable to change anything on the host. What has to be installed on the host is basically:
- nvidia-container-runtime or similar (see the documentation for k8s)

To keep JHub terminology, let's call a spawner the engine/tool used to run a container from an image. It has to give access to the host GPU resource(s). repo2docker offers the capability to run images using the docker engine. By default, docker does not expose GPU devices to containers (docker run ubuntu:focal nvidia-smi won't work on a host that has a GPU device).
- Using the docker CLI, one can pass the --gpus extra option as documented (a short illustration follows below).
- Without --gpus <value>, the NVIDIA_VISIBLE_DEVICES env var is unset/empty, making nvidia-container-runtime work as runc; see the nvidia-container-runtime documentation. The nvidia images take care of setting this, but also provide CUDA libraries (see the next section).
- For the repo2docker CLI, I just created a PR to implement it (using the docker Python API): https://github.com/jupyterhub/repo2docker/pull/1079
- For k8s use cases, it probably has to be handled in downstream repos such as jupyterhub or binderhub.
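A short illustration of that spawner-level difference (assuming a host with the nvidia driver and nvidia-container-toolkit installed, and Docker >= 19.03):

# Without --gpus the GPU is not exposed to the container (the failing case mentioned above)
docker run --rm ubuntu:focal nvidia-smi

# With --gpus, the Nvidia runtime injects the driver libraries and nvidia-smi into the container
docker run --rm --gpus all ubuntu:focal nvidia-smi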
Images need to be capable of running code (e.g. from a notebook) on GPU device(s). Depending on the library used (pytorch, tensorflow, etc.), you may need extra dependencies to be installed to be able to execute your code. Those deps should be pinned by your project (or your dependencies), and installed at build time by the package manager associated with your manifest/spec file (selecting the appropriate BuildPack).
The CUDA particular case: depending on which API level is required (driver, runtime or CUDA libs), you may need various extra packages to be installed:
- libcuda
- libcudart
- cuFFT/cuBLAS/etc., available from cudatoolkit
For example, pytorch installed with conda needs cudatoolkit to run on GPU (a sketch follows below).
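As a sketch of that last point, this is roughly what the conda-level installation looks like (channel and version pins are illustrative and depend on the pytorch release you target):

# cudatoolkit ships the CUDA runtime libraries (cuBLAS, cuFFT, ...) as a conda package;
# the host still needs a matching nvidia driver
conda install -c pytorch pytorch cudatoolkit=10.2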
This final image will be pretty similar (but surely not equivalent) whether you use:
- an nvidia base image and then install dependencies from a manifest/spec file for a given BuildPack, or
- a BuildPack to install both (providing the required tool chain is packaged with it).
Even when using a minimal base image, relying on cudatoolkit (which is pretty extensive) leads to large images. In terms of resource consumption, it would be tempting to think that re-using the same layers through nvidia base images is more efficient than storing a lot of huge layers for every package manager installation.
> what if we did the following:
> - Added a new parameter to repo2docker, something like --base-image, that for now would accept something like org/image:tag.
> - This image, if provided, would replace what's in the buildpack-deps image: https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L13
> - We provide a warning if this is provided that says something like: Warning: Alternate base image provided. Reproducibility or compatibility with repo2docker build-packs is not guaranteed. repo2docker will only work with base images running <LIST-OF-REQUIRED-THINGS>.
> - Add a documentation page that more explicitly lays out the requirements in the base image.
> What do folks think?
> cc @choldgraf
If you are using the conda eco-system/BuildPack on top of an nvidia base image, it would result in having the same large CUDA libraries installed twice:
- at the system level (using apt) through the base image
- at the env level (using conda/mamba)
To actually save space by re-using the same layers from nvidia base images, it would probably need more/too complex/risky strategies:
- relying on the system ones (rather than installing them again at the env level)
- a dedicated BuildPack, with cuda libs already installed in the base environment
libs already installed in base environmentBut then, AFAIK it would also mean doing crazy things like:
cudatoolkit
found as a conda
requirement translate into selecting some nvidia/cuda-jupyter:runtime
image)Maybe some of you found much more efficient strategies!
I have a working example of installing cuda and all the requisite libraries using just the conda buildpack at GitHub.com/ctr26/ZeroCostDl4Mic. My GPU k8s is currently down, but it works locally atm.
FWIW I manually replace this line in the template with
FROM nvidia/cuda:11.2.0-cudnn8-runtime-ubuntu18.04
or whatever cuda/cudnn combination is installed on the destination host. We have to add a couple of apt installs when using that image instead of the default bionic, but once that's done all I have to do is toggle that line to build with or without GPU.
It would make sense for some sort of enable_gpu + cuda_version parameters to implement that toggling directly in base.py, using the nvidia base images for the Ubuntu version repo2docker is using, for example. Would that be too simplistic?
This works for me on my k8s binderhub deployment
Related to https://github.com/jupyterhub/team-compass/issues/52 and https://github.com/jupyterhub/team-compass/issues/85, and the fork from @spMohanty that includes CUDA in the base image.
@choldgraf was curious about how we could vary the base image in r2d. @minrk thought we could create an additional configuration file to specify the base image, e.g. runtime.txt; we should probably require Ubuntu as the OS? This was mentioned in this month's meeting.