rstudio / rstudio-docker-products

Docker images for RStudio Professional Products
https://hub.docker.com/u/rstudio
MIT License

Content images with NVIDIA support #521

Open bjfletcher opened 1 year ago

bjfletcher commented 1 year ago

Ey up!

Due to another issue (see https://github.com/rstudio/helm/issues/355), I've not been able to find out whether CUDA apps would work using the content images from Posit. In the event that they don't, I've prepared a Dockerfile based on NVIDIA's Dockerfile.

The only change I made was to replace:

FROM ubuntu:18.04 as base

with:

FROM rstudio/content-base:r4.1.0-py3.9.2-ubuntu1804 as base

in the hope that it would become an image that works on Posit Connect and can use GPU with the CUDA apps. See below for the full Dockerfile.

I'd be interested to know whether this is how the Posit team would do it, whether the base Posit content images already have CUDA support, or whether you would do it differently.

Thank you!

Ben

FROM rstudio/content-base:r4.1.0-py3.9.2-ubuntu1804 as base

FROM base as base-amd64

ENV NVARCH x86_64

ENV NVIDIA_REQUIRE_CUDA "cuda>=11.4 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451"
ENV NV_CUDA_CUDART_VERSION 11.4.148-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-4

FROM base as base-arm64

ENV NVARCH sbsa
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.4"
ENV NV_CUDA_CUDART_VERSION 11.4.148-1

FROM base-${TARGETARCH}

ARG TARGETARCH

LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl \
    && rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 11.4.3

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-4=${NV_CUDA_CUDART_VERSION} \
    ${NV_CUDA_COMPAT_PACKAGE} \
    && rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
    && echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

COPY NGC-DL-CONTAINER-LICENSE /

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

msarahan commented 1 year ago

It looks like this is certainly possible to support, but might be hard. I think this discussion might belong over at rstudio-docker-products, so I'll move this issue after posting.

I say that this might be hard because it adds another dimension to an already large build matrix, and also because of image size concerns. To proceed with supporting this, I think it would probably be wise to add an image, say content-base-cuda, with the same build matrix as content-base, but with an additional CUDA specifier in the tag. I'm also not sure what we'd be able to do as far as testing goes, since GitHub Actions does not have GPUs available.
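To make the size of that matrix concrete, here is a minimal sketch of how such tags could be enumerated. The tag layout (a `cuda<version>` segment inserted before the distro) and the `cuda_tags` helper are assumptions for illustration only, not an agreed naming scheme; the version values are the ones already used in this thread.

```python
from itertools import product


def cuda_tags(r_versions, py_versions, cuda_versions, distros):
    """Enumerate hypothetical content-base-cuda tags for a build matrix.

    The "r<R>-py<Py>-cuda<CUDA>-<distro>" layout is an assumption that
    extends the existing content-base tag format with a CUDA specifier.
    """
    return [
        f"r{r}-py{py}-cuda{cuda}-{distro}"
        for r, py, cuda, distro in product(
            r_versions, py_versions, cuda_versions, distros
        )
    ]


# Every extra CUDA version multiplies the number of images to build.
tags = cuda_tags(
    r_versions=["4.1.0"],
    py_versions=["3.9.2"],
    cuda_versions=["11.4.3"],
    distros=["ubuntu1804"],
)
print(tags)
```

Each additional entry in any of the four lists multiplies the total, which is the growth in build files and image storage described above.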

One concern I have is that it is already difficult to propagate changes from base layers up to other layers. This new image wouldn't fundamentally change the issue, but it would add one more set of build files to update when the base changes. There are some related issues at:

https://github.com/rstudio/rstudio-docker-products/issues/506
https://github.com/rstudio/rstudio-docker-products/issues/505
https://github.com/rstudio/rstudio-docker-products/issues/504

Hopefully this helps clarify things. The fastest way forward is certainly for you to run custom images yourself, as documented at https://docs.posit.co/helm/rstudio-connect/kubernetes-howto/appendices/content_images.html#custom-images. However, it would be nice to make GPU support easier for everyone.

bjfletcher commented 1 year ago

I now have GPU working in our Posit Connect setup. I can confirm that the base content image does not have the CUDA driver, so we'd need the custom image (unless we resolve the issues Mike kindly explained above).
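For anyone wanting to check their own image, one possible way to test GPU visibility from Python content is sketched below. It is not necessarily how the setup above was verified. It assumes the NVIDIA container runtime mounts the driver library as `libcuda.so.1`, and the `gpu_status` helper is a hypothetical name; `cuInit` and `cuDeviceGetCount` are CUDA driver API calls that return 0 on success.

```python
import ctypes
import os
import shutil


def gpu_status():
    """Report what GPU plumbing is visible to the current process."""
    status = {
        # Set by the Dockerfile above; None if the variable is absent.
        "visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Path to nvidia-smi if the runtime injected it, else None.
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Number of CUDA devices, or None if the driver is unavailable.
        "device_count": None,
    }
    try:
        # Assumption: the driver library is exposed under this name.
        libcuda = ctypes.CDLL("libcuda.so.1")
        count = ctypes.c_int(0)
        if libcuda.cuInit(0) == 0 and libcuda.cuDeviceGetCount(ctypes.byref(count)) == 0:
            status["device_count"] = count.value
    except (OSError, AttributeError):
        # Library missing or lacking the expected symbols: no usable driver.
        pass
    return status


print(gpu_status())
```

A `device_count` of `None` suggests the driver library is not reachable from the container, while `0` suggests the library loaded but no device was exposed to it.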