rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0

Nvidia devel tools get removed from rocker/cuda-based images #736

Open javdg opened 11 months ago

javdg commented 11 months ago

Container image name

rocker/cuda

Container image digest

No response

What operating system related to this question?

Linux

System information

No response

Question

As they are based on nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 (dockerfiles/cuda_4.3.0.Dockerfile), I was expecting to have the full CUDA Toolkit devel environment available when using any of the rocker/cuda and derived images.
However, https://github.com/rocker-org/rocker-versioned2/blob/de8b815b1b23c368308cc9dc960cb8a7c724be9f/scripts/install_R_source.sh#L160C39 seems to remove relevant packages (although not explicitly specified). This includes cuda-compiler-11-8*, cuda-minimal-build-11-8*, cuda-nvcc-11-8* and marks others for subsequent autoremoval (cuda-cuxxfilt-11-8, cuda-nvprune-11-8).
This prevents, for example, the torch package from being easily installed with GPU support within such containers.
Is this done intentionally?
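
One way to check the observation above without changing the image is apt-get's simulation mode (-s). The following is a minimal sketch, not part of the rocker scripts: removed_cuda_packages is a hypothetical helper, and it assumes the simulation prints one "Remv <package>" or "Purg <package>" line per affected package.

```shell
#!/bin/sh
# Hypothetical helper: read `apt-get -s` output on stdin and print the
# cuda-* packages that the simulated run would remove or purge.
removed_cuda_packages() {
    awk '($1 == "Remv" || $1 == "Purg") && $2 ~ /^cuda-/ { print $2 }'
}

# Only simulate when apt-get is available and BUILDDEPS is set,
# e.g. inside a container built from the nvidia/cuda base image.
if command -v apt-get >/dev/null 2>&1 && [ -n "${BUILDDEPS:-}" ]; then
    # -s only simulates; nothing is actually removed.
    # BUILDDEPS is left unquoted on purpose so it splits into package names.
    apt-get -s remove --purge ${BUILDDEPS} | removed_cuda_packages
fi
```

If the output is non-empty, removing the build dependencies would also take CUDA packages with it.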

eitsupi commented 11 months ago

@cboettig Any thoughts?

benz0li commented 11 months ago

Interesting. This does not happen with b-data's/my CUDA-enabled JupyterLab docker stacks which are also based on nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04.

I use

apt-get -y purge PACKAGES
apt-get -y autoremove
rm -rf /var/lib/apt/lists/*

instead of

apt-get remove --purge -y PACKAGES
apt-get autoremove -y
apt-get autoclean -y
rm -rf /var/lib/apt/lists/*

👉 Yes... somehow apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y removes relevant CUDA packages from the Rocker images.


Another difference: b-data's/my images copy R from image glcr.b-data.ch/r/rsi instead of building R while building the docker image.
ℹ️ Python is "installed" the same way (i.e. copied from image glcr.b-data.ch/python/psi), if (a current) PYTHON_VERSION is set.

eitsupi commented 11 months ago

@cboettig Could you take a look at this?

cboettig commented 11 months ago

@eitsupi thanks for the ping, yeah I'll take a look!

cboettig commented 11 months ago

Yup,

somehow apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y removes relevant CUDA packages from the Rocker images.

it looks like some of the build deps are (unsurprisingly) also build deps of cuda-devel. Still seems a bit puzzling to me that it would grab some of the nvidia tools.

Removing build-deps this way in the rocker/r-ver recipe is a relatively dated strategy -- I believe multi-stage builds are now the standard way to build images without including development dependencies. (Though that mechanism didn't exist when this build recipe was initially deployed in rocker!)

@eitsupi I'm not quite sure how best to go about setting up a staged build dockerfile in the current build system though -- thoughts on how to go about that?

Perhaps a simpler / short-term solution would be to add a build arg to suppress the build_deps removal and set that argument in the ml stack.... (given the size of the cuda libs the R build deps are mostly already included or won't add much more to the image size I think).
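
A rough sketch of what such a switch could look like at the end of the install script. The variable name PURGE_BUILDDEPS and the function cleanup_builddeps are assumptions for illustration only; they are not existing rocker arguments. The removal commands are the ones quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical switch (not an existing rocker build arg): set it to
# "false" in the ml/cuda stack to keep the build dependencies installed.
PURGE_BUILDDEPS="${PURGE_BUILDDEPS:-true}"

cleanup_builddeps() {
    if [ "${PURGE_BUILDDEPS}" = "false" ]; then
        echo "Keeping build dependencies (PURGE_BUILDDEPS=false)"
        return 0
    fi
    # BUILDDEPS is left unquoted on purpose so it splits into package names.
    apt-get remove --purge -y ${BUILDDEPS}
    apt-get autoremove -y
    apt-get autoclean -y
    rm -rf /var/lib/apt/lists/*
}
```

The install script would then call cleanup_builddeps as its last step, and the CUDA Dockerfile would pass the variable through as an ARG or ENV set to false.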

eitsupi commented 11 months ago

I don't think there's a problem with doing a multi-stage build because we just change the description in the Dockerfile. It's just that I don't know the caching strategy when doing multi-stage builds. (All I know is that the inline caching we are currently doing is meaningless for multi-stage builds because it only caches the final image.)

I don't know how R on CUDA is built, but is it enough to copy R from rocker/r-ver against the cuda base image? It seems pretty cumbersome to rewrite rocker/r-ver to build with a multi-stage build. (At least I don't have the passion to do it)

cboettig commented 11 months ago

@eitsupi nice, I think that's a good idea -- let's leave rocker/r-ver as is, but let's adjust the rocker-cuda recipe.

Hmm, we can copy R_HOME from rocker/r-ver instead of running install_R_source.sh, though we'll still need to install system runtime dependencies. And then there's the linking done by make install (which is no longer available since it gets cleaned up), e.g. the linking of binaries in /usr/local/bin...
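
The linking step mentioned above could be approximated with symlinks. This is only a sketch under assumptions: link_r_binaries is a hypothetical helper, and it assumes the R and Rscript launchers under ${R_HOME}/bin work when invoked through a symlink; anything else that make install sets up is not covered.

```shell
#!/bin/sh
# Hypothetical helper: expose the R launchers from a copied R_HOME
# in a directory that is already on the PATH.
link_r_binaries() {
    r_home="${1:-/usr/local/lib/R}"
    bin_dir="${2:-/usr/local/bin}"
    mkdir -p "${bin_dir}"
    for exe in R Rscript; do
        # Skip launchers that are missing from the copied tree.
        if [ -x "${r_home}/bin/${exe}" ]; then
            ln -sf "${r_home}/bin/${exe}" "${bin_dir}/${exe}"
        fi
    done
}
```

Calling link_r_binaries with no arguments uses the default locations shown in the function.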

cboettig commented 11 months ago

Okay, I'm thinking something like this as the replacement rocker/cuda Dockerfile template:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <cboettig@ropensci.org>"

ENV R_VERSION=4.3.2
ENV R_HOME=/usr/local/lib/R
ENV TZ=Etc/UTC
ENV NVBLAS_CONFIG_FILE=/etc/nvblas.conf
ENV PYTHON_VENV=/opt/venv
ENV PATH=${PYTHON_VENV}/bin:${R_HOME}/bin:${CUDA_HOME}/bin:${PATH}
ENV CRAN=https://p3m.dev/cran/__linux__/jammy/latest
ENV LANG=en_US.UTF-8

COPY --from=rocker/r-ver ${R_HOME} ${R_HOME}
COPY scripts /rocker_scripts

RUN /rocker_scripts/install_R_deps.sh
RUN /rocker_scripts/setup_R.sh
RUN /rocker_scripts/config_R_cuda.sh
RUN /rocker_scripts/install_python.sh

CMD ["R"]

This introduces install_R_deps.sh to the scripts directory, which is basically the runtime dependencies from install_R_source.sh and a tiny bit of config from there too. I think maybe the most notable change here is I put R_HOME/bin on the PATH, whereas make install links to /usr/local/bin. Not sure if there's anything else make install does that needs to be implemented here.

I noticed that setup_R.sh also removes some build deps that are compression libraries, maybe we should simply not do that there?

eitsupi commented 10 months ago

@cboettig Did you see the comment https://github.com/rocker-org/rocker-versioned2/issues/736#issuecomment-1847186381? We should use apt-get remove --purge instead of apt-get -y purge?

cboettig commented 10 months ago

@eitsupi I think you mean the reverse, that we should use apt-get -y purge? That's interesting, I didn't test it. I'm trying to find some documentation saying that apt-get purge and apt-get remove --purge should function differently. Maybe @eddelbuettel knows?

I always thought that remove meant: remove the binaries, but leave configuration files, data files, and dependencies, and that purge meant remove binaries + all that stuff.

@benz0li any chance you meant that you use apt-get remove without purge? I could understand how that would avoid the issue, but to my understanding it would leave all dependencies of our BUILDDEPS installed. Our BUILDDEPS list pulls in quite a number of additional dependencies in the process, and I think in general we do want to clean all those up.

eddelbuettel commented 10 months ago

Maybe @eddelbuettel knows?

I read over the apt purge vs apt remove --purge comments moments ago and mostly just smiled, shaking my head, because after thirty years with Debian I still do not know the difference between apt upgrade and apt dist-upgrade.

For purge vs removal my mental model is that the latter removes the package files but leaves configuration and the former also nukes ("purges") the configuration files for a package. The difference may not matter much on containers as opposed to machines with actual reinstallations of packages. But YMMV and grains of salt and everything...

benz0li commented 10 months ago

@benz0li any chance you meant that you use apt-get remove without purge?

No. See https://github.com/b-data/jupyterlab-r-docker-stack/blob/e41ce09e241a060e5d4f9e558121b925007f52cc/base/latest.Dockerfile#L299-L301 for example.

I'm trying to find some documentation that apt-get purge and apt-get remove --purge should function differently?

@cboettig According to the manual page: remove --purge is equivalent to the purge command.


In my images, Linux packages do not get removed, because R is built in a separate image and then copied [from /usr/local] to [/usr/local of] an image that has only the runtime dependencies installed.

javdg commented 10 months ago

Thanks for looking into this everyone!

I did some testing regarding the apt-get remove --purge vs. apt-get purge point raised above:

I took nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 and ran the (supposedly) relevant commands (with slight modifications where needed) as per scripts/install_R_source.sh:

apt-get update (l. 18)

apt-get install -y --no-install-recommends bash-completion ca-certificates file fonts-texgyre g++ gfortran gsfonts libblas-dev libbz2-* libcurl4 "libicu[0-9][0-9]" liblapack-dev libpcre2* libjpeg-turbo* libpangocairo-* libpng16* libreadline8 libtiff* liblzma* make tzdata unzip zip zlib1g

(cf. l. 35)

export BUILDDEPS="curl \
    default-jdk \
    devscripts \
    libbz2-dev \
    libcairo2-dev \
    libcurl4-openssl-dev \
    libpango1.0-dev \
    libjpeg-dev \
    libicu-dev \
    libpcre2-dev \
    libpng-dev \
    libreadline-dev \
    libtiff5-dev \
    liblzma-dev \
    libx11-dev \
    libxt-dev \
    perl \
    rsync \
    subversion \
    tcl-dev \
    tk-dev \
    texinfo \
    texlive-extra-utils \
    texlive-fonts-recommended \
    texlive-fonts-extra \
    texlive-latex-recommended \
    texlive-latex-extra \
    x11proto-core-dev \
    xauth \
    xfonts-base \
    xvfb \
    wget \
    zlib1g-dev"

(cf. l. 61)

apt-get install -y --no-install-recommends ${BUILDDEPS} (l. 96)

In different containers, I then ran apt-get remove --purge ${BUILDDEPS} and apt-get purge ${BUILDDEPS}

This led to identical results, with the following list of packages to be removed or marked for removal (note that this includes, among others, build-essential and various cuda-* packages not originally specified in BUILDDEPS):

```
The following packages were automatically installed and are no longer required:
  bzip2 ca-certificates-java cuda-cuxxfilt-11-8 cuda-nvprune-11-8 default-jdk-headless default-jre default-jre-headless
  fakeroot fonts-lmodern gir1.2-freedesktop gir1.2-glib-2.0 gir1.2-harfbuzz-0.0 gir1.2-pango-1.0 icu-devtools java-common
  libapache-pom-java libapr1 libaprutil1 libasound2 libasound2-data libavahi-client3 libavahi-common-data libavahi-common3
  libblkid-dev libbrotli-dev libcairo-gobject2 libcairo-script-interpreter2 libcommons-logging-java libcommons-parent-java
  libcups2 libdbus-1-3 libdeflate-dev libexpat1-dev libfakeroot libffi-dev libfindlib-ocaml libfontbox-java libfontenc1
  libfribidi-dev libgdbm-compat4 libgdbm6 libgif7 libgirepository-1.0-1 libglib2.0-bin libglib2.0-dev-bin libgraphite2-dev
  libharfbuzz-gobject0 libharfbuzz-icu0 libice-dev libice6 libjbig-dev libjpeg8-dev libjs-jquery libkpathsea6 liblcms2-2
  liblzo2-2 libmpdec3 libncurses-dev libncurses5-dev libnspr4 libnss3 libpangoxft-1.0-0 libpaper-utils libpaper1 libpcre16-3
  libpcre3-dev libpcre32-3 libpcrecpp0v5 libpcsclite1 libpdfbox-java libperl5.34 libpixman-1-dev libpopt0 libptexenc1
  libpthread-stubs0-dev libpython3-stdlib libpython3.10-minimal libpython3.10-stdlib libsepol-dev libserf-1-1 libsm-dev
  libsm6 libsombok3 libsvn1 libsynctex2 libtcl8.6 libteckit0 libtexlua53 libtexluajit2 libtk8.6 libunwind8 libutf8proc2
  libxau-dev libxaw7 libxcb-render0-dev libxcb-shm0-dev libxcb1-dev libxdmcp-dev libxfont2 libxft2 libxkbfile1 libxmu6
  libxmuu1 libxpm4 libxss1 libxt6 libxtst6 libzzip-0-13 lmodern lto-disabled-list media-types netbase ocaml
  ocaml-compiler-libs ocaml-findlib ocaml-interp openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre
  openjdk-11-jre-headless pango1.0-tools patch perl-modules-5.34 perl-openssl-defaults preview-latex-style python3
  python3-distutils python3-lib2to3 python3-minimal python3.10 python3.10-minimal t1utils tcl tcl8.6 tex-common tk tk8.6
  uuid-dev wdiff x11-common x11-xkb-utils x11proto-dev xdg-utils xfonts-encodings xfonts-utils xkb-data
  xorg-sgml-doctools xserver-common xtrans-dev xz-utils
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  build-essential* cuda-compiler-11-8* cuda-minimal-build-11-8* cuda-nvcc-11-8* curl* default-jdk* devscripts* dpkg-dev*
  libb-hooks-op-check-perl* libbz2-dev* libbz2-ocaml-dev* libcairo2-dev* libclass-method-modifiers-perl*
  libclass-xsaccessor-perl* libcurl4-openssl-dev* libdatrie-dev* libdevel-callchecker-perl* libdpkg-perl*
  libdynaloader-functions-perl* libencode-locale-perl* libfile-dirlist-perl* libfile-homedir-perl* libfile-listing-perl*
  libfile-touch-perl* libfile-which-perl* libfontconfig-dev* libfontconfig1-dev* libfreetype-dev* libfreetype6-dev*
  libglib2.0-dev* libharfbuzz-dev* libhtml-parser-perl* libhtml-tagset-perl* libhtml-tree-perl* libhttp-cookies-perl*
  libhttp-date-perl* libhttp-message-perl* libhttp-negotiate-perl* libicu-dev* libimport-into-perl* libio-html-perl*
  libio-pty-perl* libio-socket-ssl-perl* libipc-run-perl* libjpeg-dev* liblwp-mediatypes-perl* liblwp-protocol-https-perl*
  liblzma-dev* libmime-charset-perl* libmodule-runtime-perl* libmoo-perl* libmount-dev* libnet-http-perl*
  libnet-ssleay-perl* libpango1.0-dev* libparams-classify-perl* libpcre2-dev* libpng-dev* libreadline-dev*
  librole-tiny-perl* libselinux1-dev* libsub-quote-perl* libtext-unidecode-perl* libthai-dev* libtiff-dev* libtiff5-dev*
  libtimedate-perl* libtry-tiny-perl* libunicode-linebreak-perl* liburi-perl* libwww-perl* libwww-robotrules-perl*
  libx11-dev* libxext-dev* libxft-dev* libxml-libxml-perl* libxml-namespacesupport-perl* libxml-sax-base-perl*
  libxml-sax-perl* libxrender-dev* libxss-dev* libxt-dev* patchutils* perl* pkg-config* rsync* subversion* tcl-dev*
  tcl8.6-dev* texinfo* texlive-base* texlive-binaries* texlive-extra-utils* texlive-fonts-extra*
  texlive-fonts-recommended* texlive-latex-base* texlive-latex-extra* texlive-latex-recommended* texlive-luatex*
  texlive-pictures* texlive-plain-generic* tk-dev* tk8.6-dev* wget* x11proto-core-dev* xauth* xfonts-base* xvfb*
  zlib1g-dev*
```

According to https://www.mankier.com/8/apt-get#--purge, remove --purge is equivalent to the purge command, so I am not surprised to see no difference there.

Additionally, apt-get remove ${BUILDDEPS} leads to the same list of packages, but without the trailing *, which (again per https://www.mankier.com/8/apt-get#--purge) is displayed next to packages that are scheduled to be purged. Furthermore, https://www.mankier.com/8/apt-get#Description-purge confirms @cboettig's understanding of remove vs. purge.

Judging from this, I would say this is not about subtle differences in command syntax (the commands seem to be identical and work as documented), but rather a curious case of Debian/Ubuntu dependency management: installing a collection of packages pulls in additional dependencies and/or creates "reverse dependencies", and uninstalling the original set then rips out other parts of the system...
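
To make the collateral removals easier to spot, the requested list can be compared with the list apt reports. A minimal sketch, assuming the removed package names are fed in one per line (with or without the trailing *); collateral_removals is a hypothetical helper, not an apt feature.

```shell
#!/bin/sh
# Hypothetical helper: print every removed package that was NOT explicitly
# requested. $1 is the whitespace-separated list of requested packages
# (e.g. "${BUILDDEPS}"); stdin carries one removed package per line.
collateral_removals() {
    # Normalise all whitespace so every name is surrounded by single spaces.
    requested=" $(printf '%s' "$1" | tr -s '[:space:]' ' ') "
    while IFS= read -r pkg; do
        pkg="${pkg%\*}"              # drop apt's purge marker, if present
        [ -z "${pkg}" ] && continue
        case "${requested}" in
            *" ${pkg} "*) ;;         # explicitly requested, not collateral
            *) printf '%s\n' "${pkg}" ;;
        esac
    done
}
```

Applied to the removal list above, this would surface entries such as build-essential and the cuda-* packages. From there, apt-cache rdepends --installed on a suspicious package (build-essential, say) may help trace which installed packages depend on it.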

cboettig commented 10 months ago

Thanks all, details super appreciated. Working on a fix for this in recent PRs. A multi-stage build is probably the natural thing but a non-trivial shift, so for the moment I think we'll simply leave the builddeps in place on the cuda stack (that nvidia base image is so large to begin with anyway).