javdg opened this issue 11 months ago
@cboettig Any thoughts?
Interesting. This does not happen with b-data's/my CUDA-enabled JupyterLab docker stacks, which are also based on `nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04`.
I use

```sh
apt-get -y purge PACKAGES
apt-get -y autoremove
rm -rf /var/lib/apt/lists/*
```

instead of

```sh
apt-get remove --purge -y PACKAGES
apt-get autoremove -y
apt-get autoclean -y
rm -rf /var/lib/apt/lists/*
```
👉 Yes... somehow `apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y` removes relevant CUDA packages from the Rocker images.
Another difference: b-data's/my images copy R from image `glcr.b-data.ch/r/rsi` instead of building R while building the docker image.

ℹ️ Python is "installed" the same way (i.e. copied from image `glcr.b-data.ch/python/psi`), if (a current) `PYTHON_VERSION` is set.
@cboettig Could you take a look at this?
@eitsupi thanks for the ping, yeah I'll take a look!
Yup,

> somehow `apt-get remove --purge -y ${BUILDDEPS}; apt-get autoremove -y` removes relevant CUDA packages from the Rocker images.

it looks like some of the build deps are (unsurprisingly) also build deps of cuda-devel. Still seems a bit puzzling to me that it would grab some of the nvidia tools.
Removing build deps this way in the rocker/r-ver recipe is a relatively dated strategy -- I believe multi-stage builds are the standard way to build images without including development dependencies. (Though that mechanism didn't exist when this build recipe was initially deployed in rocker!)
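For illustration, a minimal multi-stage sketch of that idea (the stage layout and the `install_R_runtime_deps.sh` name here are hypothetical, not the actual rocker recipe):

```dockerfile
# Build stage: install BUILDDEPS and compile R from source.
FROM ubuntu:22.04 AS builder
COPY scripts /rocker_scripts
RUN /rocker_scripts/install_R_source.sh

# Runtime stage: copy only the finished R installation; the build
# dependencies stay behind in the builder stage and never need purging.
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
COPY --from=builder /usr/local /usr/local
COPY scripts /rocker_scripts
RUN /rocker_scripts/install_R_runtime_deps.sh   # hypothetical: runtime deps only
```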
@eitsupi I'm not quite sure how best to go about setting up a staged build dockerfile in the current build system though -- thoughts on how to go about that?
Perhaps a simpler / short-term solution would be to add a build arg to suppress the build_deps removal and set that argument in the ml stack.... (given the size of the cuda libs the R build deps are mostly already included or won't add much more to the image size I think).
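As a rough sketch of that short-term option (the `PURGE_BUILDDEPS` argument name is made up here, not an existing build arg), the existing cleanup step in install_R_source.sh could simply be guarded:

```sh
# In the Dockerfile: ARG PURGE_BUILDDEPS=true  (and pass it through as an ENV)
# In install_R_source.sh: only purge the build deps when requested.
if [ "${PURGE_BUILDDEPS:-true}" = "true" ]; then
    apt-get remove --purge -y ${BUILDDEPS}
    apt-get autoremove -y
    apt-get autoclean -y
fi
rm -rf /var/lib/apt/lists/*
```

The ml/cuda stack would then set the argument to `false` and keep its build deps.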
I don't think there's a problem with doing a multi-stage build, since it only means changing what's written in the Dockerfile. It's just that I don't know the caching strategy to use for multi-stage builds. (All I know is that the inline caching we are currently doing is meaningless for multi-stage builds because it only caches the final image.)
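For what it's worth, BuildKit's registry cache exporter with `mode=max` can cache the intermediate stages of a multi-stage build as well, unlike the inline exporter. A sketch (the registry reference is just a placeholder):

```sh
# Exporting a registry cache may require a docker-container builder:
#   docker buildx create --use
docker buildx build \
  --cache-from=type=registry,ref=ghcr.io/OWNER/rocker-cuda:buildcache \
  --cache-to=type=registry,ref=ghcr.io/OWNER/rocker-cuda:buildcache,mode=max \
  -t ghcr.io/OWNER/rocker-cuda:latest .
```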
I don't know how R on CUDA is built, but is it enough to copy R from `rocker/r-ver` into the cuda base image? It seems pretty cumbersome to rewrite `rocker/r-ver` to build with a multi-stage build. (At least I don't have the passion to do it.)
@eitsupi nice, I think that's a good idea -- let's leave `rocker/r-ver` as is, but let's adjust the rocker-cuda recipe.
Hmm, we can copy R_HOME from `rocker/r-ver` instead of running `install_R_source.sh`, though we'll still need to install the system runtime dependencies. And then there's the linking done by `make install` (which is no longer available since it gets cleaned up), e.g. the linking of binaries in /usr/local/bin...
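If those /usr/local/bin entries turn out to be needed, one possible workaround (just a sketch, not part of the proposal below) would be to recreate the two launcher links that a source install normally places there:

```sh
# Recreate the links that `make install` would have put in /usr/local/bin
ln -s "${R_HOME}/bin/R" /usr/local/bin/R
ln -s "${R_HOME}/bin/Rscript" /usr/local/bin/Rscript
```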
Okay, I'm thinking something like this as the replacement rocker/cuda Dockerfile template:
```dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <cboettig@ropensci.org>"

ENV R_VERSION=4.3.2
ENV R_HOME=/usr/local/lib/R
ENV TZ=Etc/UTC
ENV NVBLAS_CONFIG_FILE=/etc/nvblas.conf
ENV PYTHON_VENV=/opt/venv
ENV PATH=${PYTHON_VENV}/bin:${R_HOME}/bin:${CUDA_HOME}/bin:${PATH}
ENV CRAN=https://p3m.dev/cran/__linux__/jammy/latest
ENV LANG=en_US.UTF-8

COPY --from=rocker/r-ver ${R_HOME} ${R_HOME}
COPY scripts /rocker_scripts

RUN /rocker_scripts/install_R_deps.sh
RUN /rocker_scripts/setup_R.sh
RUN /rocker_scripts/config_R_cuda.sh
RUN /rocker_scripts/install_python.sh

CMD ["R"]
```
This introduces `install_R_deps.sh` to the scripts directory, which is basically the runtime dependencies from `install_R_source.sh` plus a tiny bit of config from there too. I think maybe the most notable change here is that I put `${R_HOME}/bin` on the PATH, whereas `make install` links to /usr/local/bin. Not sure if there's anything else `make install` does that needs to be implemented here.
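A rough sketch of what such an `install_R_deps.sh` might contain, assuming it is simply the runtime-dependency step of `install_R_source.sh` (the package list quoted further down in this thread) plus the usual apt cleanup:

```sh
#!/bin/bash
set -e

# R runtime dependencies only; no BUILDDEPS, so nothing needs purging later.
apt-get update
apt-get install -y --no-install-recommends \
    bash-completion ca-certificates file fonts-texgyre g++ gfortran gsfonts \
    libblas-dev libbz2-* libcurl4 "libicu[0-9][0-9]" liblapack-dev libpcre2* \
    libjpeg-turbo* libpangocairo-* libpng16* libreadline8 libtiff* liblzma* \
    make tzdata unzip zip zlib1g
rm -rf /var/lib/apt/lists/*
```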
I noticed that `setup_R.sh` also removes some build deps that are compression libraries; maybe we should simply not do that there?
@cboettig Did you see the comment https://github.com/rocker-org/rocker-versioned2/issues/736#issuecomment-1847186381?

We should use `apt-get remove --purge` instead of `apt-get -y purge`?
@eitsupi I think you mean the reverse, that we should use `apt-get -y purge`? That's interesting, I didn't test it. I'm trying to find some documentation that `apt-get purge` and `apt-get remove --purge` should function differently? Maybe @eddelbuettel knows?
I always thought that `remove` meant: remove the binaries, but leave configuration files, data files, and dependencies, and that `purge` meant remove the binaries plus all that stuff.
@benz0li any chance you meant that you use `apt-get remove` without purge? I could understand how that would avoid the issue, but that would, to my understanding, leave all dependencies of our BUILDDEPS installed; our BUILDDEPS list pulls in quite a number of additional dependencies in the process, and I think in general we do want to clean all those up.
> Maybe @eddelbuettel knows?
I read over the `apt purge` vs `apt remove --purge` discussion moments ago and mostly just smiled, shaking my head, because after thirty years with Debian I still do not know the difference between `apt upgrade` and `apt dist-upgrade`.
For `purge` vs removal, my mental model is that the latter removes the package files but leaves configuration, and the former also nukes ("purges") the configuration files for a package. The difference may not matter much on containers as opposed to machines with actual reinstallations of packages. But YMMV and grains of salt and everything...
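A quick way to see that model in action with dpkg's status codes (the package name below is a placeholder; the "rc" state only shows up for packages that ship dpkg conffiles):

```sh
pkg=subversion          # placeholder: pick any installed package with conffiles
apt-get remove -y "$pkg"
dpkg -l "$pkg"          # state "rc": binaries removed, config files still on disk
apt-get purge -y "$pkg" # equivalent to: apt-get remove --purge -y "$pkg"
dpkg -l "$pkg"          # dpkg has forgotten the package; config files are gone
```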
> @benz0li any chance you meant that you use `apt-get remove` without purge?
No. See https://github.com/b-data/jupyterlab-r-docker-stack/blob/e41ce09e241a060e5d4f9e558121b925007f52cc/base/latest.Dockerfile#L299-L301 for example.
> I'm trying to find some documentation that `apt-get purge` and `apt-get remove --purge` should function differently?

@cboettig According to the manual page: `remove --purge` is equivalent to the `purge` command.
In my images, Linux packages do not get removed, because R is built in a separate image and then copied [from `/usr/local`] to [`/usr/local` of] an image that has only the runtime dependencies installed.
Thanks for looking into this everyone!
I did some testing regarding the `apt-get remove --purge` vs. `apt-get purge` point raised above:

I took `nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04` and ran the (supposedly) relevant commands (with slight modifications where needed) as per `scripts/install_R_source.sh`:
```sh
apt-get update
```

(l. 18)

```sh
apt-get install -y --no-install-recommends bash-completion ca-certificates file fonts-texgyre g++ gfortran gsfonts libblas-dev libbz2-* libcurl4 "libicu[0-9][0-9]" liblapack-dev libpcre2* libjpeg-turbo* libpangocairo-* libpng16* libreadline8 libtiff* liblzma* make tzdata unzip zip zlib1g
```

(cf. l. 35)
```sh
export BUILDDEPS="curl \
default-jdk \
devscripts \
libbz2-dev \
libcairo2-dev \
libcurl4-openssl-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre2-dev \
libpng-dev \
libreadline-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
rsync \
subversion \
tcl-dev \
tk-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
texlive-latex-extra \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb \
wget \
zlib1g-dev"
(cf. l. 61)
apt-get install -y --no-install-recommends ${BUILDDEPS}
(l. 96)
In different containers, I then ran `apt-get remove --purge ${BUILDDEPS}` and `apt-get purge ${BUILDDEPS}`.

This led to identical results, with the following list of packages to be removed/marked for removal (note that this does, amongst others, include `build-essential` and various `cuda-*` packages not originally specified in `BUILDDEPS`):
According to https://www.mankier.com/8/apt-get#--purge, "remove [--purge] is equivalent to the purge command", so I am not surprised to see no difference there.
Additionally, `apt-get remove ${BUILDDEPS}` leads to the same list of packages, but without the various trailing `*`, which (again per https://www.mankier.com/8/apt-get#--purge) "will be displayed next to packages which are scheduled to be purged".
Furthermore, https://www.mankier.com/8/apt-get#Description-purge does in this respect confirm @cboettig's understanding of `remove` vs. `purge`.
Judging from this, I would say this is not about subtle differences in command syntax (the two forms seem to be identical and working as documented), but rather a curious case of Debian/Ubuntu dependency management, where a collection of packages to be installed pulls in additional dependencies and/or creates "reverse dependencies" which, once the original set of packages is uninstalled, proceed to rip out other parts of the system...
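One way to inspect that behaviour without actually breaking a container is apt's simulation mode plus the auto/manual markings (standard apt tooling; the package name is just one of the examples from this issue):

```sh
# Dry-run the removal and list everything apt would take along with BUILDDEPS:
apt-get -s remove --purge ${BUILDDEPS} | grep '^Remv'

# See which installed packages depend on one of the affected CUDA packages:
apt-cache rdepends --installed cuda-nvcc-11-8

# Packages marked "auto" become autoremove candidates once nothing depends on them:
apt-mark showauto | grep '^cuda-'
```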
Thanks all, details super appreciated. Working on a fix for this in recent PRs. A multi-stage build is probably the natural thing but a non-trivial shift; for the moment I think we'll simply leave the builddeps in place on the `cuda` stack (that nvidia base image is so large to begin with anyway).
Container image name
rocker/cuda
Container image digest
No response
What operating system related to this question?
Linux
System information
No response
Question
As they are based on `nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04` (dockerfiles/cuda_4.3.0.Dockerfile), I was expecting to have the full CUDA Toolkit devel environment available when using any of the rocker/cuda and derived images.

However, https://github.com/rocker-org/rocker-versioned2/blob/de8b815b1b23c368308cc9dc960cb8a7c724be9f/scripts/install_R_source.sh#L160C39 seems to remove relevant packages (although they are not explicitly specified). This includes `cuda-compiler-11-8*`, `cuda-minimal-build-11-8*` and `cuda-nvcc-11-8*`, and marks others for subsequent autoremoval (`cuda-cuxxfilt-11-8`, `cuda-nvprune-11-8`). This does, e.g., prevent the torch package from being easily installed with GPU support within such containers.
Is this done intentionally?
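For anyone wanting to reproduce the observation, a quick check against a pulled image (image tag and whether `nvcc` is on the PATH may vary):

```sh
# Prints nvcc's version and any installed cuda compiler packages; if the build
# deps were purged, both commands come back empty.
docker run --rm rocker/cuda bash -c \
  'command -v nvcc && nvcc --version; dpkg -l | grep -E "cuda-(nvcc|compiler|minimal-build)"'
```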