nmfs-opensci / py-rocket-base

Base JupyterHub image with Python and R. See Packages for the images:
https://github.com/nmfs-opensci/py-rocket-base/pkgs/container/py-rocket-base
BSD 3-Clause "New" or "Revised" License

test installing R and RStudio with the scripts #25

Closed. eeholmes closed this 1 week ago

eeholmes commented 1 week ago

The install_R script fails in the make step (which is part of building R from source).

I created a dev branch where I am root in the Docker image so I can test the script and figure out what is going on.

Update: Setting the PATH temporarily fixes the problem. Here the PATH variable is set only for this RUN command.

  # Reset PATH for this RUN only so the system toolchain, not conda, is found first
  RUN PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && \
    mkdir /rocker_scripts && \
    cp ${REPO_DIR}/scripts/install_R_source.sh /rocker_scripts/install_R_source.sh && \
    chmod +x /rocker_scripts/install_R_source.sh && \
    cd / && \
    /rocker_scripts/install_R_source.sh
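
Because the PATH assignment happens inside the RUN rather than via ENV, it does not persist into later layers or the final image. A quick way to confirm that (a sketch, using the test image referenced in the next comment):

# PATH in the finished image still has the conda prefixes first; the reset above
# only applied while that single RUN layer was executing.
docker run --rm ghcr.io/nmfs-opensci/py-rocket-2:8e17b0058367 bash -c 'echo $PATH'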
eeholmes commented 1 week ago

Debug notes. Trying to get this to run

Start Codespace on py-rocket-2

docker run -d -p 8888:8888 --name test ghcr.io/nmfs-opensci/py-rocket-2:8e17b0058367
# run as root bash
docker exec -u root -it test /bin/bash

Then install stuff

cd /
source ${REPO_DIR}/rocker.sh # get env vars
# Read the Dockerfile and process each line
while IFS= read -r line; do
    # Check if the line starts with ENV or RUN
    if [[ "$line" == ENV* ]]; then
        # Export the environment variable from the ENV line
        eval $(echo "$line" | sed 's/^ENV //g')
    elif [[ "$line" == RUN* ]]; then
        # Run the command from the RUN line
        cmd=$(echo "$line" | sed 's/^RUN //g') || echo ${cmd}" encountered error. Continuing"
        echo "Executing: $cmd"
        eval "$cmd"
    fi
done < /rocker_scripts/r-ver_4.4.1.Dockerfile
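
One caveat with this line-by-line approach: ENV and RUN instructions that continue across lines with a trailing backslash only get their first physical line evaluated. A possible pre-processing step (a sketch in plain bash) joins the continued lines and writes a flattened copy for the loop above to read:

# Join backslash-continued lines so each ENV/RUN instruction is one physical line,
# then point the while-read loop above at $flat instead of the original Dockerfile.
flat=/tmp/r-ver_flat.Dockerfile
: > "$flat"
buf=""
while IFS= read -r raw; do
    if [[ "$raw" == *\\ ]]; then   # line ends with a backslash: keep accumulating
        buf+="${raw%\\}"
    else
        printf '%s\n' "${buf}${raw}" >> "$flat"
        buf=""
    fi
done < /rocker_scripts/r-ver_4.4.1.Dockerfile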
eeholmes commented 1 week ago

Where I am at.

So what is different with the repo2docker image?

eeholmes commented 1 week ago

Here is the Dockerfile that repo2docker is making

FROM docker.io/library/buildpack-deps:jammy
  # Avoid prompts from apt
  ENV DEBIAN_FRONTEND=noninteractive
  # Set up locales properly
  RUN apt-get -qq update && \
      apt-get -qq install --yes --no-install-recommends locales > /dev/null && \
      apt-get -qq purge && \
      apt-get -qq clean && \
      rm -rf /var/lib/apt/lists/*
  RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
      locale-gen
  ENV LC_ALL=en_US.UTF-8 \
      LANG=en_US.UTF-8 \
      LANGUAGE=en_US.UTF-8
  # Use bash as default shell, rather than sh
  ENV SHELL=/bin/bash
  # Set up user
  ARG NB_USER
  ARG NB_UID
  ENV USER=${NB_USER} \
      HOME=/home/${NB_USER}
  RUN groupadd \
          --gid ${NB_UID} \
          ${NB_USER} && \
      useradd \
          --comment "Default user" \
          --create-home \
          --gid ${NB_UID} \
          --no-log-init \
          --shell /bin/bash \
          --uid ${NB_UID} \
          ${NB_USER}
  # Base package installs are not super interesting to users, so hide their outputs
  # If install fails for some reason, errors will still be printed
  RUN apt-get -qq update && \
      apt-get -qq install --yes --no-install-recommends \
         gettext-base \
         less \
         unzip \
         > /dev/null && \
      apt-get -qq purge && \
      apt-get -qq clean && \
      rm -rf /var/lib/apt/lists/*
  EXPOSE 8888
  # Environment variables required for build
  ENV APP_BASE=/srv
  ENV CONDA_DIR=${APP_BASE}/conda
  ENV NB_PYTHON_PREFIX=${CONDA_DIR}/envs/notebook
  ENV NPM_DIR=${APP_BASE}/npm
  ENV NPM_CONFIG_GLOBALCONFIG=${NPM_DIR}/npmrc
  ENV NB_ENVIRONMENT_FILE=/tmp/env/environment.lock
  ENV MAMBA_ROOT_PREFIX=${CONDA_DIR}
  ENV MAMBA_EXE=${CONDA_DIR}/bin/mamba
  ENV CONDA_PLATFORM=linux-64
  ENV KERNEL_PYTHON_PREFIX=${NB_PYTHON_PREFIX}
  # Special case PATH
  ENV PATH=${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${NPM_DIR}/bin:${PATH}
  # If scripts required during build are present, copy them
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2factivate-2dconda-2esh-e70a7b /etc/profile.d/activate-conda.sh
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2fenvironment-2epy-2d3-2e10-2dlinux-2d64-2elock-8fa955 /tmp/env/environment.lock
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2finstall-2dbase-2denv-2ebash-6a6072 /tmp/install-base-env.bash
  RUN TIMEFORMAT='time: %3R' \
  bash -c 'time /tmp/install-base-env.bash' && \
  rm -rf /tmp/install-base-env.bash /tmp/env
  RUN mkdir -p ${NPM_DIR} && \
  chown -R ${NB_USER}:${NB_USER} ${NPM_DIR}
  # ensure root user after build scripts
  USER root
  # Allow target path repo is cloned to be configurable
  ARG REPO_DIR=${HOME}
  ENV REPO_DIR=${REPO_DIR}
  # Create a folder and grant the user permissions if it doesn't exist
  RUN if [ ! -d "${REPO_DIR}" ]; then \
          /usr/bin/install -o ${NB_USER} -g ${NB_USER} -d "${REPO_DIR}"; \
      fi
  WORKDIR ${REPO_DIR}
  RUN chown ${NB_USER}:${NB_USER} ${REPO_DIR}
  # We want to allow two things:
  #   1. If there's a .local/bin directory in the repo, things there
  #      should automatically be in path
  #   2. postBuild and users should be able to install things into ~/.local/bin
  #      and have them be automatically in path
  #
  # The XDG standard suggests ~/.local/bin as the path for local user-specific
  # installs. See https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
  ENV PATH=${HOME}/.local/bin:${REPO_DIR}/.local/bin:${PATH}
  # The rest of the environment
  ENV CONDA_DEFAULT_ENV=${KERNEL_PYTHON_PREFIX}
  # Run pre-assemble scripts! These are instructions that depend on the content
  # of the repository but don't access any files in the repository. By executing
  # them before copying the repository itself we can cache these steps. For
  # example installing APT packages.
  # ensure root user after preassemble scripts
  USER root
  # Copy stuff.
  COPY --chown=1000:1000 src/ ${REPO_DIR}/
  # Run assemble scripts! These will actually turn the specification
  # in the repository into an image.
  # Container image Labels!
  # Put these at the end, since we don't want to rebuild everything
  # when these change! Did I mention I hate Dockerfile cache semantics?
  LABEL repo2docker.ref="refs/heads/dev2"
  LABEL repo2docker.repo="https://github.com/nmfs-opensci/py-rocket-2"
  LABEL repo2docker.version="2024.07.0+28.g239c4f5"
  # We always want containers to run as non-root
  USER ${NB_USER}
  # Make sure that postBuild scripts are marked executable before executing them
  RUN chmod +x postBuild
  RUN ./postBuild
  # Add start script
  RUN chmod +x "${REPO_DIR}/start"
  ENV R2D_ENTRYPOINT="${REPO_DIR}/start"
  # Add entrypoint
  ENV PYTHONUNBUFFERED=1
  COPY /python3-login /usr/local/bin/python3-login
  COPY /repo2docker-entrypoint /usr/local/bin/repo2docker-entrypoint
  ENTRYPOINT ["/usr/local/bin/repo2docker-entrypoint"]
  # Specify the default command to run
  CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
  # Appendix:
  # Re-enable man pages disabled in Ubuntu 18 minimal image
  # https://wiki.ubuntu.com/Minimal
  USER root
  ENV R_VERSION="4.4.1"
  ENV R_HOME="/usr/local/lib/R"
  ENV TZ="Etc/UTC"
  RUN mkdir /rocker_scripts && \
    cp ${REPO_DIR}/install_R_source.sh /rocker_scripts/install_R_source.sh && \
    chmod +x /rocker_scripts/install_R_source.sh && \
    cd / && \
    /rocker_scripts/install_R_source.sh
  # Revert to default user
  USER ${NB_USER}
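
As an aside, the Dockerfile that repo2docker generates can also be dumped locally without building, which makes it easier to compare against what the GitHub Action produced (a sketch; the repo URL is the one from the labels above):

# --no-build skips the build and --debug has repo2docker print the generated Dockerfile.
jupyter-repo2docker --no-build --debug https://github.com/nmfs-opensci/py-rocket-2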
eeholmes commented 1 week ago

This runs. The only biggish thing removed is the conda base-environment install (shown first below), followed by the full Dockerfile that runs:

  # If scripts required during build are present, copy them
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2factivate-2dconda-2esh-e70a7b /etc/profile.d/activate-conda.sh
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2fenvironment-2epy-2d3-2e10-2dlinux-2d64-2elock-8fa955 /tmp/env/environment.lock
  COPY --chown=1000:1000 build_script_files/-2fopt-2fvenv-2flib-2fpython3-2e11-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2finstall-2dbase-2denv-2ebash-6a6072 /tmp/install-base-env.bash
  RUN TIMEFORMAT='time: %3R' \
  bash -c 'time /tmp/install-base-env.bash' && \
  rm -rf /tmp/install-base-env.bash /tmp/env
FROM docker.io/library/buildpack-deps:jammy
  # Avoid prompts from apt
  ENV DEBIAN_FRONTEND=noninteractive
  # Set up locales properly
  RUN apt-get -qq update && \
      apt-get -qq install --yes --no-install-recommends locales > /dev/null && \
      apt-get -qq purge && \
      apt-get -qq clean && \
      rm -rf /var/lib/apt/lists/*
  RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
      locale-gen
  ENV LC_ALL=en_US.UTF-8 \
      LANG=en_US.UTF-8 \
      LANGUAGE=en_US.UTF-8
  # Use bash as default shell, rather than sh
  ENV SHELL=/bin/bash
  # Set up user
  ENV NB_USER="jovyan"
  ENV NB_UID=1000
  ENV USER=${NB_USER} \
      HOME=/home/${NB_USER}
  RUN groupadd \
          --gid ${NB_UID} \
          ${NB_USER} && \
      useradd \
          --comment "Default user" \
          --create-home \
          --gid ${NB_UID} \
          --no-log-init \
          --shell /bin/bash \
          --uid ${NB_UID} \
          ${NB_USER}
  # Base package installs are not super interesting to users, so hide their outputs
  # If install fails for some reason, errors will still be printed
  RUN apt-get -qq update && \
      apt-get -qq install --yes --no-install-recommends \
         gettext-base \
         less \
         unzip \
         > /dev/null && \
      apt-get -qq purge && \
      apt-get -qq clean && \
      rm -rf /var/lib/apt/lists/*
  EXPOSE 8888
  # Environment variables required for build
  ENV APP_BASE=/srv
  ENV CONDA_DIR=${APP_BASE}/conda
  ENV NB_PYTHON_PREFIX=${CONDA_DIR}/envs/notebook
  ENV NPM_DIR=${APP_BASE}/npm
  ENV NPM_CONFIG_GLOBALCONFIG=${NPM_DIR}/npmrc
  ENV NB_ENVIRONMENT_FILE=/tmp/env/environment.lock
  ENV MAMBA_ROOT_PREFIX=${CONDA_DIR}
  ENV MAMBA_EXE=${CONDA_DIR}/bin/mamba
  ENV CONDA_PLATFORM=linux-64
  ENV KERNEL_PYTHON_PREFIX=${NB_PYTHON_PREFIX}
  # Special case PATH
  ENV PATH=${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${NPM_DIR}/bin:${PATH}
  RUN mkdir -p ${NPM_DIR} && \
  chown -R ${NB_USER}:${NB_USER} ${NPM_DIR}
  # ensure root user after build scripts
  USER root
  ENV REPO_DIR="/srv/repo"
  # Create a folder and grant the user permissions if it doesn't exist
  RUN if [ ! -d "${REPO_DIR}" ]; then \
          /usr/bin/install -o ${NB_USER} -g ${NB_USER} -d "${REPO_DIR}"; \
      fi
  WORKDIR ${REPO_DIR}
  RUN chown ${NB_USER}:${NB_USER} ${REPO_DIR}
  # We want to allow two things:
  #   1. If there's a .local/bin directory in the repo, things there
  #      should automatically be in path
  #   2. postBuild and users should be able to install things into ~/.local/bin
  #      and have them be automatically in path
  #
  # The XDG standard suggests ~/.local/bin as the path for local user-specific
  # installs. See https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
  ENV PATH=${HOME}/.local/bin:${REPO_DIR}/.local/bin:${PATH}
  # The rest of the environment
  ENV CONDA_DEFAULT_ENV=${KERNEL_PYTHON_PREFIX}
  # Run pre-assemble scripts! These are instructions that depend on the content
  # of the repository but don't access any files in the repository. By executing
  # them before copying the repository itself we can cache these steps. For
  # example installing APT packages.
  # ensure root user after preassemble scripts
  USER root
  # Copy stuff.
  COPY --chown=1000:1000 . ${REPO_DIR}/
  # Run assemble scripts! These will actually turn the specification
  # in the repository into an image.
  # Container image Labels!
  # Put these at the end, since we don't want to rebuild everything
  # when these change! Did I mention I hate Dockerfile cache semantics?
  LABEL repo2docker.ref="refs/heads/dev2"
  LABEL repo2docker.repo="https://github.com/nmfs-opensci/py-rocket-2"
  LABEL repo2docker.version="2024.07.0+28.g239c4f5"
  # We always want containers to run as non-root
  USER ${NB_USER}
  # Add entrypoint
  ENV PYTHONUNBUFFERED=1
  # Specify the default command to run
  CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
  # Appendix:
  # Re-enable man pages disabled in Ubuntu 18 minimal image
  # https://wiki.ubuntu.com/Minimal
  USER root
  ENV R_VERSION="4.4.1"
  ENV R_HOME="/usr/local/lib/R"
  ENV TZ="Etc/UTC"
  RUN mkdir /rocker_scripts && \
    cp ${REPO_DIR}/scripts/install_R_source.sh /rocker_scripts/install_R_source.sh && \
    chmod +x /rocker_scripts/install_R_source.sh && \
    cd / && \
    /rocker_scripts/install_R_source.sh
  # Revert to default user
  USER ${NB_USER}
eeholmes commented 1 week ago

OK, I was able to get myself out of libcurl header hell with:

 ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
 ENV PKG_CONFIG_PATH=/usr/lib/x86_64-linux-gnu/pkgconfig:$PKG_CONFIG_PATH
 ENV CURL_CONFIG=/usr/bin/curl-config

but the basic problem is the conda bins stuck in the PATH; those ENV vars just tell the build to ignore the conda stuff. The PATH looks like this:

/srv/conda/condabin:/home/codespace/.local/bin:/home/codespace/.local/bin:/srv/conda/envs/notebook/bin:/srv/conda/bin:/srv/npm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:
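
With the conda directories ahead of the system ones, R's configure step can pick up conda's curl-config, pkg-config, and headers instead of Ubuntu's. A quick check of what actually gets resolved (a sketch):

# If any of these resolve to /srv/conda/..., the R source build will compile and
# link against the conda toolchain and headers rather than the system ones.
which -a curl-config pkg-config gcc
curl-config --version
pkg-config --cflags libcurl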

Setting the PATH temporarily back to what it was before micromamba was installed works and seems easiest.

  RUN PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && \
    mkdir /rocker_scripts && \
    cp ${REPO_DIR}/scripts/install_R_source.sh /rocker_scripts/install_R_source.sh && \
    chmod +x /rocker_scripts/install_R_source.sh && \
    cd / && \
    /rocker_scripts/install_R_source.sh
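
After the install script finishes, a quick sanity check that R ended up with working libcurl support (a sketch, run against the test container started earlier, assuming the freshly built R is on the PATH):

# capabilities("libcurl") should report TRUE, and libcurlVersion() shows the
# libcurl version R was actually built against.
docker exec -it test R -q -e 'capabilities("libcurl"); libcurlVersion()'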

The question is whether the conda stuff in the PATH of a repo2docker image is generally going to cause problems for R/RStudio.

eeholmes commented 1 week ago

Long journey, but this fixed #46 #45 #44 #43 #42 #41 #40 #39 #38, and on and on.

The hard part was getting the ENV variables passed to the scripts.
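
For reference, when reproducing this interactively inside the container, the same variables that the Dockerfile sets with ENV can simply be exported in the shell before calling the script (a sketch mirroring the ENV and RUN lines above):

# Mirror the Dockerfile's ENV lines so install_R_source.sh sees them, and reset
# PATH so the system toolchain is used instead of the conda binaries.
export R_VERSION="4.4.1" R_HOME="/usr/local/lib/R" TZ="Etc/UTC"
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
cd / && /rocker_scripts/install_R_source.sh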