os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

Create notebook image with pachyderm plugin #161

Open erikerlandson opened 2 years ago

erikerlandson commented 2 years ago

https://docs.pachyderm.com/latest/how-tos/jupyterlab-extension/#adding-the-extension-to-your-jupyterhub-deployment-with-helm

As usual, there seems to be a disheartening amount of "assume ability to run the container as root user" in the instructions.

erikerlandson commented 2 years ago

cc @HumairAK @caldeirav

erikerlandson commented 2 years ago

https://docs.pachyderm.com/latest/how-tos/jupyterlab-extension/#pre-built-image-vs-make-your-own

# This runs the following section as root; if adding to an existing Dockerfile, set the user back to whatever you need. 
USER root

# This is the directory files will be mounted to, mirroring how pipelines are run. 
RUN mkdir -p /pfs 

# If you are not using "jovyan" as your notebook user, replace the user here. 
RUN chown $NB_USER /pfs

# Fuse is a requirement for the mount extension 
RUN apt-get clean && apt-get update && apt-get -y install curl fuse

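# Install Pachctl - Set the version of Pachctl that matches your cluster deployment.
# Assumption: PACHCTL_VERSION is supplied at build time (for example, --build-arg PACHCTL_VERSION=<version>).
ARG PACHCTL_VERSION
RUN curl -f -o pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v${PACHCTL_VERSION}/pachctl_${PACHCTL_VERSION}_amd64.deb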
RUN dpkg -i pachctl.deb

# This sets the user back to the notebook user account (e.g., jovyan)
USER $NB_UID

# Replace the version here with the version of the extension you would like to install from https://pypi.org/project/jupyterlab-pachyderm/ 
RUN pip install jupyterlab-pachyderm==<version>

erikerlandson commented 2 years ago

https://github.com/os-climate/pachyderm-notebook-image/blob/main/images/pachyderm-notebook/Containerfile

erikerlandson commented 2 years ago

Currently there is an issue with FUSE requiring the container to run as root: https://github.com/pachyderm/pachyderm/issues/7508
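
To see how a given notebook container stands with respect to this, a small diagnostic along the following lines may help. It is only a sketch of general FUSE prerequisites (effective user, presence and accessibility of the FUSE device, a `fusermount` helper on the PATH). It is not taken from the linked issue and does not reproduce whatever check the mount extension itself performs. The function name `fuse_mount_prerequisites` and the default device path are assumptions made for illustration.

```python
import os
import shutil


def fuse_mount_prerequisites(device="/dev/fuse"):
    """Report a few conditions that commonly matter for FUSE mounts in a container.

    Hypothetical helper for illustration only; passing every check does not
    guarantee that the Pachyderm mount extension will work.
    """
    return {
        # Effective UID 0 means the process is running as root.
        "running_as_root": os.geteuid() == 0,
        # The FUSE character device must be exposed inside the container.
        "device_present": os.path.exists(device),
        # The current user needs read/write access to that device.
        "device_read_write": os.access(device, os.R_OK | os.W_OK),
        # Unprivileged mounts normally go through a fusermount helper binary.
        "fusermount_on_path": (
            shutil.which("fusermount") is not None
            or shutil.which("fusermount3") is not None
        ),
    }


if __name__ == "__main__":
    for name, ok in fuse_mount_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Running this inside the notebook pod as the notebook user shows which of these conditions hold without root.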

MichaelTiemannOSC commented 2 years ago

Is this issue now satisfied by https://github.com/os-climate/wri-gppd-ingestion-pipeline/blob/master/notebooks/wri-gppd-02-loading.ipynb ?

caldeirav commented 2 years ago

No, but the plugin is not a big priority for us right now, since we are already using the primary function of Pachyderm through the Python APIs anyway.
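
For context, reading a file through the Python API without any FUSE mount might look roughly like the sketch below. It assumes the `python_pachyderm` client library (7.x style, where `get_file` takes a `(repo, branch)` tuple and returns a file-like object). The helper names, the default host and port, and the repo and path in the usage note are hypothetical and do not come from this thread.

```python
def read_pfs_file(client, repo, branch, path):
    """Return the bytes of one file stored in a Pachyderm repo branch.

    `client` is expected to behave like python_pachyderm.Client (assumption).
    """
    # A (repo, branch) tuple identifies the head commit of that branch.
    pfs_file = client.get_file((repo, branch), path)
    return pfs_file.read()


def make_client(host="localhost", port=30650):
    """Create a client; host and port here are placeholder assumptions."""
    # Imported lazily so read_pfs_file can be used or tested without the library.
    import python_pachyderm

    return python_pachyderm.Client(host=host, port=port)
```

A call such as `read_pfs_file(make_client(), "example-repo", "master", "/data.csv")` would then return the file contents as bytes, with no `/pfs` mount and no root access involved in the notebook container.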