Consider implementing volumes into profiles

vpavlin commented 4 years ago

I had a discussion with @guimou about a potential feature for profiles: adding a set of volumes mounted to the singleuser pod.

This would be useful when:

- a set of users wants access to a shared volume (requires RWX PV(C)s)
- a workshop requires a dataset or examples and, instead of pulling them from S3, we want them mounted in the container
- we want to provide additional storage to a set of users
- we want specialized binaries/libraries mounted into the container

The above examples would require a JupyterHub admin to create the PVCs, add the data, and specify the references in the profiles config.

Another use case, with a slightly different implementation (roughly the reverse of the above), would be to label a set of PVCs that are then loaded and presented in the spawner UI (similarly to Images and Sizes at the moment). This would make example datasets, workshop materials, or generally shared storage available to users, e.g.:

jupyterhub.opendatahub.io/volume: public
jupyterhub.opendatahub.io/volume-access: ro / rw
jupyterhub.opendatahub.io/volume-group: opendatahub

Here volume: public means the volume would be offered to all users, volume-group: opendatahub means it would be offered to all users in the opendatahub group, etc.
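
A minimal sketch of how the label scheme above could be evaluated, assuming the PVC metadata has already been fetched into plain dictionaries. The function name `volumes_for_user`, the dictionary layout, and the fallback to read-only access are assumptions for illustration, not existing spawner code.

```python
PREFIX = "jupyterhub.opendatahub.io/"


def volumes_for_user(pvcs, user_groups):
    """Return (claim name, access mode) pairs that would be offered to a user.

    pvcs: list of dicts shaped like {"name": str, "labels": {str: str}} (assumed).
    user_groups: iterable of group names the user belongs to.
    """
    groups = set(user_groups)
    offered = []
    for pvc in pvcs:
        labels = pvc.get("labels", {})
        is_public = labels.get(PREFIX + "volume") == "public"
        in_group = labels.get(PREFIX + "volume-group") in groups
        if is_public or in_group:
            # Assumption: fall back to read-only when no access label is set.
            access = labels.get(PREFIX + "volume-access", "ro")
            offered.append((pvc["name"], access))
    return offered


pvcs = [
    {"name": "workshop-data", "labels": {PREFIX + "volume": "public"}},
    {
        "name": "team-share",
        "labels": {
            PREFIX + "volume-group": "opendatahub",
            PREFIX + "volume-access": "rw",
        },
    },
    {"name": "unlabelled", "labels": {}},
]
print(volumes_for_user(pvcs, ["opendatahub"]))
```

A user outside the opendatahub group would only be offered the public volume in this sketch.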

ETA: 3 weeks

guimou commented 4 years ago

I implemented it functionally for a PoC with a customer. It works great and they're really happy with it, as it lets them share data and notebooks. But I think shared libraries are something we have to dig into, as they would greatly enhance the environment's capabilities without beefing up the images. I faced this issue with the client: they had R scripts requiring lots of fancy libraries. We cannot bake them all into an image, and it's cumbersome to run several pip install or install.packages calls each time you start your environment. Plus it's super hard to evolve the images. I may have a solution to dynamically load those environments based on the RWX approach. I'll run some tests and report here.

vpavlin commented 3 years ago

Example profile:

- name: globals
  env:
  - name: THIS_IS_GLOBAL
    value: "This will appear in all singleuser pods"
  resources:
    requests:
      memory: "500Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"
  volumes:
  - name: dataset
    persistentVolumeClaim:
      claimName: example-dataset-pvc
    # mountPath alternatives (pick one):
    # (1) absolute path, used as-is
    mountPath: /opt/app-root/src/example-dataset
    # (2) relative path, prefixed with the default mount path
    # mountPath: ./example-dataset
    # (3) no mountPath: defaults to /opt/app-root/src/$(name)

[ ] Replace hardcoded path in https://github.com/opendatahub-io/jupyterhub-odh/blob/master/.jupyter/jupyterhub_config.py with variable
[ ] Add argument to SingleuserProfiles init to pass default mount path
[ ] Enable 3 alternatives - absolute path (use the exact value), relative path (prefix the value with default mount path), no path (use name of the volume as a relative path)
[ ] Add volume and volumeMounts items to pod in apply_pod_profile
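
The checklist above could be sketched roughly as follows. This is not the actual apply_pod_profile code: the pod is a plain dictionary standing in for the real pod object, the helper names `resolve_mount_path` and `apply_volumes` are hypothetical, and taking the first container as the notebook container is an assumption.

```python
import posixpath

DEFAULT_MOUNT_PATH = "/opt/app-root/src"  # value taken from the example profile


def resolve_mount_path(volume, default_mount_path=DEFAULT_MOUNT_PATH):
    """Resolve the three mountPath alternatives from the checklist."""
    mount_path = volume.get("mountPath")
    if not mount_path:
        # (3) no path: use the volume name relative to the default mount path
        return posixpath.join(default_mount_path, volume["name"])
    if posixpath.isabs(mount_path):
        # (1) absolute path: use the exact value
        return mount_path
    # (2) relative path: prefix the value with the default mount path
    return posixpath.normpath(posixpath.join(default_mount_path, mount_path))


def apply_volumes(pod, profile):
    """Append volumes and volumeMounts from a profile to a pod-like dict."""
    spec = pod["spec"]
    container = spec["containers"][0]  # assumption: notebook container is first
    for volume in profile.get("volumes", []):
        # mountPath is not part of a Kubernetes volume definition, so strip it
        source = {k: v for k, v in volume.items() if k != "mountPath"}
        spec.setdefault("volumes", []).append(source)
        container.setdefault("volumeMounts", []).append(
            {"name": volume["name"], "mountPath": resolve_mount_path(volume)}
        )
    return pod
```

Keeping the path resolution in its own function would make the default mount path easy to pass in from the SingleuserProfiles init, as the second checklist item suggests.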

References:

guimou commented 3 years ago

In this profile approach, is there no longer any notion of r/w and groups? Also, as a note: this feature only works with RWX volumes, which may not be available on all OpenShift clusters, depending on the storage implementation.

maroroman commented 3 years ago

The example @vpavlin posted is the first iteration of the implementation. We first want to add volumes to the singleuser profiles ConfigMap and apply them to the pod, which is dealt with in issue #80. After that we will be able to extend this feature and will most likely follow the Kubernetes structure for defining these volumes, as in the references of the example, so r/w and groups should both be implemented by the time this issue is closed.

guimou commented 3 years ago

Perfect, thanks @mroman-redhat !

wseaton commented 3 years ago

Just wanted to comment and say we are looking at implementing something similar; our use case is granting teams access to 'shared folders'. In our deployment, users are mapped to LDAP groups that represent which 'team' they are on, so we should think about how the solution can be easily integrated with the auth_state functionality and mapped to a groups variable.
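
As a rough illustration of the group mapping, assuming the authenticator stores the LDAP groups under a "groups" key in auth_state (that key, the function name `shared_folders_for`, and the group-to-claim mapping are all hypothetical):

```python
def shared_folders_for(auth_state, group_volumes):
    """Return the claim names for every group listed in auth_state.

    auth_state: dict as persisted by the authenticator, or None.
    group_volumes: mapping of group name -> list of PVC claim names (assumed).
    """
    # Assumption: groups live under a "groups" key; auth_state may be None
    # when auth state persistence is not enabled.
    groups = (auth_state or {}).get("groups", [])
    claims = []
    for group in groups:
        for claim in group_volumes.get(group, []):
            if claim not in claims:  # keep order, skip duplicates
                claims.append(claim)
    return claims


group_volumes = {"team-a": ["team-a-share"], "team-b": ["team-b-share", "common"]}
print(shared_folders_for({"groups": ["team-a", "team-b"]}, group_volumes))
```

In a real deployment the auth_state dictionary would typically be read inside a spawner hook via `await spawner.user.get_auth_state()`.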

vpavlin commented 3 years ago

Hi @wseaton! Yes, leveraging OpenShift groups is planned. I have an issue for it here: https://github.com/opendatahub-io/odh-manifests/issues/290

opendatahub-io-contrib / jupyterhub-singleuser-profiles

Consider implementing volumes into profiles #38