nebari-dev / nebari-docs

📖 Documentation for Nebari
https://www.nebari.dev
BSD 3-Clause "New" or "Revised" License

Add fast, persistent scratch space on a per user basis #379

Open iameskild opened 1 year ago

iameskild commented 1 year ago

There is often a need for a fast, persistent scratch space when working with and compiling large projects (such as Pytorch). At the moment, users can add an ephemeral scratch space per user by modifying one of the profiles in the profile section of the nebari-config.yaml (this might also require adding a fast custom storage space using the helm_extension):

Example of a modified `profiles` entry (note that mounting the volume requires a `volumeMounts` entry under `extra_container_config`):

```yaml
- display_name: Medium Instance
  description: Stable environment with 4 cpu / 16 GB ram
  kubespawner_override:
    cpu_limit: 4
    cpu_guarantee: 3
    mem_limit: 16G
    mem_guarantee: 10G
    extra_pod_config:
      volumes:
        - name: scratch-volume
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: ["ReadWriteOnce"]
                # defined in local helm chart: custom-premium-sc
                storageClassName: "premium-rwo-immediate"
                resources:
                  requests:
                    storage: 200Gi
    extra_container_config:
      volumeMounts:
        - name: "scratch-volume"
          mountPath: "/scratch"
```

I am opening this issue to explore other possible options.

1. Implement the solution that is being discussed in nebari-dev/nebari#1549

   - This would mean realigning with Z2JH and giving each user who launches a JupyterLab server their own persistent storage volume (as detailed in this comment). The size and type of the attached storage volume is something we could allow Nebari administrators to set in their `nebari-config.yaml`.
   - Pros:
     - Aligns with Z2JH; folks coming from vanilla JupyterHub might be more familiar with this setup.
     - Users don't have to worry about one user taking up all of the space in the single shared storage volume (which is currently a possibility).
   - Cons:
     - This would likely increase the cost of running a cluster: each user would have their own storage volume, and for infrequent users you would still be paying for their storage volume.
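   If option (1) were implemented, the admin-facing configuration might look something like the sketch below. To be clear, the keys under `user_storage` are purely illustrative and do not exist in Nebari today; this only sketches the shape of the setting:

   ```yaml
   # Hypothetical nebari-config.yaml sketch for option (1); these keys are
   # illustrative only and are NOT an existing Nebari option.
   jupyterhub:
     user_storage:
       type: persistent                    # one PVC per user, Z2JH-style
       storage_class: premium-rwo-immediate
       capacity: 100Gi
   ```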
2. Create a special user group (with an associated server profile) that includes a scratch persistent volume.

   - This solution is a compromise between what we currently have and solution (1) above.
   - Pros:
     - Depending on the number of users in this group and the size/type of storage attached, the additional cost might be modest. That said, as the number of users in the group and the size of the storage volume grow, the potential cost savings would quickly evaporate.
   - Cons:
     - In order to limit which users can launch this specific server profile, we would also need to invest some development effort in making sure these profiles are hidden from other users.
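   For option (2), a group-restricted profile might conceptually take the shape below. The `access`/`groups` keys are hypothetical here; restricting profile visibility by group is exactly the development work mentioned above:

   ```yaml
   # Hypothetical sketch for option (2): a profile limited to a "scratch-users"
   # group. The access/groups keys are illustrative, pending the visibility work.
   - display_name: Scratch Instance
     description: Medium instance with a persistent scratch volume
     access: yaml            # hypothetical: only listed groups see this profile
     groups:
       - scratch-users
     kubespawner_override:
       cpu_limit: 4
       mem_limit: 16G
   ```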
3. Set up a new server profile per user and attach whatever additional storage each of those users might need.

   - As far as I am aware, this solution does not require any additional development, only bespoke modifications to the `nebari-config.yaml` (similar to the ephemeral-storage example above).
   - Pros:
     - No Nebari development is required.
   - Cons:
     - This solution is only tenable for small groups of users (under 10); as the number of users grows, managing the long list of server profiles becomes harder.
     - Users need to be sure to launch their own server profile; otherwise they would be attached to a colleague's scratch space.
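For option (3), the per-user profiles would simply repeat the pattern from the example above, once per user. A sketch (usernames and PVC names here are illustrative, and each PVC would need to be created separately):

```yaml
# Sketch for option (3): one bespoke profile per user, each mounting that
# user's own pre-created PVC (names are illustrative).
- display_name: Medium Instance (alice)
  description: Medium instance with alice's scratch space
  kubespawner_override:
    extra_pod_config:
      volumes:
        - name: scratch-volume
          persistentVolumeClaim:
            claimName: alice-scratch-pvc
    extra_container_config:
      volumeMounts:
        - name: scratch-volume
          mountPath: "/scratch"
```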
costrouc commented 1 year ago

(2) sounds difficult, since you cannot have a scratch persistent volume that is shared (it would require NFS and would be slow). Basically, I don't think this approach will work at all.

(3) is absolutely a zero-development-required approach.

(1) sounds like the most correct option to me. It also opens up the ability for certain users to use this feature while others use the shared storage.

I'm in favor of (1), and I don't think it is too much development. I do think it makes backups more difficult.

iameskild commented 1 year ago

As a quick way to add persistent, per-user PVCs, you can add the following to the `kubespawner_override` section of a profile:

```yaml
profiles:
  jupyterlab:
    - display_name: Small Instance
      description: Stable environment with 2 cpu / 8 GB ram
      kubespawner_override:
        # ... other overrides ...
        storage_pvc_ensure: true
        storage_class: premium-rwo-immediate  # custom storageClass with `volumeBindingMode: Immediate`
        storage_capacity: 200Gi
        extra_pod_config:
          volumes:
            - name: persistent-scratch
              persistentVolumeClaim:
                claimName: "{username}-scratch-pvc"
        extra_container_config:
          volumeMounts:
            - name: persistent-scratch
              mountPath: "/scratch"
```
iameskild commented 1 year ago

This is a good candidate for a FAQ.