jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Add information about using NFS with z2jh #421

Open yuvipanda opened 6 years ago

yuvipanda commented 6 years ago

NFS is still a very popular storage setup, and is a good fit for use with z2jh in several cases:

  1. When you are supporting a large number of users
  2. When you are running on bare metal and NFS is your only option
  3. When your utilization % (% of total users active at any time) is very low, causing you to spend more on storage than compute.

While we don't want to be on the hook for teaching users to set up and maintain NFS servers, we should document how to use an NFS server that already exists.

cam72cam commented 6 years ago

@yuvipanda I'd like to be a guinea pig on this. I am trying to setup a persistent EFS volume and use that as the user storage.

So far I've created and applied:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-persist
spec:
  capacity:
    storage: 123Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <fs-id>.efs.us-east-1.amazonaws.com
    path: "/"
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs-persist
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 11Gi

After that I added the following to my config.yaml:

singleuser:
  storage:
    static:
      pvc-name: efs-persist

I am pretty sure I am missing a few key ideas here

Edit: the first change was to add "type: static" to the storage section in config.yaml and to change pvc-name to pvcName
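
In other words, the storage section after that edit looks roughly like this (the full working config appears further down in the thread):

singleuser:
  storage:
    type: static
    static:
      pvcName: efs-persist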

yuvipanda commented 6 years ago

w00t, thanks for volunteering :)

The other two things to keep in mind are:

  1. subPath (https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml#L138). This specifies where inside the share the user's home directory should go. Although it defaults to just {username}, I would recommend something like home/{username}
  2. User permissions. This can be a little tricky, since IIRC when kubelet creates a directory for subPath mounting it makes it uid 0 / gid 0, which is problematic for our users (with uid 1000 by default). The way we've worked around it right now is by using anongid / anonuid properties in our NFS share, but that's not a good long term solution. I've been working on http://github.com/yuvipanda/nfs-flex-volume as another option here. Is anongid / anonuid an option with EFS?
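
For reference, on a plain Linux NFS server the anonuid / anongid approach is just an export option; a rough sketch of an /etc/exports entry (the export path and client range here are placeholders, not from this setup):

# /etc/exports -- squash every client uid/gid to 1000:1000 (the default notebook user)
/export/home  10.0.0.0/8(rw,sync,all_squash,anonuid=1000,anongid=1000)
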
cam72cam commented 6 years ago
  1. I saw that, thanks for the clearer explanation
  2. I don't think that anonuid/gid is an option on EFS

I'll take a look through the nfs-flex-volume repo

yuvipanda commented 6 years ago

@cam72cam another way to check that everything works right now except for permissions is to set:

singleuser:
  uid: 0
  fsGid: 0
  cmd: 
    - jupyterhub-singleuser
    - --allow-root

If you can launch servers with that, then we can confirm that the uid situation is the only problem.

cam72cam commented 6 years ago

I am currently getting the following response:

 HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod in version \"v1\" cannot be handled as a Pod: v1.Pod: Spec: v1.PodSpec: Containers: []v1.Container: v1.Container: VolumeMounts: []v1.VolumeMount: v1.VolumeMount: SubPath: ReadString: expects \" or n, parsing 184 ...ubPath\": {... at {\"kind\": \"Pod\", \"spec\": {\"containers\": [{\"imagePullPolicy\": \"IfNotPresent\", \"lifecycle\": {}, \"ports\": [{\"containerPort\": 8888, \"name\": \"notebook-port\"}], \"volumeMounts\": [{\"subPath\": {\"username\": null}, \"name\": \"home\", \"mountPath\": \"/home/jovyan\"}, {\"readOnly\": true, \"name\": \"no-api-access-please\", \"mountPath\": \"/var/run/secrets/kubernetes.io/serviceaccount\"}], \"env\": [{\"name\": \"JUPYTERHUB_HOST\", \"value\": \"\"}, {\"name\": \"JUPYTERHUB_CLIENT_ID\", \"value\": \"user-efs4\"}, {\"name\": \"JUPYTERHUB_API_TOKEN\", \"value\": \"355dc09aca1143f580ee0435339cc18d\"}, {\"name\": \"JUPYTERHUB_USER\", \"value\": \"efs4\"}, {\"name\": \"EMAIL\", \"value\": \"efs4@local\"}, {\"name\": \"GIT_AUTHOR_NAME\", \"value\": \"efs4\"}, {\"name\": \"JUPYTERHUB_ADMIN_ACCESS\", \"value\": \"1\"}, {\"name\": \"JUPYTERHUB_SERVICE_PREFIX\", \"value\": \"/user/efs4/\"}, {\"name\": \"JPY_API_TOKEN\", \"value\": \"355dc09aca1143f580ee0435339cc18d\"}, {\"name\": \"JUPYTERHUB_API_URL\", \"value\": \"http://100.65.96.26:8081/hub/api\"}, {\"name\": \"JUPYTERHUB_BASE_URL\", \"value\": \"/\"}, {\"name\": \"JUPYTERHUB_OAUTH_CALLBACK_URL\", \"value\": \"/user/efs4/oauth_callback\"}, {\"name\": \"GIT_COMMITTER_NAME\", \"value\": \"efs4\"}, {\"name\": \"MEM_GUARANTEE\", \"value\": \"1073741824\"}], \"image\": \"jupyterhub/k8s-singleuser-sample:v0.5.0\", \"resources\": {\"limits\": {}, \"requests\": {\"memory\": 1073741824}}, \"args\": [\"jupyterhub-singleuser\", \"--ip=\\\"0.0.0.0\\\"\", \"--port=8888\"], \"name\": \"notebook\"}], \"securityContext\": {\"runAsUser\": 1000, \"fsGroup\": 1000}, \"volumes\": [{\"persistentVolumeClaim\": {\"claimName\": \"efs-persist\"}, \"name\": \"home\"}, {\"emptyDir\": {}, \"name\": \"no-api-access-please\"}], \"initContainers\": []}, \"metadata\": {\"labels\": {\"hub.jupyter.org/username\": \"efs4\", \"heritage\": \"jupyterhub\", \"component\": \"singleuser-server\", \"app\": \"jupyterhub\"}, \"name\": \"jupyter-efs4\"}, \"apiVersion\": \"v1\"}","reason":"BadRequest","code":400}

I suspect it has to do with:

              'volumes': [{'name': 'home',
                           'persistentVolumeClaim': {'claimName': 'efs-persist'}},
                          {'aws_elastic_block_store': None,
                           'azure_disk': None,
                           'azure_file': None,
                           'cephfs': None,
                           'cinder': None,
                           'config_map': None,
                           'downward_api': None,
                           'empty_dir': {},
                           'fc': None,
                           'flex_volume': None,
                           'flocker': None,
                           'gce_persistent_disk': None,
                           'git_repo': None,
                           'glusterfs': None,
                           'host_path': None,
                           'iscsi': None,
                           'name': 'no-api-access-please',
                           'nfs': None,
                           'persistent_volume_claim': None,
                           'photon_persistent_disk': None,
                           'portworx_volume': None,
                           'projected': None,
                           'quobyte': None,
                           'rbd': None,
                           'scale_io': None,
                           'secret': None,
                           'storageos': None,
                           'vsphere_volume': None}]},

It should have the nfs option set there if I understand correctly

Actually:

'volume_mounts': [{'mountPath': '/home/jovyan',
                                                 'name': 'home',
                                                 'subPath': {'username': None}},
                                                {'mount_path': '/var/run/secrets/kubernetes.io/serviceaccount',
                                                 'name': 'no-api-access-please',
                                                 'read_only': True,
                                                 'sub_path': None}],

It appears that subPath is not being set correctly.

yuvipanda commented 6 years ago

So it turns out that manually specifying a subPath of home/{username} was required; we should investigate why.

yuvipanda commented 6 years ago

The PVC needs to be in the same namespace as JupyterHub, so the pods can find it.
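
In practice that just means creating the claim with the hub's namespace, e.g. (the filename and namespace here are placeholders):

kubectl apply -f efs-pvc.yaml --namespace=<hub-namespace>
kubectl get pvc --namespace=<hub-namespace>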

yuvipanda commented 6 years ago

The PVC needs to be told how to find the PV to match, and this is done by using:

  1. Labels to match PVC and PV
  2. Setting storageClassName of the PVC to '' (so Kubernetes does not try to dynamically provision a PV for it)

So

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-persist
  labels:
    <some-label1-key>: <some-label1-value>
    <some-label2-key>: <some-label2-value>
spec:
  capacity:
    storage: 123Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <fs-id>.efs.us-east-1.amazonaws.com
    path: "/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs-persist
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  selector:
    matchLabels:
       <some-label1-key>: <some-label1-value>
       <some-label2-key>: <some-label2-value>
  resources:
    requests:
      storage: 11Gi

Things to note:

  1. Specify labels that uniquely identify this PV in your entire k8s cluster.
  2. The size requests bits are ignored both in the PV and PVC for EFS specifically, since it grows as you use it.
cam72cam commented 6 years ago

Ok, I've got the mount working. I did not do the label stuff yet, simply set 'storageClassName: ""' in the claim. That seemed to work just fine.

I ran into a speed bump where I had to change the security groups to allow access from EC2 to EFS. As a temporary measure I added both the EFS volume and the EC2 instances to the "default" security group. Eventually part of the initial kops config should add the correct security groups.

I am now getting a permission error: PermissionError: [Errno 13] Permission denied: '/home/jovyan/.jupyter'

I am going to try to change the permissions on the EFS drive first, and if that does not work try the root hack that @yuvipanda mentioned

EDIT: A manual chown on the EFS volume to 1000:1000 seems to have worked!

cam72cam commented 6 years ago

EFS Success!

Process:

Set up an EFS volume. It must be in the same VPC as your cluster; this can be changed in the AWS settings after it has been created. The EFS volume will be created in the default security group of the VPC. As a temporary workaround, add your cluster master and nodes to that default security group so they can access the EFS volume. Eventually we should set up proper security groups as part of this process.
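
A less hacky alternative (not what was done here; the group IDs below are placeholders) is to allow inbound NFS traffic (TCP 2049) from the nodes' security group to the security group attached to the EFS mount targets, for example with the AWS CLI:

aws ec2 authorize-security-group-ingress \
  --group-id <efs-mount-target-sg> \
  --protocol tcp --port 2049 \
  --source-group <cluster-node-sg>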

Created test_efs.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-persist
spec:
  capacity:
    storage: 123Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-$EFS_ID.efs.us-east-1.amazonaws.com
    path: "/"

Created test_efs_claim.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs-persist
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 11Gi

kubectl --namespace=cmesh-test apply -f test_efs.yaml
kubectl --namespace=cmesh-test apply -f test_efs_claim.yaml

The sizes in these files don't mean what you think. There is no quota enforced with EFS **. In the future we want to set the efs PersistentVolume size to something ridiculously large like 8Ei and the PersistentVolumeClaim to 10GB (neither matters AFAICT). This is my rough understanding and could be incorrect.

A PersistentVolume defines a service which can perform a mount inside of a container. The PersistentVolumeClaim is a way of reserving a portion of the PersistentVolume and potentially locking access to it.

The storageClassName setting looks innocuous, but it is critical. The only PV in the cluster without a storage class is the one we defined above. In the future we should label different PVs and use label selectors in the PVC instead of relying on the empty default.

We are going to configure JupyterHub to use the same "static" claim for all of the containers***. This means that all of our users will be using the same EFS share, which should be able to scale as high as we need.

We now add the following to config.yaml

singleuser:
  storage:
    type: "static"
    static:
      pvcName: "efs-persist"
      subPath: 'home/{username}'

type: static tells JupyterHub not to use a storage class and instead use the PVC we defined. pvcName matches the claim name we specified before. subPath tells where on the supplied storage the mount point should be; in this case it will be "$EFS_ROOT/home/{username}".

It turns out there is a bug in jupyterhub where the default subPath does not work, and setting the subPath to "{username}" breaks in the same way.

At this point, if we tried to start a user server, it would fail. The directory created on the mount at subPath will be owned by uid 0 and gid 0. This means that when the notebook server is launched it won't be able to create any files, will complain, and then self-destruct.

What we need to do is tell the container to run its startup as root, then switch to the jovyan user before starting the notebook process. While running as root we can do our own chown to fix the created directory's permissions.

First we merge the following to our config.yaml

singleuser:
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"

This tells JupyterHub to enter the container as root and run the start-singleuser.sh script. start-singleuser.sh calls a helper start.sh script, which we will make use of later on.

This will get JupyterHub to provision the container and attempt to start it, but the process will still fail because the chown has not taken place.

In order for us to have a properly chowned directory at /home/jovyan mounted from $EFS_ROOT/home/{username}, we need to create our own docker container****.

Here are some terse steps:

  1. Create a Docker account
  2. Create a Docker repo
  3. Create a directory to store the build file
  4. Create a Dockerfile inside that directory:

FROM jupyter/base-notebook:281505737f8a

# pin jupyterhub to match the Hub version
# set via --build-arg in Makefile
ARG JUPYTERHUB_VERSION=0.8
RUN pip install --no-cache jupyterhub==$JUPYTERHUB_VERSION

USER root
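# the sed below inserts a chown of /home/$NB_USER just before the "Handle username change" section of start.sh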
RUN sed -i /usr/local/bin/start.sh -e 's,# Handle username change,chown 1000:1000 /home/$NB_USER \n # Handle username change,'
RUN cat /usr/local/bin/start.sh
USER $NB_USER

The base Dockerfile came from https://github.com/jupyterhub/zero-to-jupyterhub-k8s/commit/967b2d2a2c6293ba686c8e57a9f9473575c1494e#diff-aed8b29ee8beb1247469956c481040c2 (notice that we are using the older revision; the newer revision is broken in some awesome way that Yuvi needs to fix). This script is fragile and should be done better in the future... The first part of the Docker setup is done as $NB_USER; the rest of the file is done as root, since we need to modify the start.sh script, which will be run as root when the container is started. Many of the files referenced can be found in https://github.com/jupyter/docker-stacks/tree/master/base-notebook

sudo yum install docker
sudo docker login
sudo docker build ${directory_containing_dockerfile}
sudo docker tag ${image_id_in_cmd_output} $docker_username/$docker_repo    # the image id can also be found via sudo docker images
sudo docker push $docker_username/$docker_repo

Merge the following into config.yaml

singleuser:
  image:
    name: $docker_username/$docker_repo
    tag: latest

You may be able to do a helm upgrade, but I ended up purging and reinstalling via helm just to be safe.

At this point you should be all set with a semi-fragile (but functional) EFS-backed JupyterHub setup

Debugging tools: (all with --namespace=)

** fuse layer for fs quota
*** We may run into issues with a hundred containers all hitting the same EFS volume.  I suspect that AWS can more than handle that, but I have been wrong before.  If it can't handle that Yuvi has a WIP nfs server sharding system partially built that we could use.
**** I hope that the changes I made to the base container will be adopted by the project as it seems relatively harmless to have in the start script.  Even if it is harmful to others, I would still like it in there as a config option (if possible).
yuvipanda commented 6 years ago

Thank you for getting this working, @cam72cam!

To summarize, this mostly works, except for the issue of permissions:

  1. When using subPath, Kubernetes creates the directory if it doesn't already exist
  2. However, this will always be created as root:root
  3. Since we want our users to run as non-root, this won't work for us and we have to use hacks to do chowns.

It'll be great if we can fix EFS or Kubernetes to have options around 'what user / group / mode should this directory be created as?'
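
One other workaround worth sketching (untested in this thread, and it assumes a chart/kubespawner version that exposes singleuser.initContainers and expands {username} there) is an init container that chowns the subPath directory before the notebook container starts, instead of patching start.sh:

singleuser:
  initContainers:
    - name: fix-home-perms
      # busybox is enough for a chown; 1000:100 matches the default jovyan uid:gid
      image: busybox
      command: ["sh", "-c", "chown 1000:100 /home/jovyan"]
      securityContext:
        runAsUser: 0
      volumeMounts:
        # "home" is the volume name given to the static PVC;
        # whether {username} is expanded here depends on the kubespawner version
        - name: home
          mountPath: /home/jovyan
          subPath: 'home/{username}'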

cam72cam commented 6 years ago

Could we add the chown hack I put in my image to the start.sh script in the stacks repo? https://github.com/jupyter/docker-stacks/blob/master/base-notebook/start.sh#L19

Would that break any existing users setups?

manics commented 6 years ago

There are a few issues in Kubernetes which seem to disagree on whether Kubernetes should set the permissions on subpaths or not:

cam72cam commented 6 years ago

I figure doing the chown ourselves (behind a config setting) resolves it for now, and it can be removed once K8s finalizes if/how/when the subPath permissions should be set.

cam72cam commented 6 years ago
singleuser:
  image:
    name: jupyter/base-notebook
    tag: 29b68cd9e187
  extraEnv:
    CHOWN_HOME: 'yes'

Just confirmed the fix in my test env

zcesur commented 6 years ago

I was able to use my NFS-backed persistent claim on Google Cloud as the user storage by following the steps @cam72cam outlined, so I can attest to his solution. Thanks for paving the way, guys!

amanda-tan commented 6 years ago

Just to clarify: do we first do a helm installation using a config file with the start-singleuser.sh command inserted, and then do a helm upgrade using an updated config file with the singleuser image?

cam72cam commented 6 years ago

Either should work, though I'd recommend a clean install just to be safe.

choldgraf commented 6 years ago

Hey all - it sounds like there's some useful information in this thread that hasn't made its way into Z2JH yet. Might I suggest that either:

  1. @cam72cam opens a PR to add a guide for NFS, similar to what's above
  2. If this isn't a solution we want to "officially" recommend yet for the reasons @yuvipanda mentions above, @cam72cam should write up a little blog post and we can link to this.

What do folks think?

cam72cam commented 6 years ago

We are currently doing an internal alpha with a setup similar to the one mentioned above and working out any minor issues which come up. @mfox22 I'd be up for either, what do you think?

My biggest concern with how it works at the moment is that a clever user could look at the system mounts and figure out how to do a userspace NFS mount of someone else's directory. I think we could get around that by configuring the PV differently, but I still have a lot to learn in that regard.

choldgraf commented 6 years ago

Well if this is "useful functionality that may introduce some bugs because it's in an 'alpha' state" kinda functionality, maybe a blog post kinda thing is better? One reason we added https://zero-to-jupyterhub.readthedocs.io/en/latest/#resources-from-the-community was to make it easier for people to add more knowledge to the internet without needing to be "official" z2jh instructions. :-)

If you like, I'm happy to give a writeup a look-through, and if it ever gets to a point that your team is happy with, we can revisit bringing it into Z2JH?

cam72cam commented 6 years ago

@yuvipanda I've added instructions for using EFS. Other than the initial setup, it should be pretty similar for a standard NFS server.

choldgraf commented 6 years ago

Can we consider this issue resolved in this case? Or perhaps we need a small amount of language basically saying what @cam72cam just said?

cam72cam commented 6 years ago

Maybe we can refactor the initial work I did into a common persistent-storage guide with an NFS server as the example, and rewrite the EFS page to link to that with some instructions about the initial EFS setup.

choldgraf commented 6 years ago

Refactoring and clarifying is always welcome @cam72cam - let me know if you'd like a review at some point!

consideRatio commented 6 years ago

Bumping - this is still a relevant issue, though I lack an overview of the quite long discussion.

vroomanj commented 6 years ago

I second the bump. I'm having issues understanding how to connect the Hub to NFS.

cam72cam commented 6 years ago

@vroomanj I still need to break the docs out into 2 different sections. Take a look through https://zero-to-jupyterhub.readthedocs.io/en/stable/amazon/efs_storage.html

Changes for pure NFS setup:

stefansedich commented 5 years ago

@cam72cam thanks for the work on the docs!

I was getting EFS set up and found what I feel is an easier way, and wanted to see what your thoughts were: I was able to remove the uid and fsGid settings and the chowning by simply performing a chown 1000:100 on the /home directory, which I created beforehand on my EFS volume.

Do you see any downsides with this?
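
For anyone replicating this, a rough sketch of that one-time preparation from any NFS client that can reach the share (the mount-target hostname and mount point are placeholders):

sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-<id>.efs.us-east-1.amazonaws.com:/ /mnt/efs
sudo mkdir -p /mnt/efs/home
sudo chown 1000:100 /mnt/efs/home    # 1000:100 is the default jovyan uid:gid
sudo umount /mnt/efs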

cam72cam commented 5 years ago

Huh, that sounds like a much cleaner solution

stefansedich commented 5 years ago

Mmm @cam72cam, one thing I realized was that I had moved to a globally shared volume mounted for everyone rather than per-user mounts; it looks like the chown is still required for the per-user case, as it appears to create the folder as root still...

AbhinavanT commented 5 years ago

If anybody is wondering, adding NFS in GKE is similar - and for shared read/write access I used the same approach as @stefansedich.

To avoid duplicating information: I essentially just followed this guide for GKE: https://medium.com/platformer-blog/nfs-persistent-volumes-with-kubernetes-a-case-study-ce1ed6e2c266. If you're doing this, keep in mind to pay attention to the JupyterHub namespace.

emirot commented 5 years ago

Does https://zero-to-jupyterhub.readthedocs.io/en/stable/amazon/efs_storage.html work for > 0.8.0, or are some changes needed?

I would like to store my users' home directories in EFS instead of the default EBS, as well as a shared folder as suggested in the documentation.

Is it possible to have:

singleuser:
  storage:
    type: "static"
    static:
      pvcName: "efs-persist"
      subPath: 'home/{username}'
    extraVolumes:
      - name: jupyterhub-shared
        persistentVolumeClaim:
          claimName: efs-persist
    extraVolumeMounts:
      - name: jupyterhub-shared
        mountPath: /home/shared
  extraEnv:
    CHOWN_HOME: 'yes'
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"

Does the following

singleuser:
  storage:
    dynamic:
      storageClass: "aws-efs"

work for you?

Do I have to go inside my EFS volume (in the PVC created after the first run) and change the permissions?

@cam72cam @stefansedich I would love to know the steps that worked for you according to the version, I'm a bit confused.

albertmichaelj commented 5 years ago

@emirot Did you ever figure out how to get extraVolumes to work on EFS? I am primarily interested in EFS as a way to have a ReadWriteMany directory in which to collect assignments. I've successfully set up home on EFS, but I haven't had any luck getting the shared extra volume on EFS.

Has anybody done this? If so, I would really, really appreciate any information that you may have.

emirot commented 5 years ago

@albertmichaelj I've ended up doing something like this:


singleuser:
  storage:
    dynamic:
      storageClass: aws-efs
    extraVolumes:
      - name: jupyterhub-shared
        persistentVolumeClaim:
          claimName: jupyterhub-shared
    extraVolumeMounts:
      - name: jupyterhub-shared
        mountPath: '/home/jovyan/shared'
  defaultUrl: "/lab"

Which looks to be working

albertmichaelj commented 5 years ago

@emirot Thanks for your reply! I figured out my problem (I accidentally typed extrVolumeMounts instead of extraVolumeMounts!).

However, I have had to do a workaround that isn't quite ideal, and I wonder if anyone else has run into this and may have thoughts and suggestions. It would be very much appreciated!

What I have is a little bit different of a setup. First, I'm using EFS for my home directories (it looks like you are using EBS for yours), and that has ended up being pretty important for me. I'm also trying to use a subPath parameter, but that is less important. However, I also want to mount a shared volume at /home/shared, which works fine if I use two PVCs (and PVs), namely nfs-persist and nfs-persist-shared. However, I can't figure out why a single PVC and PV won't work.

In case anyone can figure out why it isn't working (and I would be very, very grateful for help, I've been tearing my hair out over this), I'm going to go through my config.

What I have tried to do for setting up the volume mounts is below (note that in my docker image /home/shared is a folder that exists because I figured it might need an existing mount point to successfully mount, but I can remove this if it's causing problems).

  singleuser:
    storage:
      type: "static"
      static:
        pvcName: "nfs-persist"
        subPath: 'home/{username}'
      extraVolumes:
        - name: jupyterhub-shared
          persistentVolumeClaim:
            claimName: "nfs-persist"
      extraVolumeMounts:
        - name: jupyterhub-shared
          mountPath: "/home/shared"
          subPath: "home/shared"

My nfs-persist pv config is:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-persist
spec:
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-${EFS_ID}.efs.us-east-1.amazonaws.com
    path: "/"

and my pvc config is:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-persist
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 11Gi

I get no error when I use this config. The PV and PVC are created and exist. The output of kubectl describe pvc --namespace=quadc nfs-persist seems to indicate that my pod is using it twice (jupyter-michael).

Name:          nfs-persist
Namespace:     quadc
StorageClass:  
Status:        Bound
Volume:        nfs-persist
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      11Gi
Access Modes:  RWX
Mounted By:    jupyter-michael
               jupyter-michael
Events:        <none>

However, when I try to start the pod, it hangs and I get the following error and the pod never starts.

2019-06-22 18:51:56+00:00 [Warning] Unable to mount volumes for pod "jupyter-michael_quadc(85a8e90b-951e-11e9-9bb5-0efe5baffb84)": timeout expired waiting for volumes to attach or mount for pod "quadc"/"jupyter-michael". list of unmounted volumes=[home]. list of unattached volumes=[home jupyterhub-shared]

However, if I set up an identical second PV and PVC (except that the name is changed to nfs-persist-shared), everything works exactly as expected. It only fails when I try to reuse the same PV and PVC. Is there some fundamental reason why this should be the case?

Has anyone successfully used two EFS volumes mounted on a pod from one PV and PVC? If so, I would really, really appreciate any advice.

Thanks!

fifar commented 5 years ago

@albertmichaelj try this and see if it helps (note that the volume name is home)

  singleuser:
    storage:
      type: "static"
      static:
        pvcName: "nfs-persist"
        subPath: 'home/{username}'
      extraVolumeMounts:
        - name: home
          mountPath: "/home/shared"
          subPath: "home/shared"
albertmichaelj commented 5 years ago

Hi @fifar. Thanks for the response! I am able to get the extra volumes just fine. The problem I have is that I can't reuse the same PV and PVC for multiple volumes for the same pod. I think this is a fundamental limitation of kubernetes, but I'm not sure why. I have solved this by creating a helm chart that automatically makes the PV and PVCs that I need.

Thanks!

stefansedich commented 5 years ago

@albertmichaelj a PVC -> PV binding is an exclusive 1-1 mapping, from the documentation:

Once bound, PersistentVolumeClaim binds are exclusive, regardless of how they were bound. A PVC to PV binding is a one-to-one mapping.
albertmichaelj commented 5 years ago

@stefansedich I understand that PVC -> PV is a one to one mapping. However, my PV is ReadWriteMany, so I had intended to use the same PVC for multiple mounts on the same pod (with different subPaths and mountPaths). I wasn't trying to map the same PVC to multiple PVs. Given that the same PVC can be mapped to many pods if the PV is ReadWriteMany, I had assumed that I could use the same PVC multiple times in the same pod (since it was being used multiple times over several pods in an analogous way and that was fine). I think that is not allowed, but I haven't found explicit documentation of that.

fifar commented 5 years ago

@albertmichaelj So I don't think you've tried my solution. I have the same requirement as you: mounting one PVC multiple times on the same pod. The way I found the name home: if you execute kubectl describe pod $pod_name and check the "Mounts" and "Volumes" sections, you'll find the necessary hints. And as @stefansedich mentioned, PVC -> PV is an exclusive 1-1 mapping; we're reusing the existing 1-1 mapping, which is the volume named home.

albertmichaelj commented 5 years ago

@fifar I thought I had tried your solution, but I didn't quite understand what you were suggesting. Just for the sake of posterity, I'll lay out the problem I was having. My initial config was (basically) this:

    storage:
      type: "static"
      static:
        pvcName: "nfs-home"
        subPath: 'home/{username}'
      extraVolumes:
        - name: home
          persistentVolumeClaim:
            claimName: "nfs-home"
      extraVolumeMounts:
        - name: home
          mountPath: "/mnt/data"
          subPath: "shared_data"

When I had this, I got the following error message when I try to spawn the pod:

Spawn failed: (422) Reason: error HTTP response headers: HTTPHeaderDict({'Audit-Id': '08312c49-cc6c-4d97-9ed3-bc862151b44c', 'Content-Type': 'application/json', 'Date': 'Wed, 24 Jul 2019 13:08:15 GMT', 'Content-Length': '372'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"jupyter-albertmichaelj\" is invalid: spec.volumes[2].name: Duplicate value: \"home\"","reason":"Invalid","details":{"name":"jupyter-albertmichaelj","kind":"Pod","causes":[{"reason":"FieldValueDuplicate","message":"Duplicate value: \"home\"","field":"spec.volumes[2].name"}]},"code":422}

basically saying that I had a duplicate resource (which I did, two things named home). However, when I used the config:

    storage:
      type: "static"
      static:
        pvcName: "nfs-home"
        subPath: 'home/{username}'
      extraVolumeMounts:
        - name: home
          mountPath: "/mnt/data"
          subPath: "shared_data"

it works! The problem was that I had declared the home volume again in extraVolumes, even though it was already implicitly included by my home storage config.

This now works great, and I can use a single PV and PVC for as many mounts as I'd like! I had been creating dozens of PVs and PVCs (which is not that hard to do with a helm template, but it is annoying) in order to mount multiple shared volumes (groups, data, whole class, etc...). This is much more elegant.

Thanks @fifar.

fifar commented 5 years ago

@albertmichaelj Good to know it works for you. The config without extraVolumes is exactly my suggestion. Actually, your approach (the subPath in extraVolumeMounts) inspired me a lot when I started thinking about reusing one EFS volume for the same pod.

sebastian-luna-valero commented 5 years ago

Hi,

I was also struggling to get z2jh up and running on-prem, and was inspired by https://raymondc.net/2018/12/07/kubernetes-hosted-nfs-client.html.

My solution was to helm install the nfs-client chart: https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client

and make it the default storage class on my kubernetes cluster. Specific steps:

# install nfs-client chart
helm install stable/nfs-client-provisioner --set nfs.server=kubeserver --set nfs.path=/home --name nfs-client --namespace jhub
# define it as default storage class
kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
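
If you'd rather not make it the cluster default, the same class can also be referenced explicitly in config.yaml (the class name here assumes the nfs-client release above):

singleuser:
  storage:
    dynamic:
      storageClass: nfs-client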

I hope that helps others as well.

Best regards, Sebastian

PS1: This could help here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/593

PS2: Similar issue here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1320

PS3: This solution works when you have your own NFS server deployed. If not, you should try this other chart instead: https://github.com/helm/charts/tree/master/stable/nfs-server-provisioner

akaszynski commented 4 years ago

Thanks to everyone who commented on this. I was having issues with long login times when using JupyterHub on Azure Kubernetes Service (AKS), and was able to take the login times from two minutes down to 20 seconds by using NFS. For anyone who is interested, here's how I did it:

# persistent volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: userdata-pv
  namespace: jup
spec:
  capacity:
    storage: 20000Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.11.0.4
    path: "/userdata"
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: userdata-pvc
  namespace: jup
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
# config.yaml
singleuser:
  storage:
    type: "static"
    static:
      pvcName: "userdata-pvc"
      subPath: 'home/{username}'
  uid: 0
  fsGid: 0
  cmd: "start-singleuser.sh"
consideRatio commented 4 years ago

@akaszynski was the delay related to attaching the default PVCs that were dynamically created by JupyterHub, or perhaps to their creation? Did you notice the big time difference between the first login ever and the second login, or was there a big delay even on second logins and user pod startups?

akaszynski commented 4 years ago

@consideRatio: The delay was for both. Maybe 1-2 minutes to create, 30-60 seconds to mount. When the end user expects things to load near instantly, having the progress bar hang there will drive them crazy!

I've even moved the hub-db-dir over to NFS, as it was taking forever to create the hub pod every time I upgraded the cluster:

hub:
  extraVolumes:
    - name: hub-db-dir
      persistentVolumeClaim:
        claimName: userdata-pvc
consideRatio commented 4 years ago

@akaszynski thanks for sharing this insight! :heart:

meeseeksmachine commented 4 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/efs-multi-user-home-directories/3032/3

josibake commented 4 years ago

Just finished setting up user NFS storage and a shared NFS folder based on this excellent issue and the CHOWN_HOME stuff in the base image (thanks @cam72cam!). Just checking in to see if this is still being worked on, either from the K8s side or in terms of making it simpler in the Z2JH setup? I plan on writing a blog post about my experience, just because there were small things that took me forever to figure out (like needing singleuser.uid: 0 to get CHOWN_HOME to work), but I definitely see this as an important use case.