salvis2 opened this issue 4 years ago
There seems to be documentation supporting the use of the /etc/fstab file for individual compute instances. Here are the Google Filestore instructions and here are the AWS EFS instructions. They are pretty similar, so a programmatic write to /etc/fstab should work similarly for both cloud providers.
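For reference, a minimal sketch of what such a programmatic write could look like; the EFS/Filestore hostnames, the share name, the mount path, and the mount options below are placeholders drawn from the commonly documented NFS defaults:

# Sketch: append an NFS mount entry to /etc/fstab on the instance and mount it
# (hostname, share, and mount path are placeholders)
echo "fs-12345678.efs.us-east-1.amazonaws.com:/  /home/shared  nfs4  nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev  0  0" \
  | sudo tee -a /etc/fstab
# A Filestore entry would instead point at the instance IP and share name, e.g.
# "10.0.0.2:/share  /home/shared  nfs  defaults,_netdev  0  0"
sudo mount -a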
@salvis2 wow nice find! I think this may be a very good approach! I don't know the details of file systems, but I don't expect there to be a good alternative to this. Exploring and writing a blog post or similar about this would be a very worthwhile time investment for our community!
Regarding the third step, I think the key is to properly understand /etc/fstab and what gives its content meaning.
Does /etc/fstab have meaning within a docker container (k8s pod) with storage mounted to it? I'd like to learn more about /etc/fstab and the systems using it.
I'm writing this to you to help me think better again!
Wikipedia says this about fstab:
The fstab file typically lists all available disk partitions and other types of file systems and data sources that are not necessarily disk-based, and indicates how they are to be initialized or otherwise integrated into the larger file system structure.
The fstab file is read by the mount command, which happens automatically at boot time to determine the overall file system structure, and thereafter when a user executes the mount command to modify that structure.
And the guide's step 3 says this:
To activate quotas on a particular filesystem, we need to mount it with a few quota-related options specified. We do this by updating the filesystem’s entry in the /etc/fstab configuration file. Open that file in your favorite text editor now:
Hmmm... thinking about things more, I'm afraid this may be hard to accomplish.
Hmmm, I'd love to see this be doable, but I fear it may be hard. I know @yuvipanda has considered this before and probably has a relevant issue about it as well.
I've never tried out Rook, but a reason for not doing so before is that it wasn't as mature ~2 years ago as it is now. For example, I think it's now possible to get dynamic provisioning of storage with a k8s StorageClass resource, which wasn't possible before.
I wonder if Rook backed by CephFS would support limiting storage while also not consuming more than is available. Hmmm, would it be possible to have 100GB of total available storage where each user is limited to, but not guaranteed, 10GB? If so, users could get out-of-disk-space errors for two reasons: either their individual limit is reached, or the total storage available to all users is exhausted.
Ah, fun fun fun fun fun :D
TLDR is that enforcing quotas is impossible with EFS / Filestore, but there is hope.
Longer version!
What @consideRatio said here:
Storage limits probably need feedback directly from the write request, which would be from the NFS server itself, because only the NFS server itself knows how much storage is used. Hmmm... I fear that the NFS server won't know what user (k8s pod / docker container) is reading / writing to it, but only what k8s node or similar.
is accurate: storage quotas need to be set on the NFS file server, not on the pods themselves. /etc/fstab and similar inside containers basically have no meaning, since the container has no control over anything that gets mounted.
There are two ways to set quotas: via user/group IDs, or via directory name.
By user / group ID is the common way, supported by most file systems (including ext4, the default). This is what is mentioned in the article @scottyhq pointed out. If we set this up on the NFS server, it would work as long as we can make sure that each user has a separate uid. However, right now all our users run with the same uid (1000), so this is not possible.
Via directory name is more useful for us, but is supported by fewer file systems - XFS being the most common. It's possible that btrfs and ZFS support it too, but I'm not sure. This would also need to run on the NFS server, and would require us to write something that maintains the quotas for each directory.
Both these options require that we run our own NFS server rather than use EFS / Filestore - we need to fiddle with the NFS server's filesystems, which we can't really do in these managed offerings.
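To make the first option concrete, here is a hedged sketch of uid-based quotas on a self-managed NFS server; the export path, uid, and 10 GiB limit are illustrative, and it assumes the standard Linux quota tools:

# On the NFS server: the exported filesystem must be mounted with user/group quotas enabled
# (usrquota/grpquota would normally be set in the server's own /etc/fstab)
sudo mount -o remount,usrquota,grpquota /export/home
# Build the quota tracking files and turn quotas on
sudo quotacheck -ugm /export/home
sudo quotaon -v /export/home
# Give uid 1001 a 10 GiB hard limit (block limits are in 1 KiB units)
sudo setquota -u 1001 0 10485760 0 0 /export/home
# Inspect per-user usage and limits
sudo repquota -s /export/home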
Possible next steps here:
If anyone wants to put time and effort into this, I'm happy to point people in directions 🍡
@yuvipanda wieee a condensed knowledge candy post :D I love it! Thanks for the writeup!!! This will be a post I'll read and consider multiple times.
I have this working now for a hub I run!
I'll post code shortly
https://github.com/yuvipanda/get-quota-the-home/blob/master/generate.py is the script I have running on the NFS server, and it does the job. More work needed, but I think it should be a nice and fairly resilient solution.
This does require running our own NFS server though. It should be possible / easy to do this inside the kubernetes cluster itself.
I actually forgot that https://github.com/kubernetes-incubator/external-storage/tree/master/nfs already supports quotas! I'd really love for someone to try that out, means no work on our end.
Woooooo! Nice exploratory work and implementation @yuvipanda!!!
How did you get yourself XFS storage? Did you define a GCE PD storageClass with a fstype requesting xfs? Reference: https://kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd
I have never used the nfs-provisioner myself and still have some learning to do. I see that if one uses the nfs-provisioner Helm chart, a new storage class is created, hmmm... Does that mean that users of this NFS server would create PVCs referencing that storage class and request storage which then gets quota limits?
Does the underlying nfs-provisioner quota implementation logic require XFS, or would ext4 be fine as well? It may be troublesome to get XFS-backed storage unless one is on GCP, it seems from the previous link. Hmmm, it seems like it does depend on XFS, as indicated here.
Step 1: Install the NFS provisioner, and configure its own persistent storage to be backed by XFS.
The chart mounts a Persistent Volume volume at this location. The volume can be created using dynamic volume provisioning. However, it is highly recommended to explicitly specify a storageclass to use rather than accept the clusters default, or pre-create a volume for each replica.
# nfs-provisioner Helm chart config
persistence:
  enabled: true
  storageClass: "xfs-ssd"
  size: 200Gi
# xfs-storageclass.yaml that we manually install alongside
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: xfs-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: xfs
Step 2: We use the storageClass created by the nfs-provisioner Helm chart to get NFS storage.
# nfs-provisioner Helm chart config
storageClass:
  provisionerName: cluster.local/nfs
# JupyterHub helm chart config to use the NFS storage
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml#L283
singleuser:
  storage:
    dynamic:
      storageClass: cluster.local/nfs
Step 2 means that a new PVC will be created for each user, whereas typically I've used a single PVC pointing to an NFS server and let each pod mount a different folder path.
Step 3: We create a separate pod, or a sidecar container in the NFS server pod, which mounts the XFS storage; this XFS storage is then monitored to update quotas using the XFS CLI tool xfs_quota.
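As an illustration of what such a monitor could run, here is a hedged shell sketch of the underlying xfs_quota mechanics. This is not @yuvipanda's generate.py; the mount point, the per-directory project IDs, and the 10g limit are assumptions, and the XFS volume must be mounted with the prjquota option:

# Assumes the XFS volume is mounted at /export with prjquota enabled
EXPORT=/export
LIMIT=10g
id=1
for dir in "$EXPORT"/home/*; do
  [ -d "$dir" ] || continue
  # Register the directory as an XFS "project" and give it a hard block limit
  xfs_quota -x -c "project -s -p $dir $id" "$EXPORT"
  xfs_quota -x -c "limit -p bhard=$LIMIT $id" "$EXPORT"
  id=$((id + 1))
done
# Report per-project usage
xfs_quota -x -c "report -p -h" "$EXPORT"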
To inspect the file system used in a folder, one can run stat --file-system --format=%T /home/jovyan, which will output nfs, overlayfs, xfs, etc.
How did you get yourself XFS storage? Did you define a GCE PD storageClass with a fstype requesting xfs? Reference: kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd
I just have a separate NFS VM that has a disk formatted as XFS (with the mkfs.xfs command). This is what we have at Berkeley right now; not ideal.
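For anyone curious, a hedged sketch of that kind of standalone setup (device name, mount point, and export subnet are placeholders; it assumes the nfs-kernel-server package is installed):

# Format a data disk as XFS and mount it with project quotas enabled
sudo mkfs.xfs /dev/sdb
sudo mkdir -p /export/home
echo "/dev/sdb  /export/home  xfs  defaults,prjquota  0  0" | sudo tee -a /etc/fstab
sudo mount /export/home
# Export it over NFS to the cluster's network
echo "/export/home  10.0.0.0/8(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -ra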
Does that mean that users of this NFS server would create PVCs referencing that storage class and request storage which then gets quota limits?
This is my understanding!
# JupyterHub helm chart config to use the NFS storage
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml#L283
singleuser:
  storage:
    dynamic:
      storageClass: cluster.local/nfs
You should be able to set singleuser.storage.capacity here, and have that made available as the quota.
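For example, a hedged sketch of what that could look like in the JupyterHub chart config (the 10Gi figure and the config.yaml file name are illustrative):

# Hypothetical addition to the JupyterHub chart's config.yaml:
# request 10Gi per user from the NFS-backed storage class
cat <<'EOF' >> config.yaml
singleuser:
  storage:
    capacity: 10Gi
    dynamic:
      storageClass: cluster.local/nfs
EOF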
To inspect the file system used in a folder, one can run stat --file-system --format=%T /home/jovyan, which will output nfs, overlayfs, xfs, etc.
This is great to know! I mostly just run mount, which provides the same information. With quotas, mount actually provides me the quota'd capacity, not the total NFS capacity!
I'd be interested in both of these:
Figure out how to give each user their own unique uid / gid, so we can use uid based quotas. This also brings with it other advantages - better security, and more traditional ways to share files between users.
General benefits for our cloud offerings are always nice.
Consider running the NFS Server in the k8s cluster itself. This lets us customize it better, possibly using XFS + project quotas as sidecars.
Always want to run more infrastructure! It would probably be cheaper than a managed offering and we can integrate it with Terraform!
I'll do some reading up on these things this week, but if you have some pointers (besides the link in the second point), I'd love them.
I'm really excited about this work in general, it feels to me like a proper solution to a long standing functionality issue (storage quotas for NFS servers) and cost issue (Google's managed NFS called Filestore for example is expensive for smaller deployments).
I'm convinced now that the solution is https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner
I spent some time reading up, and this is my summary of what I think is important for an overview of the situation.
NFS Ganesha is a modern open source NFS server.
NFS Server Provisioner was a Kubernetes project for a Kubernetes volume provisioner backed by an NFS Ganesha server. A volume provisioner is the thing you reference from a k8s StorageClass resource, which in turn is what a PVC references with storageClassName, which in turn a Pod references to mount storage. This repository maintains a Docker image published to quay.io/kubernetes_incubator/nfs-provisioner, I think.
The NFS Server Provisioner project had two associated Helm charts, nfs-server-provisioner and nfs-client-provisioner. The nfs-client-provisioner Helm chart is a slimmed version of the other, excluding the deployment of the actual NFS server.
As the NFS Server Provisioner resided in kubernetes-incubator/external-storage, and the GitHub org kubernetes-incubator is now kubernetes-retired, they moved the NFS Server Provisioner part of the external-storage repo to kubernetes-sigs/nfs-ganesha-server-and-external-provisioner. The associated Helm charts have not migrated, though.
We want to use https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner and an associated Helm chart, but the most recent Helm chart available is still the old nfs-server-provisioner chart.
I suggest exploring a deployment of the nfs-server-provisioner Helm chart on GKE, where we make the NFS server's own storage be created through a manually created StorageClass like the default GKE storage class but with fstype: xfs instead of fstype: ext4. We would configure the Helm chart to consume our custom StorageClass via persistence.storageClass.
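A hedged sketch of what that could look like; the release name, sizes, and chart location (the old stable repo, see the note further down about the chart moving) are assumptions, and the values mirror the ones quoted earlier in this thread:

# Create a StorageClass like GKE's default, but backed by XFS
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: xfs-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: xfs
EOF
# Install nfs-server-provisioner with its own persistent storage on the XFS StorageClass
helm repo add stable https://charts.helm.sh/stable
helm install nfs-provisioner stable/nfs-server-provisioner \
  --set persistence.enabled=true \
  --set persistence.storageClass=xfs-ssd \
  --set persistence.size=200Gi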
It would also be cool to explore whether there are any Prometheus metrics exposed that we can consume.
As provisioning XFS storage depends quite a bit on the cloud provider, I think deploying an NFS server with XFS etc. in k8s will require some cloud provider lock-in - or at least some custom steps for the different cloud providers.
We may be able to develop a more cloud-agnostic set of instructions if we provision a VM with NFS Ganesha and install nfs-client-provisioner in the k8s cluster. I fear that only GKE can support XFS-backed storage for an in-cluster NFS server if we want to use nfs-server-provisioner and not involve a standalone VM.
The helm chart for nfs-server-provisioner is now maintained in https://github.com/kvaps/nfs-server-provisioner-chart, and the one in helm/charts is no longer the best option. https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner/pull/13 is about merging that back where it should reside.
Hi all - I was pointed to this issue by a friend. I haven't fully grokked all of the discussion, but I wanted to point out that while project quotas originated with XFS and are well supported there, current ext4 should support them just fine as well - support was added to ext4 circa 2016, in kernel version 4.5.
[...] while project quotas originated with XFS and are well supported there, current ext4 should support them just fine as well - support was added to ext4 circa 2016, in kernel version 4.5.
Hmmm, GKE's Linux kernel will be more modern than that, so why was there ever a need to go with XFS when @yuvipanda explored this? Hmmm... Is this about the NFS server's ability to use the filesystem's quota system? Does NFS Ganesha not support the ability of modern ext4 (on Linux 4.5+) to use quotas?
Ah, it seems so: NFS Ganesha describes FSALs (file system abstraction layers), and the fact that ext4 is missing from this list makes me guess I may be... guessing in the right direction?
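For reference, a hedged sketch of what enabling project quotas directly on an ext4 filesystem looks like; the device, paths, project id, and limit are illustrative, it assumes e2fsprogs and quota-tools recent enough to know about the project feature, and it says nothing about whether the NFS layer then enforces it:

# Enable the project quota feature on an ext4 filesystem (while unmounted or freshly created)
sudo tune2fs -O project -Q prjquota /dev/sdb1
sudo mount -o prjquota /dev/sdb1 /export/home
# Tag a home directory with project id 1001 (+P makes new files inherit the project id)
sudo chattr -R -p 1001 +P /export/home/someuser
# Give project 1001 a 10 GiB hard limit (block limits are in 1 KiB units)
sudo setquota -P 1001 0 10485760 0 0 /export/home
sudo repquota -P -s /export/home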
@ianabc is it okay for me to copy paste what you wrote on slack here about various cloud providers and XFS filesystem storage?
Hi Erik,
Of course, please do!
-Ian
Great investigative work all! This is really exciting. We've been beating our heads trying to figure out how to enforce per-user quotas (currently using EFS).
I'm looking forward to seeing if there is any traction on implementation of some of these ideas.
It might not be 100% relevant but I was experimenting with an NFS server based on ZFS for the same reason (on AKS). Ultimately it was stand-alone and just used the client provisioner but it worked OK
Maybe more related, I think I can do what @Erik Sundell is suggesting in AWS; I had something similar in another project. If I do
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: xfs-storage-class
provisioner: kubernetes.io/aws-ebs
parameters:
  fstype: xfs
Then if I deploy nfs-server-provisioner with
persistence:
  enabled: true
  storageClass: "xfs-storage-class"
  size: 10Gi
storageClass:
  defaultClass: true
I can use it with e.g.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: storage-demo
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  volumes:
    - name: storage-demo
      persistentVolumeClaim:
        claimName: storage-demo
  containers:
    - name: storage-demo
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: storage-demo
And the same thing seems to work on AKS with
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: xfs-storage-class
provisioner: kubernetes.io/azure-disk
parameters:
  fstype: xfs
Is there any update on this discussion? How can we restrict AWS EFS data usage per Kubernetes pod? @consideRatio @ianabc @yuvipanda
This is a problem that has been present on Pangeo JupyterHubs for a while. JupyterHub home directories are generally backed by an NFS of some sort; the GCP hubs use Google FileStore and the AWS hubs use AWS EFS. However, there is not yet a way to enable limits for any individual user's storage on the JupyterHub; any user can make their home directory about as big as the entire NFS. We would like to solve this.
As far as AWS EFS goes, under the FAQ section for the EFS-Provisioner, which we use to fill PVCs for new users logging in, it says:
Every pod accessing EFS will have unlimited storage.
At least for the AWS hubs, the solution will need to come before this step (looking at "Every pod accessing EFS will have unlimited storage"). This will be beneficial I think, because then the solution on AWS should be almost if not identical to the solution on GCP.
Currently, I've been looking into this article: How To Set Filesystem Quotas on Ubuntu 18.04 since all the pangeo Docker images start from Ubuntu 18.04.
I've installed some extra apt packages (quota and linux-image-extra-virtual) on my own Docker image (hosted here for testing and deployed on http://staging.icesat-2.hackweek.io/ ). Both of the test commands in the first two steps work. The third step is to modify the /etc/fstab file to activate quotas by mounting filesystems with quota-related options. However, the /etc/fstab file contains essentially nothing, so I am suspicious about the third step working. If the file isn't configured, then changing it feels like it won't have any effect since it didn't beforehand. This gets into some questions about if / how we can make the user pods mount the home directory in a way that uses this file. My impression is that if we could get to a point where we use that file, it would be easy to modify and get user quotas established, albeit in a hacky, Linux way.
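For context, the kind of change the guide's step 3 asks for on an ordinary VM host is roughly the following (the filesystem label and mount point are illustrative, following the guide's ext4 example):

# Step 3 of the guide: add usrquota/grpquota to the root filesystem's /etc/fstab entry, e.g.
# LABEL=cloudimg-rootfs  /  ext4  defaults,usrquota,grpquota  0  0
# ...then remount and initialize the quota database
sudo mount -o remount /
sudo quotacheck -ugm /
sudo quotaon -v /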
If @consideRatio , @yuvipanda , or others have thoughts on this, I'd love to hear them.
Ping @scottyhq , @jhamman , @rabernat for their previous interest in this problem.