salvis2 opened this issue 4 years ago
There seems to be documentation supporting the use of the /etc/fstab file for individual compute instances. Here are the Google Filestore instructions and here are the AWS EFS instructions. They are pretty similar, so a programmatic write to /etc/fstab should work similarly for both cloud providers.
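For reference, a minimal sketch of what such a programmatic write could look like; the EFS/Filestore hostnames, the share name, the mount path, and the mount options below are placeholders drawn from the commonly documented NFS defaults:

# Sketch: append an NFS mount entry to /etc/fstab on the instance and mount it
# (hostname, share, and mount path are placeholders)
echo "fs-12345678.efs.us-east-1.amazonaws.com:/  /home/shared  nfs4  nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev  0  0" \
  | sudo tee -a /etc/fstab
# A Filestore entry would instead point at the instance IP and share name, e.g.
# "10.0.0.2:/share  /home/shared  nfs  defaults,_netdev  0  0"
sudo mount -a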
@salvis2 wow nice find! I think this may be a very good approach! I don't know the details of file systems, but I don't expect there to be a good alternative to this. Exploring and writing a blog post or similar about this would be a very worthwhile time investment for our community!
Regarding the third step, I think the key is to properly understand /etc/fstab and what gives its content meaning.
Does /etc/fstab have meaning within a docker container (k8s pod) with storage mounted to it? I'd like to learn more about /etc/fstab and the systems using it.
I'm writing this to you to help me think better again!
Wikipedia says this about fstab:
The fstab file typically lists all available disk partitions and other types of file systems and data sources that are not necessarily disk-based, and indicates how they are to be initialized or otherwise integrated into the larger file system structure.
The fstab file is read by the mount command, which happens automatically at boot time to determine the overall file system structure, and thereafter when a user executes the mount command to modify that structure.
And the guide's step 3 says this:
To activate quotas on a particular filesystem, we need to mount it with a few quota-related options specified. We do this by updating the filesystem’s entry in the /etc/fstab configuration file. Open that file in your favorite text editor now:
Hmmm... thinking about things more, I'm afraid this may be hard to accomplish.
Hmmm, I'd love to see this be doable, but I fear it may be hard. I know @yuvipanda has considered this before and probably has a relevant issue about it as well.
I've never tried out Rook, but a reason for not doing so before is that it wasn't as mature ~2 years ago as it is now. For example, I think it's now possible to get dynamic provisioning of storage with a k8s StorageClass resource, which wasn't possible before.
I wonder if Rook backed by CephFS would support limiting storage while also not consuming more than is available. Hmmm, would it be possible to have 100GB of total available storage where each user is limited to, but not guaranteed, 10GB? If so, users could get out-of-disk-space errors for two reasons: either their individual limit is reached, or the total storage available to all users is exhausted.
Ah, fun fun fun fun fun :D
TLDR is that enforcing quotas is impossible with EFS / Filestore, but there is hope.
Longer version!
What @consideRatio said here:
Storage limits probably need feedback directly from the write request, which would be from the NFS server itself, because only the NFS server itself knows how much storage is used. Hmmm... I fear that the NFS server won't know what user (k8s pod / docker container) is reading / writing to it, but only what k8s node or similar.
is accurate: storage quotas need to be set on the NFS file server, not on the pods themselves. /etc/fstab and similar inside containers basically have no meaning, since the container has no control over anything that gets mounted.
There are two ways to set quotas: via user/group IDs, or via directory name.
By user / group ID is the common way, supported by most file systems (including ext4, the default). This is what is mentioned in the article @scottyhq pointed out. If we set this up on the NFS server, it would work as long as we can make sure that each user has a separate uid. However, right now all our users run with the same uid (1000), so this is not possible.
Via directory name is more useful for us, but is supported by fewer file systems - XFS being the most common. It's possible that btrfs and ZFS support it too, but I'm not sure. This would also need to run on the NFS server, and would require us to write something that maintains the quotas for each directory.
Both these options require that we run our own NFS server rather than use EFS / Filestore - we need to fiddle with the NFS server's filesystems, which we can't really do in these managed offerings.
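To make the first option concrete, here is a hedged sketch of uid-based quotas on a self-managed NFS server; the export path, uid, and 10 GiB limit are illustrative, and it assumes the standard Linux quota tools:

# On the NFS server: the exported filesystem must be mounted with user/group quotas enabled
# (usrquota/grpquota would normally be set in the server's own /etc/fstab)
sudo mount -o remount,usrquota,grpquota /export/home
# Build the quota tracking files and turn quotas on
sudo quotacheck -ugm /export/home
sudo quotaon -v /export/home
# Give uid 1001 a 10 GiB hard limit (block limits are in 1 KiB units)
sudo setquota -u 1001 0 10485760 0 0 /export/home
# Inspect per-user usage and limits
sudo repquota -s /export/home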
Possible next steps here:
If anyone wants to put time and effort into this, I'm happy to point people in directions 🍡
@yuvipanda wieee a condensed knowledge candy post :D I love it! Thanks for the writeup!!! This will be a post I'll read and consider multiple times.
I have this working now for a hub I run!
I'll post code shortly
https://github.com/yuvipanda/get-quota-the-home/blob/master/generate.py is the script I have running on the NFS server, and it does the job. More work needed, but I think it should be a nice and fairly resilient solution.
This does require running our own NFS server though. It should be possible / easy to do this inside the kubernetes cluster itself.
I actually forgot that https://github.com/kubernetes-incubator/external-storage/tree/master/nfs already supports quotas! I'd really love for someone to try that out, means no work on our end.
Woooooo! Nice exploratory work and implementation @yuvipanda!!!
How did you get yourself XFS storage? Did you define a GCE PD storageClass with a fstype requesting xfs? Reference: https://kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd
I have never used the nfs-provisioner myself and still have some learning to do. I see that if one uses the nfs-provisioner Helm chart, a new storage class is created, hmmm... Does that mean that users of this NFS server would create PVCs referencing that storage class and request storage which then gets quota limits?
Does the underlying nfs-provisioner quota implementation logic require XFS, or would ext4 be fine as well? It may be troublesome to get XFS-backed storage unless one is on GCP, it seems from the previous link. Hmmm, it seems like it does depend on XFS, as indicated here.
Step 1: Install the NFS provisioner, and configure its own persistent storage to be backed by XFS.
The chart mounts a Persistent Volume volume at this location. The volume can be created using dynamic volume provisioning. However, it is highly recommended to explicitly specify a storageclass to use rather than accept the clusters default, or pre-create a volume for each replica.
# nfs-provisioner Helm chart config
persistence:
  enabled: true
  storageClass: "xfs-ssd"
  size: 200Gi
# xfs-storageclass.yaml that we manually install alongside
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: xfs-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: xfs
Step 2: We use the storageClass created by the nfs-provisioner Helm chart to get NFS storage.
# nfs-provisioner Helm chart config
storageClass:
  provisionerName: cluster.local/nfs
# JupyterHub helm chart config to use the NFS storage
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml#L283
singleuser:
  storage:
    dynamic:
      storageClass: cluster.local/nfs
Step 2 means that a new PVC will be created for each user, whereas typically I've used a single PVC pointing to an NFS server and let each pod mount a different folder path.
Step 3: We create a separate pod, or a sidecar container in the NFS server pod, which mounts the XFS storage; this XFS storage is then monitored to update quotas using the XFS CLI tool xfs_quota.
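As an illustration of what such a monitor could run, here is a hedged shell sketch of the underlying xfs_quota mechanics. This is not @yuvipanda's generate.py; the mount point, the per-directory project IDs, and the 10g limit are assumptions, and the XFS volume must be mounted with the prjquota option:

# Assumes the XFS volume is mounted at /export with prjquota enabled
EXPORT=/export
LIMIT=10g
id=1
for dir in "$EXPORT"/home/*; do
  [ -d "$dir" ] || continue
  # Register the directory as an XFS "project" and give it a hard block limit
  xfs_quota -x -c "project -s -p $dir $id" "$EXPORT"
  xfs_quota -x -c "limit -p bhard=$LIMIT $id" "$EXPORT"
  id=$((id + 1))
done
# Report per-project usage
xfs_quota -x -c "report -p -h" "$EXPORT"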
To inspect the file system used in a folder, one can run stat --file-system --format=%T /home/jovyan, which will output nfs, overlayfs, xfs, etc.
How did you get yourself XFS storage? Did you define a GCE PD storageClass with a fstype requesting xfs? Reference: kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd
I just have a separate NFS VM that has a disk formatted as XFS (with the mkfs.xfs command). This is what we have at Berkeley right now; not ideal.
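For anyone curious, a hedged sketch of that kind of standalone setup (device name, mount point, and export subnet are placeholders; it assumes the nfs-kernel-server package is installed):

# Format a data disk as XFS and mount it with project quotas enabled
sudo mkfs.xfs /dev/sdb
sudo mkdir -p /export/home
echo "/dev/sdb  /export/home  xfs  defaults,prjquota  0  0" | sudo tee -a /etc/fstab
sudo mount /export/home
# Export it over NFS to the cluster's network
echo "/export/home  10.0.0.0/8(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -ra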
Does that mean that users of this NFS server would create PVCs referencing that storage class and request storage which then gets quota limits?
This is my understanding!
# JupyterHub helm chart config to use the NFS storage
# https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml#L283
singleuser:
  storage:
    dynamic:
      storageClass: cluster.local/nfs
You should be able to set singleuser.storage.capacity here, and have that made available as the quota.
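For example, a hedged sketch of what that could look like in the JupyterHub chart config (the 10Gi figure and the config.yaml file name are illustrative):

# Hypothetical addition to the JupyterHub chart's config.yaml:
# request 10Gi per user from the NFS-backed storage class
cat <<'EOF' >> config.yaml
singleuser:
  storage:
    capacity: 10Gi
    dynamic:
      storageClass: cluster.local/nfs
EOF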
To inspect the file system used in a folder, one can run stat --file-system --format=%T /home/jovyan, which will output nfs, overlayfs, xfs, etc.
This is great to know! I mostly just run mount, which provides the same information. With quotas, mount actually provides me the quota'd capacity, not the total NFS capacity!
I'd be interested in both of these:
Figure out how to give each user their own unique uid / gid, so we can use uid based quotas. This also brings with it other advantages - better security, and more traditional ways to share files between users.
General benefits for our cloud offerings are always nice.
Consider running the NFS Server in the k8s cluster itself. This lets us customize it better, possibly using XFS + project quotas as sidecars.
Always want to run more infrastructure! It would probably be cheaper than a managed offering and we can integrate it with Terraform!
I'll do some reading up on these things this week, but if you have some pointers (besides the link in the second point), I'd love them.
I'm really excited about this work in general, it feels to me like a proper solution to a long standing functionality issue (storage quotas for NFS servers) and cost issue (Google's managed NFS called Filestore for example is expensive for smaller deployments).
I'm convinced now that the solution is https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner
I spent some time reading up, and this is my summary of what I think is important for an overview of the situation.
NFS Ganesha is a modern open source NFS server.
NFS Server Provisioner was a Kubernetes project for a Kubernetes volume provisioner backed by an NFS Ganesha server. A volume provisioner is the thing you reference from a k8s StorageClass resource, which in turn is what a PVC references with storageClassName, which in turn a Pod references to mount storage. This repository maintains a Docker image published to quay.io/kubernetes_incubator/nfs-provisioner, I think.
The NFS Server Provisioner project had two associated Helm charts, nfs-server-provisioner and nfs-client-provisioner. The nfs-client-provisioner Helm chart is a slimmed version of the other, excluding the deployment of the actual NFS server.
As the NFS Server Provisioner resided in kubernetes-incubator/external-storage, and the GitHub org kubernetes-incubator is now kubernetes-retired, they moved the NFS Server Provisioner part of the external-storage repo to kubernetes-sigs/nfs-ganesha-server-and-external-provisioner. The associated Helm charts have not migrated, though.
We want to use https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner and an associated Helm chart, but the most recent Helm chart available is still the old nfs-server-provisioner chart.
I suggest exploring a deployment of the nfs-server-provisioner Helm chart on GKE, where we make the NFS server's own storage be created through a manually created StorageClass like the default GKE storage class but with fstype: xfs instead of fstype: ext4. We would configure the Helm chart to consume our custom StorageClass via persistence.storageClass.
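A hedged sketch of what that could look like; the release name, sizes, and chart location (the old stable repo, see the note further down about the chart moving) are assumptions, and the values mirror the ones quoted earlier in this thread:

# Create a StorageClass like GKE's default, but backed by XFS
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: xfs-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fstype: xfs
EOF
# Install nfs-server-provisioner with its own persistent storage on the XFS StorageClass
helm repo add stable https://charts.helm.sh/stable
helm install nfs-provisioner stable/nfs-server-provisioner \
  --set persistence.enabled=true \
  --set persistence.storageClass=xfs-ssd \
  --set persistence.size=200Gi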
It would also be cool to explore whether there are any Prometheus metrics exposed that we can consume.
As provisioning XFS storage depends quite a bit on the cloud provider, I think deploying an NFS server with XFS etc. in k8s will require some cloud provider lock-in - or at least some custom steps for the different cloud providers.
We may be able to develop a more cloud-agnostic set of instructions if we provision a VM with NFS Ganesha and install nfs-client-provisioner in the k8s cluster. I fear that only GKE can support XFS-backed storage for an in-cluster NFS server if we want to use nfs-server-provisioner and not involve a standalone VM.
The helm chart for nfs-server-provisioner is now maintained in https://github.com/kvaps/nfs-server-provisioner-chart, and the one in helm/charts is no longer the best option. https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner/pull/13 is about merging that back where it should reside.
Hi all - I was pointed to this issue by a friend. I haven't fully grokked all of the discussion, but I wanted to point out that while project quotas originated with XFS and are well supported there, current ext4 should support them just fine as well - support was added to ext4 circa 2016, in kernel version 4.5.
[...] while project quotas originated with XFS and are well supported there, current ext4 should support them just fine as well - support was added to ext4 circa 2016, in kernel version 4.5.
Hmmm, GKE's Linux kernel will be more modern than that, so why was there ever a need to go with XFS when @yuvipanda explored this? Hmmm... Is this about the NFS server's ability to use the filesystem's quota system? Does NFS Ganesha not support the ability of modern ext4 (on Linux 4.5+) to use quotas?
Ah, it seems so: NFS Ganesha describes FSALs (file system abstraction layers), and the fact that ext4 is missing from this list makes me guess I may be... guessing in the right direction?
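For reference, a hedged sketch of what enabling project quotas directly on an ext4 filesystem looks like; the device, paths, project id, and limit are illustrative, it assumes e2fsprogs and quota-tools recent enough to know about the project feature, and it says nothing about whether the NFS layer then enforces it:

# Enable the project quota feature on an ext4 filesystem (while unmounted or freshly created)
sudo tune2fs -O project -Q prjquota /dev/sdb1
sudo mount -o prjquota /dev/sdb1 /export/home
# Tag a home directory with project id 1001 (+P makes new files inherit the project id)
sudo chattr -R -p 1001 +P /export/home/someuser
# Give project 1001 a 10 GiB hard limit (block limits are in 1 KiB units)
sudo setquota -P 1001 0 10485760 0 0 /export/home
sudo repquota -P -s /export/home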
@ianabc is it okay for me to copy paste what you wrote on slack here about various cloud providers and XFS filesystem storage?
Hi Erik,
Of course, please do!
-Ian
Great investigative work all! This is really exciting. We've been beating our heads trying to figure out how to enforce per-user quotas (currently using EFS).
I'm looking forward to seeing if there is any traction on implementation of some of these ideas.
It might not be 100% relevant but I was experimenting with an NFS server based on ZFS for the same reason (on AKS). Ultimately it was stand-alone and just used the client provisioner but it worked OK
Maybe more related, I think I can do what @Erik Sundell is suggesting in AWS; I had something similar in another project. If I do
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: xfs-storage-class
provisioner: kubernetes.io/aws-ebs
parameters:
  fstype: xfs
Then if I deploy nfs-server-provisioner with
persistence:
  enabled: true
  storageClass: "xfs-storage-class"
  size: 10Gi
storageClass:
  defaultClass: true
I can use it with e.g.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: storage-demo
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  volumes:
    - name: storage-demo
      persistentVolumeClaim:
        claimName: storage-demo
  containers:
    - name: storage-demo
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: storage-demo
And the same thing seems to work on AKS with
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: xfs-storage-class
provisioner: kubernetes.io/azure-disk
parameters:
  fstype: xfs
Is there any update on this discussion? How can we restrict AWS EFS data usage per Kubernetes pod? @consideRatio @ianabc @yuvipanda
This is a problem that has been present on Pangeo JupyterHubs for a while. JupyterHub home directories are generally backed by an NFS of some sort; the GCP hubs use Google FileStore and the AWS hubs use AWS EFS. However, there is not yet a way to enable limits for any individual user's storage on the JupyterHub; any user can make their home directory about as big as the entire NFS. We would like to solve this.
As far as AWS EFS goes, under the FAQ section for the EFS-Provisioner, which we use to fill PVCs for new users logging in, it says:
Every pod accessing EFS will have unlimited storage.
At least for the AWS hubs, the solution will need to come before this step (looking at "Every pod accessing EFS will have unlimited storage"). This will be beneficial I think, because then the solution on AWS should be almost if not identical to the solution on GCP.
Currently, I've been looking into this article: How To Set Filesystem Quotas on Ubuntu 18.04 since all the pangeo Docker images start from Ubuntu 18.04.
I've installed some extra apt packages (quota and linux-image-extra-virtual) on my own Docker image (hosted here for testing and deployed on http://staging.icesat-2.hackweek.io/ ). Both of the test commands in the first two steps work. The third step is to modify the /etc/fstab file to activate quotas by mounting filesystems with quota-related options. However, the /etc/fstab file contains essentially nothing, so I am suspicious about the third step working. If the file isn't configured, then changing it feels like it won't have any effect since it didn't beforehand. This gets into some questions about if / how we can make the user pods mount the home directory in a way that uses this file. My impression is that if we could get to a point where we use that file, it would be easy to modify and get user quotas established, albeit in a hacky, Linux way.
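For context, the kind of change the guide's step 3 asks for on an ordinary VM host is roughly the following (the filesystem label and mount point are illustrative, following the guide's ext4 example):

# Step 3 of the guide: add usrquota/grpquota to the root filesystem's /etc/fstab entry, e.g.
# LABEL=cloudimg-rootfs  /  ext4  defaults,usrquota,grpquota  0  0
# ...then remount and initialize the quota database
sudo mount -o remount /
sudo quotacheck -ugm /
sudo quotaon -v /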
If @consideRatio , @yuvipanda , or others have thoughts on this, I'd love to hear them.
Ping @scottyhq , @jhamman , @rabernat for their previous interest in this problem.