pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/

Restricting users' resources #80

Closed. mrocklin closed this issue 4 years ago.

mrocklin commented 5 years ago

Currently any user on our JupyterHub+Dask deployments can launch as many pods as they like. This is troublesome because it opens us up to excessive costs and because other users can't easily get on. What is the right way to do this?

A few approaches have come up before:

  1. Place each user in a separate namespace. Some users, notably the UK Met Office, say that this is unpleasant because Pangeo is only one of many services running on their Kubernetes cluster and they'd like to keep it all within a single namespace if possible (a rough sketch of this approach appears just after this list). This is probably representative of a larger concern about keeping namespaces uncluttered.
  2. Make a separate service that manages everything. Users don't talk to Kubernetes, they talk to that thing, which talks to Kubernetes for them. This is doable, but perhaps larger in scope than we'd like to tackle near-term.
  3. @yuvipanda mentioned that there might be user-based resource quotas in some corner of the expansive Kubernetes world. If he has time it would be good to get links from him.
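
For reference, a minimal sketch of option 1 using the Python kubernetes client; the namespace naming convention and quota values here are assumptions for illustration, not part of this chart.

```python
# Hypothetical sketch of option 1: one namespace per user, each with its own
# ResourceQuota. Namespace naming and quota values are illustrative only.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
core = client.CoreV1Api()


def create_user_namespace(username, cpu="40", memory="160Gi", pods="50"):
    ns = f"pangeo-{username}"  # naming convention assumed
    core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))
    core.create_namespaced_resource_quota(
        namespace=ns,
        body=client.V1ResourceQuota(
            metadata=client.V1ObjectMeta(name="user-quota"),
            spec=client.V1ResourceQuotaSpec(
                hard={"requests.cpu": cpu, "requests.memory": memory, "pods": pods}
            ),
        ),
    )
```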

Also cc @jacobtomlinson @dsludwig

dsludwig commented 5 years ago

This also relates to #135; part of the proposal there is to use separate namespaces to enforce user permissions. If there is only a single namespace, then there's no Kubernetes-native way that I'm aware of to restrict users to only create/delete their own resources.

See also this note: https://kubernetes.io/docs/reference/access-authn-authz/authorization/#privilege-escalation-via-pod-creation

Caution: System administrators, use care when granting access to pod creation. A user granted permission to create pods (or controllers that create pods) in the namespace can: read all secrets in the namespace; read all config maps in the namespace; and impersonate any service account in the namespace and take any action the account could take. This applies regardless of authorization mode.

tjcrone commented 5 years ago

I wonder if there is a way to use NodeSelectors to make this happen. It is possible to spawn users into specific nodepools and onto specific nodes. It is also possible to define the nodes/nodepools on which a user's Dask workers launch. Users could potentially override these settings for workers (or maybe not, if we figure out a way to prevent this), but regardless, it could be an easy first step toward segmenting users into nodepools to prevent the wholesale overtaking of resources. I have been playing a lot with NodeSelectors and they are great.
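
For illustration, a sketch of what that could look like with KubeSpawner plus a dask-kubernetes worker pod template; the node labels, image, and config mechanism are assumptions about a typical deployment rather than settings from this chart.

```python
# jupyterhub_config.py -- pin notebook pods to a dedicated user nodepool.
# The node labels below are assumptions; they must match labels applied to
# your nodepools.
c.KubeSpawner.node_selector = {"hub.jupyter.org/node-purpose": "user"}

# Dask workers can be pinned to a separate pool via the dask-kubernetes worker
# pod template (supplied through dask config or KubeCluster.from_dict); only
# the relevant fragment is shown, and the image/args are illustrative.
worker_template = {
    "kind": "Pod",
    "spec": {
        "nodeSelector": {"k8s.dask.org/node-purpose": "worker"},
        "containers": [
            {
                "name": "dask-worker",
                "image": "daskdev/dask:latest",
                "args": ["dask-worker", "--nthreads", "2", "--memory-limit", "7GB"],
            }
        ],
    },
}
```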

jhamman commented 5 years ago

The approach that @tjcrone is describing is exactly what I've been doing for the binder deployment. It means we can always get notebook pods, even when the dask worker pool is full. I think this is a good first step and it seems to be working as a stop-gap measure.

mrocklin commented 5 years ago

FWIW I'm personally less concerned about blocking users out of notebooks and more concerned with limiting the damage that any individual can do to the cluster.

jhamman commented 5 years ago

Right, they are two separate problems that could share a common fix.

tjcrone commented 5 years ago

Right. I think there is a way to use NodeSelectors to partition users and their workers into nodepools that are restricted in size. Every user could have their own worker nodepool that is capped in size and scales to zero when they are not using workers.

yuvipanda commented 5 years ago

The near-term way to do this would be:

  1. One PriorityClass per user / class (https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/). Maybe this is two priorities: one for user notebooks, one for user workers. This handles pre-emption.
  2. Use a ResourceQuota object per PriorityClass (https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-priorityclass). This lets us restrict users' worker pods & notebook pods separately. However, this feature is only available in alpha clusters on GKE right now; it should go beta in 1.12, which is probably a month or so away. (A sketch of both pieces follows this list.)
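
A minimal sketch of those two pieces with the Python kubernetes client; object names, quota values, and the "pangeo" namespace are placeholders, and the scopeSelector-based quota requires a cluster where that feature is enabled (Kubernetes >= 1.12).

```python
# Illustrative sketch: a PriorityClass for dask worker pods plus a
# ResourceQuota that only counts pods carrying that class.
from kubernetes import client, config

config.load_kube_config()

# Older clusters/clients may need SchedulingV1beta1Api instead.
client.SchedulingV1Api().create_priority_class(
    client.V1PriorityClass(
        metadata=client.V1ObjectMeta(name="pangeo-dask-worker"),
        value=1000,  # notebooks would get a second, higher-value class
        global_default=False,
        description="Priority for user dask worker pods",
    )
)

client.CoreV1Api().create_namespaced_resource_quota(
    namespace="pangeo",
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="dask-worker-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"cpu": "40", "memory": "160Gi", "pods": "20"},
            scope_selector=client.V1ScopeSelector(
                match_expressions=[
                    client.V1ScopedResourceSelectorRequirement(
                        operator="In",
                        scope_name="PriorityClass",
                        values=["pangeo-dask-worker"],
                    )
                ]
            ),
        ),
    ),
)
```
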
dsludwig commented 5 years ago

I've started implementing this the way @yuvipanda describes as a customPodHook (but it could easily be merged into kubespawner).

https://gist.github.com/dsludwig/c845dff55227336aaaef0fc4241864bf

I'm using a Minikube install with 1.11, since it's not available on GKE yet. I'm having trouble finding the correct RBAC to allow JupyterHub to create priority classes; the expected resources don't seem to work (I've included the ones I tried in the Gist).
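
One likely wrinkle: PriorityClass is a cluster-scoped resource (scheduling.k8s.io), so a namespaced Role can't grant permission to create one; the hub's service account needs a ClusterRole bound with a ClusterRoleBinding. A hedged sketch with the Python client follows, where all names and the service account/namespace are assumptions rather than anything from the Gist.

```python
# Hypothetical RBAC allowing the hub to manage PriorityClasses. Because
# PriorityClass is cluster-scoped, this must be a ClusterRole, not a Role.
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

rbac.create_cluster_role(
    client.V1ClusterRole(
        metadata=client.V1ObjectMeta(name="hub-priorityclass-manager"),
        rules=[
            client.V1PolicyRule(
                api_groups=["scheduling.k8s.io"],
                resources=["priorityclasses"],
                verbs=["get", "list", "create", "delete"],
            )
        ],
    )
)

rbac.create_cluster_role_binding(
    client.V1ClusterRoleBinding(
        metadata=client.V1ObjectMeta(name="hub-priorityclass-manager"),
        role_ref=client.V1RoleRef(
            api_group="rbac.authorization.k8s.io",
            kind="ClusterRole",
            name="hub-priorityclass-manager",
        ),
        # Plain dict used for the subject to sidestep model-name differences
        # between client versions. The service account name and namespace
        # ("hub" in "pangeo") are assumptions.
        subjects=[{"kind": "ServiceAccount", "name": "hub", "namespace": "pangeo"}],
    )
)
```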

I'll probably pause this for the moment, at least until I can get access to newer Kubernetes on GKE, or 1.12 releases.

Note this just adds the priority class to the notebook pod; adding it to the Dask workers would be another piece.

jacobtomlinson commented 5 years ago

Just to add my two cents.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

athornton commented 5 years ago

I'm working around the namespace hierarchy thing with the dumb method of prepending my hub's namespace to my user name for the user namespace. That is, if Hub is in "jupyterlabdemo", then my pods end up in "jupyterlabdemo-athornton".

This is using the NamespacedKubeSpawner implementation at https://github.com/lsst-sqre/namespacedkubespawner.

This class has an overrideable method, get_resource_quota_spec(), which returns a kubernetes.client.V1ResourceQuotaSpec; the spawner then constructs a namespaced resource quota from that spec. The idea is that your subclass replaces this method with something that makes sense in your environment.
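
For illustration, a rough sketch of such a subclass, assuming a plain synchronous override and an importable NamespacedKubeSpawner class (check the linked repository for the exact module path and signature); the quota values are placeholders.

```python
# Hypothetical subclass; quota values are placeholders and the import path
# is assumed from the linked repository.
from kubernetes import client
from namespacedkubespawner import NamespacedKubeSpawner  # import path assumed


class QuotaedNamespacedSpawner(NamespacedKubeSpawner):
    def get_resource_quota_spec(self):
        # Cap what a single user's namespace may consume.
        return client.V1ResourceQuotaSpec(
            hard={
                "limits.cpu": "32",
                "limits.memory": "128Gi",
                "pods": "25",
            }
        )
```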

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jhamman commented 4 years ago

Just as an update on this, we have ended up going the route of:

Make a separate service that manages everything. Users don't talk to Kubernetes, they talk to that thing, which talks to Kubernetes for them. This is doable, but perhaps larger in scope than we'd like to tackle near-term.

That service is dask-gateway, which is now bundled as part of this chart. Resources are limited based on limits set in the dask-gateway configuration: https://gateway.dask.org/resource-limits.html
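
For example, worker sizes and per-cluster caps can be set in the gateway configuration along these lines (values are illustrative; see the linked docs for the full set of options).

```python
# dask_gateway_config.py -- illustrative limits; see the resource-limits page
# linked above for the complete list of options.
c.ClusterConfig.worker_cores = 1        # cores per worker
c.ClusterConfig.worker_memory = "4 G"   # memory per worker

# Hard caps applied across each user's whole cluster:
c.ClusterConfig.cluster_max_workers = 40
c.ClusterConfig.cluster_max_cores = 40
c.ClusterConfig.cluster_max_memory = "160 G"
```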

With this in mind, I'm going to close this now. We can reopen if the need arises.