jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Small images landing on large machines #2012

Closed chrisroat closed 3 years ago

chrisroat commented 3 years ago

Bug description

In some circumstances, deploying a small profile spins up a node with large resources, even when nodes with smaller resources exist.

Expected behaviour

Small profiles should not spin up large machines when small machines can be spun up.

Actual behaviour

Using singleuser.profileList, I have defined several options with varying cpu/memory/gpu overrides. I also have a few GKE node pools with different specs on which the Jupyter instances deploy.
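
For illustration, the profileList is shaped roughly like this (the display names and resource numbers here are made up, not my exact config):

```yaml
# Hypothetical values.yaml excerpt illustrating the setup described above;
# profile names and resource values are placeholders for this example.
singleuser:
  profileList:
    - display_name: "Small (4 CPU / 20 GB)"
      kubespawner_override:
        cpu_limit: 4
        mem_limit: 20G
    - display_name: "Large (64 CPU / 1 TB / 4 GPUs)"
      kubespawner_override:
        cpu_limit: 64
        mem_limit: 1T
        extra_resource_limits:
          nvidia.com/gpu: "4"
```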

I have found that initially deploying a small profile will spin up a small machine. Good!

However, once a user has deployed a large profile (e.g., 64 cores, several GPUs, and a TB of memory), later launches of small instances (say, just 4 cores / 20 GB) spin up additional big machines. The smaller profiles do pack onto those machines, but it's still pretty inefficient and costly over time.

I have tried redeploying after removing the podPriority and userPlaceholder stanzas, but the behavior does not seem to change.

Your personal set up

GKE cluster is running 1.17.14-gke.400.

Installed using daskhub 4.5.6, which uses the jupyterhub chart 0.10.6.

consideRatio commented 3 years ago

This is outside this Helm chart's control; it is the responsibility of the cluster autoscaler, which decides which node to add based on the unschedulable pod.

You can, though, use a k8s Pod's nodeSelector field to force the pod to schedule on a node carrying a specific label, or a similar mechanism.
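
As a sketch of what that could look like per profile (the pool name `small-pool` is an assumption about your cluster, and `cloud.google.com/gke-nodepool` is the label GKE applies to nodes in a node pool):

```yaml
# Sketch, not a drop-in config: assumes a GKE node pool named "small-pool".
# node_selector is a KubeSpawner setting that maps to the pod's nodeSelector.
singleuser:
  profileList:
    - display_name: "Small (4 CPU / 20 GB)"
      kubespawner_override:
        cpu_limit: 4
        mem_limit: 20G
        node_selector:
          cloud.google.com/gke-nodepool: small-pool
```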

chrisroat commented 3 years ago

OK. Thanks. I wasn't sure if the user scheduler (or placeholder) pieces would have done something like this.

I'll check out the GKE autoscaling logs.

consideRatio commented 3 years ago

No worries!

Note that the user-placeholder pods use the default configuration (anything configured outside the singleuser.profileList / c.KubeSpawner.profile_list section) to define their k8s Pod specification. It then becomes the responsibility of the cluster autoscaler to do its work in whatever way it considers most sensible given that pod specification.
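
In values.yaml terms, a minimal sketch of what that means (the replica count and resource numbers below are placeholders, not recommendations):

```yaml
# Sketch with placeholder numbers: the user-placeholder pods request the
# *default* singleuser guarantees below, ignoring any per-profile
# kubespawner_override values from profileList entries.
singleuser:
  cpu:
    guarantee: 0.5
  memory:
    guarantee: 1G
scheduling:
  podPriority:
    enabled: true
  userPlaceholder:
    enabled: true
    replicas: 4
```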

support[bot] commented 3 years ago

Hi there @chrisroat :wave:!

I closed this issue because it was labelled as a support question.

Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.

Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.

Thank you for being an active member of our community! :heart: