This is out of this helm chart's control; it is the responsibility of the cluster autoscaler, which decides what to scale up based on the unschedulable pod.
You can, however, use a k8s Pod's nodeSelector field (or a similar mechanism) to force the pod to schedule onto nodes with a specific label.
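In the chart that could look roughly like the snippet below, which pins a profile to a GKE node pool via KubeSpawner's node_selector override. The pool name and resource figures are illustrative assumptions, not values from this issue.

```yaml
# Hedged sketch of a values.yaml fragment: route the small profile onto a
# dedicated small node pool (the pool name "small-pool" is an assumption).
singleuser:
  profileList:
    - display_name: "Small (4 CPU / 20 GB)"
      kubespawner_override:
        cpu_limit: 4
        mem_limit: "20G"
        node_selector:
          cloud.google.com/gke-nodepool: small-pool
```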
OK. Thanks. I wasn't sure if the user scheduler (or placeholder) pieces would have done something like this.
I'll check out the GKE autoscaling logs.
No worries!
Note that the user-placeholder pieces will use the default configuration (anything not configured in the singleuser.profileList / c.KubeSpawner.profile_list section) to define their k8s Pod specification. It is then the cluster autoscaler's responsibility to do its work in whatever way it considers most sensible given that pod specification.
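In other words, the placeholder pods take their resource requests from the top-level singleuser settings rather than from any profile. A minimal sketch of the relevant chart values, with illustrative numbers:

```yaml
# The user-placeholder pods mirror these default singleuser guarantees,
# not anything defined under singleuser.profileList (numbers are assumptions).
singleuser:
  cpu:
    guarantee: 0.5
  memory:
    guarantee: "1G"
scheduling:
  userPlaceholder:
    enabled: true
    replicas: 2
```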
Hi there @chrisroat :wave:!
I closed this issue because it was labelled as a support question.
Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.
Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.
Thank you for being an active member of our community! :heart:
Bug description
In some circumstances, deploying a small profile spins up a node with large resources, even if nodes with smaller resources exist.
Expected behaviour
Small profiles should not spin up large machines when small machines can be spun up.
Actual behaviour
Using singleuser.profileList, I have defined several options with varying cpu/memory/gpu overrides. I also have a few GKE node pools with different specs on which the Jupyter instances deploy.
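For illustration only (not the reporter's actual configuration), such a profileList might look something like this, with a small and a large profile defined through kubespawner_override:

```yaml
# Hypothetical profileList with a small and a large profile; all names and
# numbers below are assumptions for illustration.
singleuser:
  profileList:
    - display_name: "Small: 4 CPU / 20 GB"
      kubespawner_override:
        cpu_guarantee: 4
        mem_guarantee: "20G"
    - display_name: "Large: 64 CPU / 1 TB / 4 GPUs"
      kubespawner_override:
        cpu_guarantee: 64
        mem_guarantee: "1000G"
        extra_resource_limits:
          nvidia.com/gpu: "4"
```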
I have found that initially deploying a small profile will spin up a small machine. Good!
However, once a user has deployed a large profile (e.g., 64 cores, several GPUs, and a TB of memory), later launches of small instances (say, simply 4 core / 20GB) will spin up additional big machines. The smaller profiles do pack onto those machines, but it's still pretty inefficient and costly over time.
I have tried updating after removing podPriority and userPlaceholder stanzas, but the behavior does not seem to change.
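For reference, an equivalent way to turn those pieces off is to disable them explicitly rather than deleting the stanzas; this is a sketch, not the reporter's actual values file:

```yaml
# Sketch: explicitly disable pod priority and the user placeholders.
scheduling:
  podPriority:
    enabled: false
  userPlaceholder:
    enabled: false
```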
Your personal set up
GKE cluster is running 1.17.14-gke.400.
Installed using daskhub 4.5.6, which uses jupyterhub chart 0.10.6.