I worked for a long time making improvements wherever I saw room for them, but that left me with a PR that was too big and had no focus. I'll close this issue when I have extracted everything of use from that PR. Below are the parts I want to extract from it.
Developer PRs
[x] Improved linting and validation + CI integration - #844
A test that can run before chartpress builds images etc., allowing us to spot faulty Helm templates and templates that render into invalid k8s resources. Some less important linting is also done on the Helm templates and the rendered output.
Helm template linting with helm lint
Helm template verification with helm template
Rendered output linting with yamllint
Rendered output validation against the k8s API schema with kubeval
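The four checks listed above could be chained in a small script. This is only a minimal sketch, not the CI setup from #844: it assumes `helm`, `yamllint` and `kubeval` are on the PATH, and the chart directory `jupyterhub`, the file name `rendered.yaml` and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def check_plan(chart_dir, rendered_path):
    """Return the (name, command) pairs in the order they should run."""
    return [
        ("helm lint", ["helm", "lint", chart_dir]),
        ("helm template", ["helm", "template", chart_dir]),
        ("yamllint", ["yamllint", rendered_path]),
        ("kubeval", ["kubeval", rendered_path]),
    ]


def run_checks(chart_dir="jupyterhub", rendered_path="rendered.yaml"):
    """Run every check; a non-zero exit code raises CalledProcessError."""
    for name, cmd in check_plan(chart_dir, rendered_path):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        if name == "helm template":
            # Save the rendered manifests so the later tools can read them.
            Path(rendered_path).write_text(result.stdout)
        print(f"{name}: ok")
```

Because every step runs with `check=True`, the first failing tool stops the run, which is the behaviour you would want before spending time on building images.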
[x] Improved CI tests - #846
Simplifies local development and ensures a user is at least spawned correctly before signing off on the PR.
New feature PRs
[x] Configurable image pull secrets - #851
@AlexMorreale helped us use config.yaml to install a Kubernetes Secret that can be referenced by pods that need to pull from image registries requiring credentials - this is #801.
#851 complements Alex's work by allowing the email field to be configured optionally, which seems to be used by some registries but not all.
#851 also adds support for setting large JSON blobs in the password field, something that is required in order to pull from a private gcr.io registry from an external cluster.
#851 also ensures that the image puller DaemonSets have the same credentials to pull the images.
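To illustrate why the optional email and a JSON blob as password matter, here is a hedged sketch of the payload a Kubernetes image pull Secret of type kubernetes.io/dockerconfigjson carries. The function name is hypothetical and this is not the chart's template; the `_json_key` username in the test below is the convention for gcr.io service account keys as far as I know, so treat it as an assumption.

```python
import base64
import json


def docker_config_json(registry, username, password, email=None):
    """Build the .dockerconfigjson payload for one registry.

    `password` may itself be a large JSON document, for example a
    service account key; it is stored as an ordinary string.
    """
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "auth": auth}
    if email is not None:
        # Only some registries look at the email field, so it is optional.
        entry["email"] = email
    return json.dumps({"auths": {registry: entry}})
```

The resulting string is what ends up base64-encoded under the `.dockerconfigjson` key of the Secret.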
[x] User scheduler - #891
Want your cluster autoscaler to work efficiently? Then you should schedule pods to pack tightly instead of spreading them out. The user scheduler accomplishes this.
Note: the schedulerStrategy: pack option was meant to accomplish this, but it did not work as intended and only gave a minor improvement. The configuration option will be deprecated with the introduction of the user scheduler.
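The difference between packing and spreading can be shown with a toy model. This is a sketch under simplifying assumptions, not how the user scheduler is implemented: each node is described only by a hypothetical (capacity, used) pair in CPU cores, and `pick_node` is an invented name.

```python
def pick_node(nodes, request, strategy="pack"):
    """Choose a node for a pod requesting `request` CPU cores.

    nodes: dict mapping node name -> (capacity, used).
    "pack" prefers the fullest node that still fits the pod,
    "spread" prefers the emptiest one.
    """
    fitting = {
        name: used / capacity
        for name, (capacity, used) in nodes.items()
        if used + request <= capacity
    }
    if not fitting:
        return None  # nothing fits; an autoscaler would have to add a node
    if strategy == "pack":
        return max(fitting, key=fitting.get)
    return min(fitting, key=fitting.get)
```

With packing, lightly used nodes tend to stay empty, which is what gives a cluster autoscaler the chance to remove them.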
[x] Allow setting of storage labels - #924
This allows labels on the PVCs to be configured. This can be useful if you want to delete or patch all user PVCs at once, for example, as the labels make it easy to select them with kubectl.
[x] Tolerations for node taints - #925
Want to forbid anything except user pods from running on a node, so that it can scale down reliably once the users have left? With this PR in place, you can taint that node with hub.jupyter.org/dedicated=user:NoSchedule and the user pods will still tolerate being scheduled there. Something similar can be done for the core pods.
Note that until GKE has fixed [a bug](gcloud beta container node-pools create test-pool --machine-type ns-standard-1 --num-nodes 0 --node-taints hub.jupyter.org/dedicated=user:NoSchedule), their node pools can't have a taint with a / in it, so this PR also makes user pods tolerate the taint with the / replaced by _.
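The taint and its GKE-friendly variant can be sketched as a pair of tolerations. This is a simplified illustration, not the code from #925: the helper names are hypothetical, and the matching only handles the Equal operator with an explicit effect, which is narrower than what Kubernetes supports.

```python
# Both spellings of the key: the real one and the one with "/" replaced by "_".
DEDICATED_KEYS = ("hub.jupyter.org/dedicated", "hub.jupyter.org_dedicated")


def dedicated_tolerations(value="user", effect="NoSchedule"):
    """Return one toleration per key spelling."""
    return [
        {"key": key, "operator": "Equal", "value": value, "effect": effect}
        for key in DEDICATED_KEYS
    ]


def tolerates(tolerations, taint):
    """Check a taint written as 'key=value:effect' against the tolerations."""
    key_value, effect = taint.rsplit(":", 1)
    key, value = key_value.split("=", 1)
    return any(
        t["operator"] == "Equal"
        and (t["key"], t["value"], t["effect"]) == (key, value, effect)
        for t in tolerations
    )
```

A node tainted with either spelling then accepts user pods, while a taint dedicated to another value still keeps them away.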
[x] Node affinity and node labels - #926
Want to ensure a good autoscaling experience? Create two node pools, one for core pods and one for user pods. Core pods are the hub, proxy and user-scheduler pods. Once the node pools exist, the rest will be easy to configure.
[x] Configurable presets for core and user pod affinity - #927
Want to configure KubeSpawner's affinity settings, introduced in https://github.com/jupyterhub/kubespawner/pull/239, with common useful presets? This PR lets you do that.
[x] Pod priority and User placeholders - #929
Want to scale up before users arrive, so they don't end up waiting for a node to pull an image several gigabytes in size? By adding a configurable, fixed number of user placeholder pods with a lower pod priority than real user pods, we can accomplish this. It requires k8s v1.11 though.
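How lower-priority placeholders make room for real users can be sketched with a toy preemption model. The priority numbers, the slot-based node and the `admit` function are all hypothetical; only the ordering of the two priorities reflects the text above.

```python
REAL_USER_PRIORITY = 0      # hypothetical values; only the ordering matters
PLACEHOLDER_PRIORITY = -10


def admit(node, capacity, pod):
    """Place `pod` on `node`, preempting one lower-priority pod if it is full.

    node: list of (name, priority) tuples, modified in place.
    Returns the pods left pending; pending pods are what would make a
    cluster autoscaler add a node.
    """
    if len(node) < capacity:
        node.append(pod)
        return []
    victim = min(node, key=lambda p: p[1])
    if victim[1] < pod[1]:
        node.remove(victim)
        node.append(pod)
        return [victim]
    return [pod]
```

The evicted placeholder is the pod that waits for the new node, so the real user starts right away on a node that already has the image.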
[x] Improved resource template - (the same PR)
The resource template caused various linting errors and forced us to break the DRY principle.
[x] preferScheduleNextToRealUsers - improves autoscaling - #930
This setting slightly improves the ability of a cluster autoscaler to scale down, by increasing the likelihood that user placeholders, rather than real users, are the pods left alone on a node. Real users can't be moved around, while user placeholder pods can. If you are to schedule a real user on one of two nodes, where one contains a user placeholder and the other a real user, where should you schedule? Next to the real user! This setting ensures that this happens, potentially allowing the other node to scale down.
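The two-node example above can be written down as a tiny scoring rule. This is only an illustration of the idea, with a hypothetical function name and data model, and it ignores node capacity entirely.

```python
def pick_node_for_real_user(nodes):
    """Prefer the node that already hosts the most real users.

    nodes: dict mapping node name -> list of pod kinds,
    each kind being "user" or "placeholder".
    """
    return max(nodes, key=lambda name: nodes[name].count("user"))
```

For `{"node-a": ["placeholder"], "node-b": ["user"]}` this picks `node-b`, leaving `node-a` with only a movable placeholder on it.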
[x] User dummy - #931
Want to test if your autoscaler and user placeholder do what they should? With the user dummy PR you will get tools to do so.
Maintenance PRs
[x] Cleanup of orphaned files - #842
Two files were left unused in the repo.
[x] cull.maxAge bugfix - #853
cull.maxAge previously didn't influence the culler service, as the value was never consumed. This is fixed by a single one-line commit.
[x] No more duplicates of puller pods - #854
Nobody wants pods running that do nothing. By using the new before-hook-creation value for the Helm hook deletion policy, together with a single name for our Helm hook resources, we can ensure that we never have orphaned image pullers.
[x] Remove pod-culler image - #890 #919
Before JupyterHub 0.9, the pod-culler was a standalone pod with a custom image. It is now an internal service of the JupyterHub pod, so in this PR we remove the leftover code.
[x] Upgrade to k8s 1.9 APIs - #920
Migrate to more stable K8s resource APIs from beta.
[x] Update of the singleuser-sample image - #888
git and nbgitpuller are now available by default
Switch to using a StatefulSet for the Hub*
The Hub should perhaps be a StatefulSet rather than a Deployment, as it tends to be tied to a PV that can only be mounted by a single Hub. See this issue: https://github.com/helm/charts/issues/1863
* I have second thoughts about this: does it make sense even though we use an external hub db? Marking this as complete.
About #758