Z2JK v1.2.0: user-placeholder pods and single user pods go into a pending state and are never scheduled on a node in their designated GKE K8s node-pool #2576
Z2JK v1.2.0: user-placeholder pods and single user pods go into a pending state and are never scheduled on a node in their designated GKE K8s node-pool
Expected behaviour
User-placeholder pods and single-user pods are scheduled and reach a running state on a node in their designated GKE K8s node-pool.
Actual behaviour
Core pods (hub, proxy, image-awaiter) are scheduled and running on a node in their designated GKE K8s node-pool.
The Continuous Image Puller and Hook Image Puller are scheduled on all nodes that are labeled and tainted as user nodes (the same node-pool where user-placeholder pods and single-user pods should be running).
User-Placeholder pods stay in a Pending state indefinitely and are never scheduled to a node in their designated node-pool, even though the Continuous Image Puller and Hook Image Puller are running on those same nodes. The User-Placeholder StatefulSet stays "in-progress" indefinitely.
Single User pods go into a pending state and are likewise never scheduled to a node in their designated node-pool, the same node-pool where the Continuous Image Puller and Hook Image Puller are running.
How to reproduce
Create a GKE cluster with two node-pools: one for the core pods and one for the user pods. Taint the two node-pools with distinct taints that differentiate "core pods" nodes from "user pods" nodes, and enable autoscaling on the "user pods" node-pool.
Modify the Z2JK v1.2.0 values.yaml file to create 2 User-Placeholder replicas per "user pods" node.
Modify the Z2JK v1.2.0 values.yaml file (core-pods and user-pods sections) with the tolerations for the core pods node-pool and the tolerations for the user pods node-pool.
Run the Z2JK v1.2.0 helm chart to deploy Z2JK v1.2.0.
You should notice that the User-Placeholder pods go into a perpetual pending state without being scheduled on a node.
Log in to JupyterHub and launch a user's Jupyter Notebook server (i.e., a Single User pod).
The Single User pod goes into a pending state and is not assigned to a node. It is eventually terminated by JupyterHub.
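For reference, the values.yaml changes described in the steps above might look roughly like the sketch below. It is not the configuration actually used in this report: the taint key and values (`hub.jupyter.org_dedicated=core` / `=user`) are assumed for illustration, and `matchNodePurpose: require` additionally assumes the nodes carry the `hub.jupyter.org/node-purpose` label. Note that `scheduling.userPlaceholder.replicas` is a total pod count for the cluster, not a per-node count.

```yaml
# Hypothetical values.yaml excerpt. Taint keys/values are assumptions,
# not the reporter's actual settings.
scheduling:
  userPlaceholder:
    enabled: true
    replicas: 2          # total number of placeholder pods
  corePods:
    tolerations:
      - key: hub.jupyter.org_dedicated   # assumed taint on the core node-pool
        operator: Equal
        value: core
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: require          # assumes label hub.jupyter.org/node-purpose=core
  userPods:
    tolerations:
      - key: hub.jupyter.org_dedicated   # assumed taint on the user node-pool
        operator: Equal
        value: user
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: require          # assumes label hub.jupyter.org/node-purpose=user
```

With a setup like this, the key, value, and effect of each toleration must match the node-pool's taint exactly for the scheduler to treat the taint as tolerated.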
Your personal set up
OS:
Version(s):
Full environment
# paste output of `pip freeze` or `conda list` here
Configuration
Logs