pangeo-data / pangeo-cloud-federation

Deployment automation for Pangeo JupyterHubs on AWS, Google, and Azure
https://pangeo.io/cloud.html

Cannot create Scheduler if JupyterHub username isn't a valid label #879

Open TomAugspurger opened 3 years ago

TomAugspurger commented 3 years ago

We add the JupyterHub username as a Kubernetes label to the scheduler and worker pods at https://github.com/pangeo-data/pangeo-cloud-federation/blob/41b981403993a305d438cfddce5aa43f9d0ffdd5/pangeo-deploy/values.yaml#L104-L106. This helps with monitoring, since we can easily see which JupyterHub user is using which Dask Gateway resources.

In another deployment, the scheduler failed to start because the JupyterHub username wasn't a valid Kubernetes label value (here, it contained a space). Perhaps GitHub usernames are always valid label values, so this hasn't come up here. Still, we should set a good example for others who copy our configuration.

2020-11-20T15:28:00.531581654Z [I 2020-11-20 15:28:00.531 KubeController] Reconciling cluster staging.0ff665e7c83144ac8020f691be8c2801
2020-11-20T15:28:00.626229465Z [I 2020-11-20 15:28:00.625 KubeController] Creating new credentials for cluster staging.0ff665e7c83144ac8020f691be8c2801
2020-11-20T15:28:00.659165908Z [I 2020-11-20 15:28:00.658 KubeController] Creating scheduler pod for cluster staging.0ff665e7c83144ac8020f691be8c2801
2020-11-20T15:28:00.677417787Z [W 2020-11-20 15:28:00.676 KubeController] Error while reconciling cluster staging.0ff665e7c83144ac8020f691be8c2801
2020-11-20T15:28:00.677443288Z Traceback (most recent call last):
2020-11-20T15:28:00.677449088Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/controller.py", line 586, in reconciler_loop
2020-11-20T15:28:00.677454788Z     requeue = await self.reconcile_cluster(name)
2020-11-20T15:28:00.677458388Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/controller.py", line 607, in reconcile_cluster
2020-11-20T15:28:00.677462188Z     status_update, requeue = await self.handle_cluster(cluster)
2020-11-20T15:28:00.677465688Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/controller.py", line 643, in handle_cluster
2020-11-20T15:28:00.677469388Z     return await self.handle_pending_cluster(cluster)
2020-11-20T15:28:00.677472588Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/controller.py", line 661, in handle_pending_cluster
2020-11-20T15:28:00.677476188Z     sched_pod_name, sched_pod = await self.create_scheduler_pod_if_not_exists(
2020-11-20T15:28:00.677479588Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/controller.py", line 942, in create_scheduler_pod_if_not_exists
2020-11-20T15:28:00.677495488Z     pod = await self.core_client.create_namespaced_pod(namespace, pod)
2020-11-20T15:28:00.677498888Z   File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/backends/kubernetes/utils.py", line 59, in func
2020-11-20T15:28:00.677502288Z     return await method(*args, **kwargs)
2020-11-20T15:28:00.677505188Z   File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/api_client.py", line 180, in __call_api
2020-11-20T15:28:00.677508388Z     response_data = await self.request(
2020-11-20T15:28:00.677511288Z   File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 229, in POST
2020-11-20T15:28:00.677514888Z     return (await self.request("POST", url,
2020-11-20T15:28:00.677518088Z   File "/usr/local/lib/python3.8/site-packages/kubernetes_asyncio/client/rest.py", line 186, in request
2020-11-20T15:28:00.677521188Z     raise ApiException(http_resp=r)
2020-11-20T15:28:00.677524088Z kubernetes_asyncio.client.exceptions.ApiException: (422)
2020-11-20T15:28:00.677527488Z Reason: Unprocessable Entity
2020-11-20T15:28:00.677531388Z HTTP response headers: <CIMultiDictProxy('Audit-Id': 'a04309e3-dd64-4b8c-b6ca-a45509cab1b1', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 20 Nov 2020 15:28:00 GMT', 'Content-Length': '968')>
2020-11-20T15:28:00.677537888Z HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"dask-scheduler-0ff665e7c83144ac8020f691be8c2801\" is invalid: metadata.labels: Invalid value: \"tom augspurger\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","reason":"Invalid","details":{"name":"dask-scheduler-0ff665e7c83144ac8020f691be8c2801","kind":"Pod","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"tom augspurger\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","field":"metadata.labels"}]},"code":422}
2020-11-20T15:28:00.677548888Z 
2020-11-20T15:28:00.677551788Z 
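The 422 response above spells out the rule the API server applies to label values. Here is a minimal sketch of the same check, so a username can be tested before it reaches the API server. It uses the regex quoted in the error plus the Kubernetes 63-character limit on label values. The function name `is_valid_label_value` is hypothetical.

```python
import re

# Regex copied from the 422 response above; fullmatch anchors it at both ends.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LEN = 63  # Kubernetes limit on label value length


def is_valid_label_value(value: str) -> bool:
    """Return True if `value` would be accepted as a Kubernetes label value."""
    if len(value) > MAX_LABEL_VALUE_LEN:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


# Prints False for 'tom augspurger' and True for every other name.
for name in ["tom augspurger", "tom-augspurger", "MyValue", "my_value", "12345", ""]:
    print(repr(name), is_valid_label_value(name))
```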
yuvipanda commented 3 years ago

In kubespawner, we use https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L1395 to help deal with it. Maybe something like that can be added to dask-gateway?

TomAugspurger commented 3 years ago

Thanks, that looks better than the hacky regex I was cooking up :)

dask-gateway may not be the appropriate place for this. In this case, it’s just us (Pangeo’s daskhub deployment) adding the label. If that method in kubespawner happened to be public, it’d be perfect for us, since kubespawner would be present on the machine where this code is executed :)

That said, there are dask-gateway deployments that aren’t jupyterhub deployments, who might benefit from a method like this, and wouldn’t have access to kubespawner.

On Nov 23, 2020, at 1:30 AM, Yuvi Panda notifications@github.com wrote:

In kubespawner, we use https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L1395 to help deal with it. Maybe something like that can be added to dask-gateway?


yuvipanda commented 3 years ago

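kubespawner escapes the username so that it's safe to use as a DNS label: https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L1395. The characters that escaping produces (lowercase letters, digits and '-') are all valid in label values too: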

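import string
from escapism import escape

# Only lowercase ASCII letters and digits pass through unescaped.
safe_chars = set(string.ascii_lowercase + string.digits)

username = "tom augspurger"  # inside kubespawner this is self.user.name
# Other characters become '-' plus their hex code, e.g. ' ' -> '-20'.
safe_username = escape(username, safe=safe_chars, escape_char="-").lower()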

This should work!
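Two edge cases in the snippet above are worth noting:

- escapism doesn't truncate, so a long username can exceed the 63-character limit on label values.
- A username that starts with a character outside `safe_chars` (say `_admin`) escapes to a value that starts with `-`, which the API server rejects.

Some dask-gateway deployments have neither kubespawner nor escapism installed. For those, here is a dependency-free sketch of the same escaping idea that also covers both cases. It is not byte-for-byte identical to escapism's output. The function name `to_label_value`, the `x` prefix and the hash-suffix truncation are assumptions for illustration, not something dask-gateway or kubespawner provides.

```python
import hashlib
import string

MAX_LABEL_LEN = 63  # Kubernetes limit on label value length
SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def to_label_value(username: str) -> str:
    """Hypothetical helper: turn an arbitrary username into a label value.

    Lowercase ASCII letters and digits are kept; every other character is
    replaced by '-' followed by the lowercase hex code of each of its
    UTF-8 bytes (an escapism-like scheme, assumed for this sketch).
    """
    out = []
    for ch in username:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            out.extend(f"-{b:02x}" for b in ch.encode("utf-8"))
    value = "".join(out)

    # A label value must start with an alphanumeric character.
    if value.startswith("-"):
        value = "x" + value

    # Too long: keep a prefix and append a short hash of the original name,
    # so the result is exactly 63 characters and ends in a hex digit.
    if len(value) > MAX_LABEL_LEN:
        digest = hashlib.sha256(username.encode("utf-8")).hexdigest()[:8]
        value = value[: MAX_LABEL_LEN - 9] + "-" + digest
    return value


print(to_label_value("tom augspurger"))  # tom-20augspurger
print(to_label_value("_admin"))  # x-5fadmin
```

The `x` prefix can occasionally map two usernames to the same value (`_admin` and `x_admin` both become `x-5fadmin`). That is tolerable for a monitoring label, but not for anything used for access control.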