pangeo-data / pangeo-cmip6-examples

Examples of analysis of CMIP6 data using xarray and dask
BSD 3-Clause "New" or "Revised" License

Pangeo binder notebook throws exception when starting cluster #5

Closed ceblanton closed 5 years ago

ceblanton commented 5 years ago

Hello! I work with @balaji-gfdl and @aradhakrishnanGFDL and I'm trying to learn more about Pangeo. I was able to launch the Pangeo binder (https://binder.pangeo.io/v2/gh/pangeo-data/pangeo_cmip6_examples/master) and start a notebook (cmip6_precip_analysis.ipynb or cmip6_PT_analysis.ipynb).

The first cell, which loads xarray, numpy, and matplotlib, runs OK. The second cell, which starts the dask cluster, fails with a Python exception. Could someone take a look? I don't think it's only my environment, as I asked another user to try and it failed for them too.

Thank you very much, Chris Blanton


---------------------------------------------------------------------------
ApiException                              Traceback (most recent call last)
<ipython-input-2-9ac3597415a5> in <module>
      1 from dask.distributed import Client, progress
      2 from dask_kubernetes import KubeCluster
----> 3 cluster = KubeCluster(n_workers=20)
      4 client = Client(cluster)
      5 cluster

/srv/conda/envs/notebook/lib/python3.6/site-packages/dask_kubernetes/core.py in __init__(self, pod_template, name, namespace, n_workers, host, port, env, auth, **kwargs)
    239         if n_workers:
    240             try:
--> 241                 self.scale(n_workers)
    242             except Exception:
    243                 self.cluster.close()

/srv/conda/envs/notebook/lib/python3.6/site-packages/dask_kubernetes/core.py in scale(self, n)
    394         pods = self._cleanup_terminated_pods(self.pods())
    395         if n >= len(pods):
--> 396             self.scale_up(n, pods=pods)
    397             return
    398         else:

/srv/conda/envs/notebook/lib/python3.6/site-packages/dask_kubernetes/core.py in scale_up(self, n, pods, **kwargs)
    473                     new_pods.append(
    474                         self.core_api.create_namespaced_pod(
--> 475                             self.namespace, self.pod_template
    476                         )
    477                     )

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py in create_namespaced_pod(self, namespace, body, **kwargs)
   6113             return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
   6114         else:
-> 6115             (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
   6116             return data
   6117 

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py in create_namespaced_pod_with_http_info(self, namespace, body, **kwargs)
   6204                                         _preload_content=params.get('_preload_content', True),
   6205                                         _request_timeout=params.get('_request_timeout'),
-> 6206                                         collection_formats=collection_formats)
   6207 
   6208     def create_namespaced_pod_binding(self, name, namespace, body, **kwargs):

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    332                                    body, post_params, files,
    333                                    response_type, auth_settings,
--> 334                                    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    335         else:
    336             thread = self.pool.apply_async(self.__call_api, (resource_path, method,

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    166                                      post_params=post_params, body=body,
    167                                      _preload_content=_preload_content,
--> 168                                      _request_timeout=_request_timeout)
    169 
    170         self.last_response = response_data

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    375                                          _preload_content=_preload_content,
    376                                          _request_timeout=_request_timeout,
--> 377                                          body=body)
    378         elif method == "PUT":
    379             return self.rest_client.PUT(url,

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/rest.py in POST(self, url, headers, query_params, post_params, body, _preload_content, _request_timeout)
    264                             _preload_content=_preload_content,
    265                             _request_timeout=_request_timeout,
--> 266                             body=body)
    267 
    268     def PUT(self, url, headers=None, query_params=None, post_params=None, body=None, _preload_content=True,

/srv/conda/envs/notebook/lib/python3.6/site-packages/kubernetes/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    220 
    221         if not 200 <= r.status <= 299:
--> 222             raise ApiException(http_resp=r)
    223 
    224         return r

ApiException: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f81ccb55-ffa2-4413-9b4d-cda5ca6736b4', 'Content-Type': 'application/json', 'Date': 'Wed, 31 Jul 2019 20:29:29 GMT', 'Content-Length': '1829'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1jh7fr\" is invalid: [metadata.generateName: Invalid value: \"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), metadata.name: Invalid value: \"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1jh7fr\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]","reason":"Invalid","details":{"name":"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1jh7fr","kind":"Pod","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.generateName"},{"reason":"FieldValueInvalid","message":"Invalid value: \"dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1jh7fr\": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","field":"metadata.name"}]},"code":422}

jhamman commented 5 years ago

I think the underscores in this repo name are making Kubernetes unhappy. The easiest fix is likely to rename this repo. We could also change the dask config so that it does not use {JUPYTERHUB_USER}. @rabernat - do you have any ideas here?
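For anyone hitting the same 422 error, here is a minimal sketch of how the failing name could be checked against the DNS-1123 subdomain pattern quoted in the response body above. The helper names `is_valid_pod_name` and `sanitize_pod_name` are hypothetical and are not part of dask_kubernetes or the Kubernetes client. The sanitizer is only one possible cleanup rule, and it does not check length limits.

```python
import re

# DNS-1123 subdomain pattern, copied from the 422 response body above.
DNS_1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_pod_name(name: str) -> bool:
    """Return True if the whole name matches the DNS-1123 subdomain pattern."""
    return DNS_1123_SUBDOMAIN.fullmatch(name) is not None


def sanitize_pod_name(name: str) -> str:
    """Hypothetical cleanup: lower-case the name, replace every character
    outside [a-z0-9-] (including '_') with '-', then trim leading and
    trailing '-'. An empty result is still invalid and must be handled
    by the caller."""
    cleaned = re.sub(r"[^a-z0-9-]", "-", name.lower())
    return cleaned.strip("-")


# Pod name reported as invalid in the traceback above.
bad = "dask-pangeo-data-pan-_cmip6_examples-it30r1l7-c81e8998-1jh7fr"

print(is_valid_pod_name(bad))                     # the underscores make this fail
print(sanitize_pod_name(bad))                     # underscores replaced by '-'
print(is_valid_pod_name(sanitize_pod_name(bad)))  # cleaned name passes the check
```

A cleaned name could presumably be supplied through the `name` argument that appears in the `KubeCluster.__init__` signature in the traceback, but that is an assumption and has not been tested on the binder deployment.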

jhamman commented 5 years ago

I've experimented with a few things here today. Renaming the repo from pangeo_cmip6_examples to pangeo-cmip6-examples seems to have worked, but I'm not sure this is an acceptable solution. It is easy to roll back, but for now pods do seem to launch.

rabernat commented 5 years ago

Thanks for looking into this. I seem to recall the same issue in pangeo_ocean_examples (which was renamed pangeo-ocean-examples for this reason).

jhamman commented 5 years ago

@rabernat - are you okay with me renaming this repo? If so, we can close this issue.

rabernat commented 5 years ago

Sure, no problem! GitHub will redirect the old URLs anyway.