pangeo-gallery / default-binder

Default binder environment for pangeo gallery.

failing to start cluster using binder #22

Closed jkingslake closed 3 years ago

jkingslake commented 3 years ago

When I run this notebook in the Pangeo default binder using

https://binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/2020.10.10/?urlpath=git-pull?repo=https://github.com/ldeo-glaciology/pangeo-glaciology-examples.git%26amp%3Burlpath=lab/tree/pangeo-glaciology-examples.git/03_REMA.ipynb

it fails to open a cluster and says:

GatewayClusterError: Cluster 'prod.e8cbb9497deb4771b8fcdcffb50c19fd' failed to start, see logs for more information

(full error below)

I will share the logs if someone can point me to where to find them.

The notebook runs successfully on us-central1-b.gcp.pangeo.io with the smallest server.

Thanks for any help on this.

from dask.distributed import Client
import dask_gateway
gateway = dask_gateway.Gateway()
cluster = gateway.new_cluster()

---------------------------------------------------------------------------
GatewayClusterError                       Traceback (most recent call last)
<ipython-input-2-de2b60f93eb6> in <module>
      2 import dask_gateway
      3 gateway = dask_gateway.Gateway()
----> 4 cluster = gateway.new_cluster()

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in new_cluster(self, cluster_options, shutdown_on_close, **kwargs)
    641             cluster_options=cluster_options,
    642             shutdown_on_close=shutdown_on_close,
--> 643             **kwargs,
    644         )
    645 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in __init__(self, address, proxy_address, public_address, auth, cluster_options, shutdown_on_close, asynchronous, loop, **kwargs)
    816             shutdown_on_close=shutdown_on_close,
    817             asynchronous=asynchronous,
--> 818             loop=loop,
    819         )
    820 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in _init_internal(self, address, proxy_address, public_address, auth, cluster_options, cluster_kwargs, shutdown_on_close, asynchronous, loop, name)
    912             self.status = "starting"
    913         if not self.asynchronous:
--> 914             self.gateway.sync(self._start_internal)
    915 
    916     @property

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in sync(self, func, *args, **kwargs)
    337             )
    338             try:
--> 339                 return future.result()
    340             except BaseException:
    341                 future.cancel()

/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    433                 raise CancelledError()
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:
    437                 raise TimeoutError()

/srv/conda/envs/notebook/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in _start_internal(self)
    926             self._start_task = asyncio.ensure_future(self._start_async())
    927         try:
--> 928             await self._start_task
    929         except BaseException:
    930             # On exception, cleanup

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in _start_async(self)
    944         # Connect to cluster
    945         try:
--> 946             report = await self.gateway._wait_for_start(self.name)
    947         except GatewayClusterError:
    948             raise

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_gateway/client.py in _wait_for_start(self, cluster_name)
    576                     raise GatewayClusterError(
    577                         "Cluster %r failed to start, see logs for "
--> 578                         "more information" % cluster_name
    579                     )
    580                 elif report.status is ClusterStatus.STOPPED:

GatewayClusterError: Cluster 'prod.e8cbb9497deb4771b8fcdcffb50c19fd' failed to start, see logs for more information

jkingslake commented 3 years ago

@rabernat, not sure if things are in flux with this site and the binder image, but just wondering if you saw this.

rabernat commented 3 years ago

Yes, our binder appears to be broken.

rabernat commented 3 years ago

I tried this with a newer image (a095b36), and it seems to be working:

https://binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/a095b36/?urlpath=git-pull?repo=https://github.com/ldeo-glaciology/pangeo-glaciology-examples.git%26amp%3Burlpath=lab/tree/pangeo-glaciology-examples.git/03_REMA.ipynb

So the problem is that the dask-gateway version in the older image (2020.10.10) was incompatible with the current Dask Gateway installation on the binder.
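
For anyone hitting the same error, one way to spot this kind of mismatch is to print the package versions installed in the running image and compare them with what the gateway deployment expects. The snippet below is only a sketch: the package list is an assumption, `report_versions` is a hypothetical helper, and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 (the 3.7 environment in the traceback above would need the `importlib_metadata` backport instead).

```python
from importlib import metadata


def report_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is missing from this environment
            versions[name] = None
    return versions


# Assumed list of packages relevant to starting a Dask Gateway cluster
for name, version in report_versions(["dask", "distributed", "dask-gateway"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both the old and the new image should show whether the dask-gateway versions differ.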

jkingslake commented 3 years ago

nice one, thanks Ryan!