malariagen / datalab

Repo for files and issues related to cloud deployment of JupyterHub.
MIT License

GCP deployment not running same image for notebook and worker pods #53

Closed alimanfoo closed 5 years ago

alimanfoo commented 5 years ago

Running on GCP deployment (datalab.malariagen.net):

from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=10)
from dask.distributed import Client
client = Client(cluster)
client.get_versions(check=True)

...I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-4a95fbecd8f8> in <module>
----> 1 client.get_versions(check=True)

/opt/conda/lib/python3.6/site-packages/distributed/client.py in get_versions(self, check, packages)
   3216                 raise ValueError("Mismatched versions found\n"
   3217                                  "\n"
-> 3218                                  "%s" % ('\n\n'.join(errs)))
   3219 
   3220         return result

ValueError: Mismatched versions found

blosc
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | 1.6.2   |
| tcp://10.8.0.59:34609  | 1.4.4   |
| tcp://10.8.1.4:33755   | 1.4.4   |
| tcp://10.8.1.5:46769   | 1.4.4   |
| tcp://10.8.2.2:38271   | 1.4.4   |
| tcp://10.8.2.3:39645   | 1.4.4   |
| tcp://10.8.3.4:33093   | 1.4.4   |
| tcp://10.8.3.5:37181   | 1.4.4   |
| tcp://10.8.32.46:44427 | 1.4.4   |
| tcp://10.8.4.4:38261   | 1.4.4   |
| tcp://10.8.4.5:42849   | 1.4.4   |
+------------------------+---------+

bokeh
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | 1.0.2   |
| tcp://10.8.0.59:34609  | 0.13.0  |
| tcp://10.8.1.4:33755   | 0.13.0  |
| tcp://10.8.1.5:46769   | 0.13.0  |
| tcp://10.8.2.2:38271   | 0.13.0  |
| tcp://10.8.2.3:39645   | 0.13.0  |
| tcp://10.8.3.4:33093   | 0.13.0  |
| tcp://10.8.3.5:37181   | 0.13.0  |
| tcp://10.8.32.46:44427 | 0.13.0  |
| tcp://10.8.4.4:38261   | 0.13.0  |
| tcp://10.8.4.5:42849   | 0.13.0  |
+------------------------+---------+

cloudpickle
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | 0.6.1   |
| tcp://10.8.0.59:34609  | 0.5.5   |
| tcp://10.8.1.4:33755   | 0.5.5   |
| tcp://10.8.1.5:46769   | 0.5.5   |
| tcp://10.8.2.2:38271   | 0.5.5   |
| tcp://10.8.2.3:39645   | 0.5.5   |
| tcp://10.8.3.4:33093   | 0.5.5   |
| tcp://10.8.3.5:37181   | 0.5.5   |
| tcp://10.8.32.46:44427 | 0.5.5   |
| tcp://10.8.4.4:38261   | 0.5.5   |
| tcp://10.8.4.5:42849   | 0.5.5   |
+------------------------+---------+

dask
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | 1.0.0   |
| tcp://10.8.0.59:34609  | 0.19.0  |
| tcp://10.8.1.4:33755   | 0.19.0  |
| tcp://10.8.1.5:46769   | 0.19.0  |
| tcp://10.8.2.2:38271   | 0.19.0  |
| tcp://10.8.2.3:39645   | 0.19.0  |
| tcp://10.8.3.4:33093   | 0.19.0  |
| tcp://10.8.3.5:37181   | 0.19.0  |
| tcp://10.8.32.46:44427 | 0.19.0  |
| tcp://10.8.4.4:38261   | 0.19.0  |
| tcp://10.8.4.5:42849   | 0.19.0  |
+------------------------+---------+

dask_ml
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | None    |
| tcp://10.8.0.59:34609  | 0.9.0   |
| tcp://10.8.1.4:33755   | 0.9.0   |
| tcp://10.8.1.5:46769   | 0.9.0   |
| tcp://10.8.2.2:38271   | 0.9.0   |
| tcp://10.8.2.3:39645   | 0.9.0   |
| tcp://10.8.3.4:33093   | 0.9.0   |
| tcp://10.8.3.5:37181   | 0.9.0   |
| tcp://10.8.32.46:44427 | 0.9.0   |
| tcp://10.8.4.4:38261   | 0.9.0   |
| tcp://10.8.4.5:42849   | 0.9.0   |
+------------------------+---------+

distributed
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | 1.25.1  |
| tcp://10.8.0.59:34609  | 1.23.0  |
| tcp://10.8.1.4:33755   | 1.23.0  |
| tcp://10.8.1.5:46769   | 1.23.0  |
| tcp://10.8.2.2:38271   | 1.23.0  |
| tcp://10.8.2.3:39645   | 1.23.0  |
| tcp://10.8.3.4:33093   | 1.23.0  |
| tcp://10.8.3.5:37181   | 1.23.0  |
| tcp://10.8.32.46:44427 | 1.23.0  |
| tcp://10.8.4.4:38261   | 1.23.0  |
| tcp://10.8.4.5:42849   | 1.23.0  |
+------------------------+---------+

lz4
+------------------------+---------+
|                        | version |
+------------------------+---------+
| client                 | None    |
| tcp://10.8.0.59:34609  | 1.1.0   |
| tcp://10.8.1.4:33755   | 1.1.0   |
| tcp://10.8.1.5:46769   | 1.1.0   |
| tcp://10.8.2.2:38271   | 1.1.0   |
| tcp://10.8.2.3:39645   | 1.1.0   |
| tcp://10.8.3.4:33093   | 1.1.0   |
| tcp://10.8.3.5:37181   | 1.1.0   |
| tcp://10.8.32.46:44427 | 1.1.0   |
| tcp://10.8.4.4:38261   | 1.1.0   |
| tcp://10.8.4.5:42849   | 1.1.0   |
+------------------------+---------+

etc.

I think this means that a different (older) image is being used for the dask workers, but it should be the same image as used for the notebook pod.

This currently prevents any distributed computations from running.
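To make a report like the one above easier to scan, the dictionary returned by `client.get_versions()` (without `check=True`) can be reduced to just the packages that differ. This is a minimal sketch, assuming the result has the shape shown later in this thread for distributed 1.25.1 (`'client'` and `'workers'` entries, each with `'packages'` split into `'required'` and `'optional'` lists of `(name, version)` pairs). The helper names `flatten_packages` and `find_mismatches` and the `example` data are hypothetical, not part of dask or distributed.

```python
def flatten_packages(packages):
    """Merge the 'required' and 'optional' (name, version) pairs into one dict."""
    merged = {}
    for group in ("required", "optional"):
        merged.update(dict(packages.get(group, ())))
    return merged


def find_mismatches(versions):
    """Return {package: {'client': version, worker_address: version, ...}}
    for every package where at least one worker differs from the client."""
    client_pkgs = flatten_packages(versions["client"]["packages"])
    mismatches = {}
    for address, info in versions.get("workers", {}).items():
        worker_pkgs = flatten_packages(info["packages"])
        for name in sorted(set(client_pkgs) | set(worker_pkgs)):
            if client_pkgs.get(name) != worker_pkgs.get(name):
                # Record the client's version once, then each differing worker.
                mismatches.setdefault(name, {"client": client_pkgs.get(name)})
                mismatches[name][address] = worker_pkgs.get(name)
    return mismatches


# Hypothetical input, trimmed to one worker and two packages from the report above.
example = {
    "client": {
        "packages": {"required": [("dask", "1.0.0")], "optional": [("lz4", None)]}
    },
    "workers": {
        "tcp://10.8.1.4:33755": {
            "packages": {"required": [("dask", "0.19.0")], "optional": [("lz4", "1.1.0")]}
        }
    },
}
print(find_mismatches(example))
```

In a live session this would be called as `find_mismatches(client.get_versions())`. Newer releases of distributed may return `'packages'` in a different layout, so the flattening step would need adjusting there.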

alimanfoo commented 5 years ago

cc @slejdops

slejdops commented 5 years ago

I’m fixing this now.

alimanfoo commented 5 years ago

Awesome, thanks.

On Wed, 23 Jan 2019 at 09:45, slejdops notifications@github.com wrote:

I’m fixing this now.


slejdops commented 5 years ago

Looks better now:

from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=10)
from dask.distributed import Client
client = Client(cluster)
client.get_versions(check=True)

{'scheduler': {'host': (('python', '3.6.0.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.14.65+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')), 'packages': {'required': (('dask', '1.0.0'), ('distributed', '1.25.1'), ('msgpack', '0.6.0'), ('cloudpickle', '0.6.1'), ('tornado', '5.1.1'), ('toolz', '0.9.0')), 'optional': (('numpy', '1.15.4'), ('pandas', '0.23.4'), ('bokeh', '1.0.2'), ('lz4', None), ('dask_ml', None), ('blosc', '1.6.2'))}}, 'workers': {}, 'client': {'host': [('python', '3.6.0.final.0'), ('python-bits', 64), ('OS', 'Linux'), ('OS-release', '4.14.65+'), ('machine', 'x86_64'), ('processor', 'x86_64'), ('byteorder', 'little'), ('LC_ALL', 'en_US.UTF-8'), ('LANG', 'en_US.UTF-8'), ('LOCALE', 'en_US.UTF-8')], 'packages': {'required': [('dask', '1.0.0'), ('distributed', '1.25.1'), ('msgpack', '0.6.0'), ('cloudpickle', '0.6.1'), ('tornado', '5.1.1'), ('toolz', '0.9.0')], 'optional': [('numpy', '1.15.4'), ('pandas', '0.23.4'), ('bokeh', '1.0.2'), ('lz4', None), ('dask_ml', None), ('blosc', '1.6.2')]}}}

alimanfoo commented 5 years ago

Looks good, thanks!
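A note for anyone repeating this check: in the output above, `'workers'` is an empty dict, which may mean the call ran before any worker pods had registered with the scheduler. In that case only the scheduler and client are being compared. The following is a rough sketch of polling until workers have reported before trusting the result. The helper name `wait_for_worker_versions` and its parameters are hypothetical; it only assumes that the callable passed in returns a dict with a `'workers'` key, as `client.get_versions()` does in this thread.

```python
import time


def wait_for_worker_versions(get_versions, expected_workers,
                             retries=30, delay=2.0, sleep=time.sleep):
    """Call get_versions() until at least expected_workers workers appear.

    Returns the first result that includes enough workers, or raises
    TimeoutError after `retries` attempts.
    """
    for _ in range(retries):
        versions = get_versions()
        if len(versions.get("workers", {})) >= expected_workers:
            return versions
        sleep(delay)  # give the worker pods more time to start
    raise TimeoutError(
        "fewer than %d workers reported after %d attempts"
        % (expected_workers, retries)
    )
```

With a live client this could be used as `wait_for_worker_versions(lambda: client.get_versions(check=True), expected_workers=10)`. A `ValueError` from `check=True` would propagate as soon as mismatched workers appear.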