marbl-ecosys / HiRes-CESM-analysis

Notebooks and tools for validating the 0.1 degree POP / CICE run with ocean BGC
http://hires-cesm-analysis.dokku.projectpythia.org/Interactive_Dashboard

cannot use dask dashboard with hires-marbl environment #24

Closed klindsay28 closed 4 years ago

klindsay28 commented 4 years ago

I'm adding dask via an ncar-jobqueue cluster in PR #21. I'd like to visualize the workers' activity so that I can tell whether what I'm adding helps the computational performance of the diagnostics. However, when I attempt to use the dask dashboard, I get the following error:

HTTPServerRequest(protocol='http', host='jupyterhub.ucar.edu', method='GET', uri='/individual-workers/ws', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/glade/work/klindsay/miniconda3/envs/hires-marbl/lib/python3.7/site-packages/tornado/websocket.py", line 956, in _accept_connection
    open_result = handler.open(*handler.open_args, **handler.open_kwargs)
  File "/glade/work/klindsay/miniconda3/envs/hires-marbl/lib/python3.7/site-packages/tornado/web.py", line 3178, in wrapper
    return method(self, *args, **kwargs)
  File "/glade/work/klindsay/miniconda3/envs/hires-marbl/lib/python3.7/site-packages/bokeh/server/views/ws.py", line 125, in open
    raise ProtocolError("Subprotocol header is not 'bokeh'")
bokeh.protocol.exceptions.ProtocolError: Subprotocol header is not 'bokeh'

I've found similar messages in GitHub issues, such as https://github.com/jupyterhub/jupyter-server-proxy/issues/179, but that one is purported to be fixed.
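
For readers unfamiliar with this part of the stack, the error text suggests that the bokeh server expects the browser's websocket request to offer a subprotocol named 'bokeh', and that the request arriving through the proxy did not carry it. The sketch below is a simplified illustration of that kind of check, not bokeh's actual code; the function name `check_subprotocol` and the use of `ValueError` are assumptions made for the example.

```python
def check_subprotocol(headers):
    """Return the first offered websocket subprotocol, or raise if it is not 'bokeh'.

    `headers` is a plain dict standing in for the HTTP request headers
    (hypothetical stand-in; a real server uses its own request object).
    """
    raw = headers.get("Sec-WebSocket-Protocol", "")
    offered = [part.strip() for part in raw.split(",") if part.strip()]
    if not offered or offered[0] != "bokeh":
        raise ValueError("Subprotocol header is not 'bokeh'")
    return offered[0]


# A direct browser connection that offers the subprotocol passes the check.
print(check_subprotocol({"Sec-WebSocket-Protocol": "bokeh, sometoken"}))

# A request whose subprotocol header was dropped along the way fails it.
try:
    check_subprotocol({})
except ValueError as err:
    print(f"rejected: {err}")
```

If this reading is right, the failure depends on whatever sits between the browser and the dashboard (for example a proxy that does not forward the header) as much as on the bokeh version itself.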

I'm inferring that the conda environment has pulled together versions of packages that don't play well together. This portion of the software stack is a mystery to me, so I don't see how to proceed towards fixing this.

matt-long commented 4 years ago

@andersy005, can you please help!

dcherian commented 4 years ago

What's the bokeh version? it should not be 2.0.0. 2.1.1 is working fine for me right now.

klindsay28 commented 4 years ago

In my environment, bokeh version = 2.2.0.

andersy005 commented 4 years ago

> In my environment, bokeh version = 2.2.0.

What versions of dask, distributed and dask-jobqueue are you running?
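
Besides `conda list`, the versions can also be read from inside the notebook kernel, which confirms what the running Python process actually imports. This is a minimal sketch; the helper name `collect_versions` is made up for the example, and it assumes each package exposes a `__version__` attribute.

```python
from importlib import import_module


def collect_versions(names):
    """Map each module name to its __version__, 'unknown', or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Import names use underscores (dask_jobqueue), unlike the conda package name.
for name, version in collect_versions(
    ["bokeh", "dask", "distributed", "dask_jobqueue"]
).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```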

klindsay28 commented 4 years ago

output from conda list includes

dask                      2.24.0                     py_0    conda-forge
dask-jobqueue             0.7.1                      py_0    conda-forge
distributed               2.24.0           py37hc8dfbb8_0    conda-forge

andersy005 commented 4 years ago

The versions look okay... Try running this code snippet and let me know what you get

import subprocess

# `client` is the dask.distributed Client already created in the notebook
dashboard_port = str(client.scheduler_info()['services']['dashboard'])

# List listening sockets and keep only the lines that mention the dashboard port
p = subprocess.Popen('ss -nlput'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout = p.communicate()[0].decode()
[each for each in stdout.splitlines() if dashboard_port in each]

klindsay28 commented 4 years ago

['tcp    LISTEN     0      128       *:8787                  *:*                   users:(("python",pid=81685,fd=47))',
 'tcp    LISTEN     0      128    [::]:8787               [::]:*                   users:(("python",pid=81685,fd=51))']

andersy005 commented 4 years ago

> What's the bokeh version? it should not be 2.0.0. 2.1.1 is working fine for me right now.

@dcherian, are you using the JupyterHub? I am asking because I am using the latest versions of dask/distributed/dask-jobqueue/bokeh and everything works fine when I am not using the JupyterHub, but the moment I run the notebook on the Hub, I get the same error @klindsay28 is getting.

dcherian commented 4 years ago

ah no. I don't use the hub

andersy005 commented 4 years ago

@klindsay28, my speculation is that the Hub is the culprit here. The issue stems from a version mismatch between your environment (hires-marbl) and the environment the Jupyter server is coming from (the Jupyter server is owned by the Hub).

The long-term solution is to have the JupyterHub environment upgraded/updated (cc @jbaksta). The short-term solution is to pin the dask and distributed versions to 2.14 and bokeh to 1.4.
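
To see how far an environment is from the suggested pins, the versions reported earlier in this thread can be compared against them component by component. This is only a sketch: `matches_pin` is a hypothetical helper, and it treats a pin such as 2.14 as "any 2.14.x release", which is an assumption about how the pin is meant.

```python
def matches_pin(installed, pin):
    """True if `installed` starts with every dotted component of `pin`."""
    installed_parts = installed.split(".")
    pin_parts = pin.split(".")
    return installed_parts[: len(pin_parts)] == pin_parts


# Pins suggested above, and the versions reported from the hires-marbl environment.
PINS = {"dask": "2.14", "distributed": "2.14", "bokeh": "1.4"}
REPORTED = {"dask": "2.24.0", "distributed": "2.24.0", "bokeh": "2.2.0"}

for name, pin in PINS.items():
    status = "ok" if matches_pin(REPORTED[name], pin) else "differs from pin"
    print(f"{name}: installed {REPORTED[name]}, pinned {pin} -> {status}")
```

Comparing components rather than raw string prefixes avoids treating a version like 2.140.0 as satisfying a 2.14 pin.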

klindsay28 commented 4 years ago

I'm getting the same error outside of jupyterhub

andersy005 commented 4 years ago

> I'm getting the same error outside of jupyterhub

Is the jupyter lab running from your base environment or hires-marbl environment ?

klindsay28 commented 4 years ago

I'm a bit confused, I'm not sure if something is going through jupyterhub. I'm using a modified version of jlab-dav to run jupyterlab in a SLURM job on casper and am using ssh port forwarding to view jupyterlab through localhost in my browser. However, after instantiating the cluster, the value of cluster.dashboard_link is https://jupyterhub.ucar.edu/dav/user/klindsay/proxy/43104/status. The presence of jupyterhub in this makes me think I'm somehow going through jupyterhub. But I don't understand why that would be.

klindsay28 commented 4 years ago

I had not activated any conda environment. I'm going to try again after activating base.

andersy005 commented 4 years ago

> However, after instantiating the cluster, the value of cluster.dashboard_link is https://jupyterhub.ucar.edu/dav/user/klindsay/proxy/43104/status. The presence of jupyterhub in this makes me think I'm somehow going through jupyterhub. But I don't understand why that would be.

Ooooh... This is an ncar-jobqueue issue. Under the hood, ncar-jobqueue tries to determine which machine you are running on and sets the dashboard URL accordingly (assuming you are running on the JupyterHub). When you are not using the JupyterHub, you need to modify the dashboard link:

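import dask
import ncar_jobqueue
from dask.distributed import Client

cluster = ncar_jobqueue.NCARCluster()
# Replace the JupyterHub-style dashboard URL with a path relative to the Jupyter server
dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})
client = Client(cluster)
client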
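
The `distributed.dashboard.link` setting is a template whose placeholders are filled in when the link is displayed. The sketch below imitates that expansion with plain string formatting to show why the link contained jupyterhub. `HUB_TEMPLATE` is an assumption inferred from the URL reported above, not copied from the ncar-jobqueue configuration, and `expand_link_template` is a hypothetical helper.

```python
def expand_link_template(template, **fields):
    """Fill a dashboard link template; unused fields are simply ignored."""
    return template.format(**fields)


# Assumed Hub-style template, inferred from the link reported in this thread.
HUB_TEMPLATE = "https://jupyterhub.ucar.edu/dav/user/{USER}/proxy/{port}/status"
# The override suggested above.
LOCAL_TEMPLATE = "/proxy/{port}/status"

fields = {"USER": "klindsay", "port": 43104}
print(expand_link_template(HUB_TEMPLATE, **fields))
print(expand_link_template(LOCAL_TEMPLATE, **fields))
```

With the override, the link becomes a path relative to whichever Jupyter server the browser is already talking to, so it no longer points at the Hub when jupyterlab is reached through an ssh tunnel.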

andersy005 commented 4 years ago

> I had not activated any conda environment. I'm going to try again after activating base.

I recommend activating the hires-marbl environment, and launching jupyter lab.

klindsay28 commented 4 years ago

I still get the error when I run jlab-dav after activating the hires-marbl environment (also after activating the base environment).

klindsay28 commented 4 years ago

I applied PR #25, including running environments/postBuild, and I still get the error. FYI, it updated dask-distributed to 2.25.0.

andersy005 commented 4 years ago

So strange.... Can you point me to the location (on GLADE) of the notebook you are running?

klindsay28 commented 4 years ago

/glade/work/klindsay/analysis/HiRes-CESM-analysis/notebooks/Untitled2.ipynb

andersy005 commented 4 years ago

> /glade/work/klindsay/analysis/HiRes-CESM-analysis/notebooks/Untitled2.ipynb

Thanks!

I just launched jupyter lab using your environment

$ conda activate /glade/work/klindsay/miniconda3/envs/hires-marbl
(hires-marbl)
abanihi at casper26 in ~
$ which python
/glade/work/klindsay/miniconda3/envs/hires-marbl/bin/python
(hires-marbl)
abanihi at casper26 in ~
$ jlab-casper
ssh -N -L 8777:casper26:8777 abanihi@casper26.ucar.edu
[I 16:16:45.392 LabApp] [jupyter_nbextensions_configurator] enabled 0.4.1
[I 16:16:45.961 LabApp] JupyterLab extension loaded from /glade/work/klindsay/miniconda3/envs/hires-marbl/lib/python3.7/site-packages/jupyterlab
[I 16:16:45.961 LabApp] JupyterLab application directory is /glade/work/klindsay/miniconda3/envs/hires-marbl/share/jupyter/lab
[I 16:16:45.965 LabApp] Serving notebooks from local directory: /glade/u/home/abanihi
[I 16:16:45.965 LabApp] Jupyter Notebook 6.1.3 is running at:
[I 16:16:45.965 LabApp] http://casper26:8777/

and everything seems to be working fine on my end:

[Screenshot: Screen Shot 2020-09-01 at 4 17 56 PM]

> I still get the error when I run jlab-dav after activating the hires-marbl environment (also after activating the base environment).

What are the contents of the jlab-dav script?

klindsay28 commented 4 years ago

jlab-dav is /glade/u/home/klindsay/bin/jlab-dav

Please note that I get the error message after clicking on an element of the dashboard, such as showing workers or graph.

andersy005 commented 4 years ago

> Please note that I get the error message after clicking on an element of the dashboard, such as showing workers or graph.

I can confirm that the dashboard works when I open the widgets or click on an element:

[Screenshot: Screen Shot 2020-09-01 at 4 38 21 PM]

It appears that jupyter lab is being launched from base in the jlab-dav script, regardless of which environment has been activated:

# 4. open browser: http://localhost:8888

conda activate base

So, I recommend updating your base environment (it has somewhat outdated packages which could be the culprit)

$ conda activate base
$ conda update --all -c conda-forge

and re-running jlab-dav.
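
To check which environment a given Python process is really running from, the interpreter path can be inspected directly. This is a small sketch; `describe_environment` is a hypothetical helper, and `CONDA_DEFAULT_ENV` is only set when a conda environment was activated in the launching shell.

```python
import os
import sys


def describe_environment():
    """Report where the running interpreter lives and which conda env is active."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run in a notebook cell, this describes the kernel's environment. The environment of the Jupyter server itself shows up in its startup log, as in the "JupyterLab extension loaded from ..." line quoted earlier, and the two need not match.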

matt-long commented 4 years ago

I have a working dashboard via JHub with

dask                      2.3.0           
distributed               2.3.2 
bokeh                     1.4.0 

@andersy005, do you recommend pinning any of these versions? It is critical that we resolve this ASAP to get the codes running again. We could revisit later with more comprehensive testing.

andersy005 commented 4 years ago

> @andersy005, do you recommend pinning any of these versions? It is critical that we resolve this ASAP to get the codes running again. We could revisit later with more comprehensive testing.

Yeah... Let's pin the versions for the time being. If the user is not using the Hub, the user should launch jupyter lab from the hires-marbl environment instead of base; otherwise the version pinning is likely to break because of package version mismatches between base and hires-marbl.

I will pin the versions in #25

klindsay28 commented 4 years ago

@matt-long, note that codes do run, I just can't visualize how dask is operating.

So there are 2 environments at play, base and hires-marbl. Matt previously advised me to launch jupyterlab from a pared down base environment, and select the more complete environment for the notebook. I don't recall why he advised to do that, or if that advice still holds.

jbaksta commented 4 years ago

How critical is it to update the jupyterhub instance at this point? I plan on doing it in the near future, but I'll probably attempt to coordinate with a systems outage.

klindsay28 commented 4 years ago

fixed by #25