pangeo-data / pangeo-docker-images

Docker Images For Pangeo Jupyter Environment
https://pangeo-docker-images.readthedocs.io
MIT License

nbgitpuller and pangeo binder problems with recent images #206

Closed rabernat closed 1 year ago

rabernat commented 3 years ago

I am trying to debug a very obscure and annoying problem involving nbgitpuller and binder.pangeo.io.

Basically, if I make a binder using a recent pangeo-notebook image and try to run it on binder.pangeo.io, nbgitpuller hangs:

https://binder.pangeo.io/v2/gh/rabernat/pangeo-osn-demo/75509ae/?urlpath=git-pull?repo=https://github.com/rabernat/pangeo-osn-demo%26amp%3Bbranch=main

There is a javascript error

[image: screenshot of the JavaScript error]

That image is using the same Docker base image as the current staging.us-central1-b.gcp:

FROM pangeo/pangeo-notebook:2021.03.27
RUN echo $(which mamba)
RUN mamba install -n notebook -c conda-forge rise ipytree

Dockerfile

However, it does work on staging: https://staging.us-central1-b.gcp.pangeo.io/hub/user-redirect/git-pull?repo=https://github.com/rabernat/pangeo-osn-demo&branch=main

It also works on binder.pangeo.io if I roll back to an older image, e.g. pangeo/pangeo-notebook:2c94acd:

https://binder.pangeo.io/v2/gh/rabernat/pangeo-osn-demo/34e294b/?urlpath=git-pull?repo=https://github.com/rabernat/pangeo-osn-demo%26amp%3Bbranch=main

Dockerfile

So my best guess is that there is some specific incompatibility between binder.pangeo.io and some nbgitpuller-related package in our recent images, but I have no idea which packages are involved.

cc @yuvipanda and @choldgraf who helped me dig into this.

scottyhq commented 3 years ago

I don't know what the cause is, but one possible point of confusion for troubleshooting is that there are different jupyterhub versions/configurations currently between pangeo-binder and pangeo-cloud-federation hubs (where you're trying the 'staging' link). There is an issue somewhere about bringing them in sync, but I can't find it...

  1. pangeo hubs are using daskhub helm chart https://github.com/pangeo-data/pangeo-cloud-federation/pull/917/files

  2. pangeo binder does not use daskhub and manages jupyter and dask-gateway separately (https://github.com/pangeo-data/pangeo-binder/blob/staging/pangeo-binder/requirements.yaml)

  3. there are similar-sounding network issues in regards to the dask labextension working via binder but not on the hub: https://github.com/dask/dask-labextension/issues/166#issuecomment-813741166

  4. your customization of the docker image (RUN mamba install -n notebook -c conda-forge rise ipytree ) may result in different jupyter-related packages compared to the docker image you're comparing against on the hub.

  5. Here is a diff of the conda environments in your binder environment that does work versus the one that doesn't (diff -u works.txt doesnt.txt | grep jupyter). As you can see, there are major changes:

    +jupyter-packaging         0.7.12             pyhd8ed1ab_0    conda-forge
    jupyter-panel-proxy       0.1.0                      py_0    conda-forge
    -jupyter-server-proxy      1.5.2              pyhd8ed1ab_0    conda-forge
    -jupyter_client            6.1.11             pyhd8ed1ab_1    conda-forge
    -jupyter_core              4.7.0            py38h578d9bd_0    conda-forge
    -jupyter_server            1.2.1            py38h578d9bd_0    conda-forge
    +jupyter-resource-usage    0.5.1              pyhd8ed1ab_0    conda-forge
    +jupyter-server-proxy      1.6.0              pyhd8ed1ab_0    conda-forge
    +jupyter_client            6.1.12             pyhd8ed1ab_0    conda-forge
    +jupyter_core              4.7.1            py38h578d9bd_0    conda-forge
    +jupyter_server            1.5.1            py38h578d9bd_0    conda-forge
    jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge
    -jupyterhub-base           1.3.0            py38h578d9bd_0    conda-forge
    -jupyterhub-singleuser     1.3.0            py38h578d9bd_0    conda-forge
    -jupyterlab                2.2.9                      py_0    conda-forge
    +jupyterhub-base           1.3.0            py38h578d9bd_1    conda-forge
    +jupyterhub-singleuser     1.3.0            py38h578d9bd_1    conda-forge
    +jupyterlab                3.0.12             pyhd8ed1ab_0    conda-forge
    jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
    -jupyterlab_server         1.2.0                      py_0    conda-forge
    +jupyterlab_server         2.3.0              pyhd8ed1ab_0    conda-forge
    jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
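
The same comparison can be done per package rather than per line, which makes it easier to see which Jupyter packages changed between the two environments. What follows is a minimal sketch in Python, not something from the original thread: the helper names `parse_conda_list` and `changed_packages` are hypothetical, and the two strings are short excerpts of the diff above standing in for the full `conda list` output of each environment.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` style lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments conda prints
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def changed_packages(old, new, pattern="jupyter"):
    """Return {name: (old_entry, new_entry)} for matching packages that differ.

    A missing package is reported as None on the side where it is absent.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        if pattern not in name:
            continue
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Excerpts copied from the diff above (assumption: the real input would be
# the complete `conda list` output saved as works.txt and doesnt.txt).
works = """
jupyter-server-proxy      1.5.2              pyhd8ed1ab_0    conda-forge
jupyter_server            1.2.1            py38h578d9bd_0    conda-forge
jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge
jupyterlab                2.2.9                      py_0    conda-forge
"""
doesnt = """
jupyter-resource-usage    0.5.1              pyhd8ed1ab_0    conda-forge
jupyter-server-proxy      1.6.0              pyhd8ed1ab_0    conda-forge
jupyter_server            1.5.1            py38h578d9bd_0    conda-forge
jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge
jupyterlab                3.0.12             pyhd8ed1ab_0    conda-forge
"""

changes = changed_packages(parse_conda_list(works), parse_conda_list(doesnt))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

On these excerpts the unchanged `jupyter_telemetry` entry is dropped, and `jupyter-resource-usage` shows up with `None` on the "works" side because it only exists in the newer environment.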

damianavila commented 3 years ago

Porting over some conversations we had in 2i2c Slack (with edits) when I looked at this...

If I try to list the server extensions using the legacy server:

jovyan@jupyter-rabernat-2dpangeo-2dosn-2ddemo-2dl8t2uw55:~$ jupyter serverextension list
Config option `kernel_spec_manager_class` not recognized by `ListServerExtensionsApp`.
config dir: /srv/conda/envs/notebook/etc/jupyter
    dask_labextension  enabled
    - Validating...
      dask_labextension 5.0.1 OK
    jupyter_server_proxy  enabled
    - Validating...
      jupyter_server_proxy  OK
    jupyter_resource_usage  enabled
    - Validating...
      jupyter_resource_usage  OK
    jupyterlab  enabled
    - Validating...
/srv/conda/envs/notebook/lib/python3.8/site-packages/jupyter_server/transutils.py:13: FutureWarning: The alias `_()` will be deprecated. Use `_i18n()` instead.
  warnings.warn(warn_msg, FutureWarning)
      jupyterlab 3.0.12 OK
    nbgitpuller  enabled
    - Validating...
      nbgitpuller 0.9.0 OK
    panel  enabled
    - Validating...
      X is panel importable?
    xarray_leaflet  enabled
    - Validating...
      xarray_leaflet 0.1.13 OK

And this is the output with the new jupyter server:

jovyan@jupyter-rabernat-2dpangeo-2dosn-2ddemo-2dl8t2uw55:~$ jupyter server extension list
Config option `kernel_spec_manager_class` not recognized by `ListServerExtensionsApp`.
Config dir: /home/jovyan/.jupyter
/srv/conda/envs/notebook/lib/python3.8/site-packages/jupyter_server/transutils.py:13: FutureWarning: The alias `_()` will be deprecated. Use `_i18n()` instead.
  warnings.warn(warn_msg, FutureWarning)
Config dir: /srv/conda/envs/notebook/etc/jupyter
    dask_labextension enabled
    - Validating dask_labextension...
      dask_labextension 5.0.1 OK
    jupyter_server_proxy enabled
    - Validating jupyter_server_proxy...
      jupyter_server_proxy  OK
    jupyter_resource_usage enabled
    - Validating jupyter_resource_usage...
      jupyter_resource_usage  OK
    jupyterlab enabled
    - Validating jupyterlab...
      jupyterlab 3.0.12 OK
    nbclassic enabled
    - Validating nbclassic...
      nbclassic  OK
    xarray_leaflet enabled
    - Validating xarray_leaflet...
      xarray_leaflet 0.1.13 OK
Config dir: /usr/local/etc/jupyter

Notice that the new jupyter server is not loading the nbgitpuller extension... Maybe nbgitpuller is not compatible with the new jupyter server, and that incompatibility surfaced here? My guess for now is that your binder is somehow configured to start with the new server instead of the classic one, and that somehow triggers the errors.
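
The difference between the two listings can be made explicit by comparing the sets of enabled extensions. This is a minimal sketch in Python, not part of the original conversation: the function name `enabled_extensions` is hypothetical, and the two strings are trimmed excerpts of the `jupyter serverextension list` and `jupyter server extension list` outputs shown above (warnings and validation details omitted).

```python
import re


def enabled_extensions(listing):
    """Collect extension names from lines of the form '<name> enabled'."""
    names = set()
    for line in listing.splitlines():
        match = re.match(r"^\s*(\S+)\s+enabled\s*$", line)
        if match:
            names.add(match.group(1))
    return names


# Trimmed from the `jupyter serverextension list` output (legacy server).
legacy_listing = """
config dir: /srv/conda/envs/notebook/etc/jupyter
    dask_labextension  enabled
    jupyter_server_proxy  enabled
    jupyter_resource_usage  enabled
    jupyterlab  enabled
    nbgitpuller  enabled
    panel  enabled
    xarray_leaflet  enabled
"""

# Trimmed from the `jupyter server extension list` output (new server).
new_listing = """
Config dir: /srv/conda/envs/notebook/etc/jupyter
    dask_labextension enabled
    jupyter_server_proxy enabled
    jupyter_resource_usage enabled
    jupyterlab enabled
    nbclassic enabled
    xarray_leaflet enabled
"""

legacy = enabled_extensions(legacy_listing)
new = enabled_extensions(new_listing)

only_legacy = legacy - new   # enabled for the legacy server only
only_new = new - legacy      # enabled for the new server only

print("legacy only:", sorted(only_legacy))
print("new only:", sorted(only_new))
```

For these excerpts the script reports nbgitpuller and panel as enabled only for the legacy server, and nbclassic as enabled only for the new one.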

Btw, is the staging pangeo binder defaulting to /tree whereas production one is defaulting to /lab, maybe?

AFAIK, if you start with jupyter notebook, that is calling the old server whereas if you start with jupyterlab, that is calling the new one. So you may end up in a situation where you started with lab (using the underlying new server) but then redirected to /tree to use the legacy notebook view. In that scenario, if you have a server extension that is not compatible with the new server, it will fail... I think...

Btw, I have not seen code in nbgitpuller to make it compatible with the new server so I think that might be the issue...

With the latest link reported in the top message, jupyter_server versions are different, so it kind of makes sense, IMHO, to see a different behavior... wondering about the different behavior on the staging one... that's intriguing...

weiji14 commented 1 year ago

Closing as Pangeo Binder is not running anymore, xref #459.