rapidsai / jupyterlab-nvdashboard

A JupyterLab extension for displaying dashboards of GPU usage.
BSD 3-Clause "New" or "Revised" License

blank content #24

Open ggosiang opened 4 years ago

ggosiang commented 4 years ago

I followed the installation steps below to install the extension into my JupyterLab,

pip install jupyterlab-nvdashboard
jupyter labextension install jupyterlab-nvdashboard

but I don't get any content in the dashboards when running code on CPU or GPU. I need help.

jupyterlab             1.0.4    
jupyterlab-nvdashboard 0.1.11      
bokeh                  1.0.4

img1

img2

jacobtomlinson commented 3 years ago

Thanks for sharing, this has been really helpful. I built a container pretty similar to the one you shared (but I of course can't use my-centos7-python3.8.9-base or copy in extensions/tracker.jupyterlab-settings) and things are working for me.

FROM centos/python-38-centos7:latest

RUN pip3 install --no-cache-dir --upgrade pip==20.2.4 && pip3 --no-cache-dir install wheel

RUN pip3 install --no-cache-dir \
        notebook \
        ipywidgets \
        tornado \
        make \
        bash_kernel \
        pypki2 \
        nbdime \
        isort \
        black \
        jupyterlab_code_formatter \
        jupyterlab_execute_time \
        jupyterlab-nvdashboard==0.6.0 \
        jupyterhub \
        jupyterlab==3.0.7

### Activate jupyterlab extensions ###
RUN nbdime extensions --enable \
    && jupyter labextension install --clean @ryantam626/jupyterlab_code_formatter \
    && jupyter serverextension enable --py jupyterlab_code_formatter --sys-prefix

RUN pip3 install --no-cache-dir jupyterlab-snippets \
    && jupyter lab build && npm update

# set startpoints
EXPOSE 8888
CMD ["start-notebook.sh"]

$ docker run --rm -it -p 8888:8888 --gpus all jtomlinson/nvdash24:latest jupyter lab --ip 0.0.0.0

image

I did notice that accidentally leaving out the --gpus all flag resulted in a blank panel, though.

image

Looking at the Jupyter logs I see this error repeated.

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/pynvml/nvml.py", line 1399, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/opt/rh/rh-python38/root/usr/lib64/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/jupyterlab_nvdashboard/server.py", line 7, in <module>
    from jupyterlab_nvdashboard import apps
  File "/opt/app-root/lib64/python3.8/site-packages/jupyterlab_nvdashboard/apps/__init__.py", line 2, in <module>
    from . import gpu
  File "/opt/app-root/lib64/python3.8/site-packages/jupyterlab_nvdashboard/apps/gpu.py", line 17, in <module>
    pynvml.nvmlInit()
  File "/opt/app-root/lib64/python3.8/site-packages/pynvml/nvml.py", line 1371, in nvmlInit
    nvmlInitWithFlags(0)
  File "/opt/app-root/lib64/python3.8/site-packages/pynvml/nvml.py", line 1354, in nvmlInitWithFlags
    _LoadNvmlLibrary()
  File "/opt/app-root/lib64/python3.8/site-packages/pynvml/nvml.py", line 1401, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/opt/app-root/lib64/python3.8/site-packages/pynvml/nvml.py", line 743, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found

This makes sense because I am not providing any GPUs to the container and therefore the GPU driver is not being loaded by the Docker runtime.
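As a quick check of the same condition from inside the container, here is a minimal sketch that tries to load the NVML shared library the same way the traceback shows pynvml doing it (`CDLL("libnvidia-ml.so.1")`). The helper name `shared_library_loads` is made up for illustration and is not part of pynvml or jupyterlab_nvdashboard.

```python
import ctypes


def shared_library_loads(name="libnvidia-ml.so.1"):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        # Same call that fails in the traceback when no GPU driver is mounted.
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("NVML library loadable:", shared_library_loads())
```

If this prints False in a container started without --gpus all, that is consistent with the NVMLError_LibraryNotFound shown above.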

Atharex commented 3 years ago

Only the NVLink Throughput tab shows a 500: Internal Server Error message (expected, as I don't have NVLink).

jl2

In the jupyterlab logs I see this:

GPU 0: GeForce RTX 2080 Ti (UUID: GPU-4bdd990e-c568-7aa1-7141-6d13dc0a6961)
Warning: Setting counter control is deprecated!

ERROR:tornado.application:Uncaught exception GET /NVLink-Throughput (127.0.0.1)
HTTPServerRequest(protocol='http', host='jupyterhub.my.domain.net', method='GET', uri='/NVLink-Throughput', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/usr/local/lib/python3.8/site-packages/bokeh/server/views/doc_handler.py", line 52, in get
    session = await self.get_session()
  File "/usr/local/lib/python3.8/site-packages/bokeh/server/views/session_handler.py", line 120, in get_session
    session = await self.application_context.create_session_if_needed(session_id, self.request, token)
  File "/usr/local/lib/python3.8/site-packages/bokeh/server/contexts.py", line 218, in create_session_if_needed
    self._application.initialize_document(doc)
  File "/usr/local/lib/python3.8/site-packages/bokeh/application/application.py", line 171, in initialize_document
    h.modify_document(doc)
  File "/usr/local/lib/python3.8/site-packages/bokeh/application/handlers/function.py", line 132, in modify_document
    self._func(doc)
  File "/usr/local/lib/python3.8/site-packages/jupyterlab_nvdashboard/apps/gpu.py", line 224, in nvlink
    nvlink_state["tx"] = [
  File "/usr/local/lib/python3.8/site-packages/jupyterlab_nvdashboard/apps/gpu.py", line 226, in <listcomp>
    [
  File "/usr/local/lib/python3.8/site-packages/jupyterlab_nvdashboard/apps/gpu.py", line 227, in <listcomp>
    pynvml.nvmlDeviceGetNvLinkUtilizationCounter(
  File "/usr/local/lib/python3.8/site-packages/pynvml/nvml.py", line 2590, in nvmlDeviceGetNvLinkUtilizationCounter
    _nvmlCheckReturn(ret)
  File "/usr/local/lib/python3.8/site-packages/pynvml/nvml.py", line 743, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.nvml.NVMLError_NotSupported: Not Supported
ERROR:tornado.access:500 GET /NVLink-Throughput (127.0.0.1) 44.80ms
[E 2021-06-28 15:48:59.973 SingleUserNotebookApp log:181] {
      "X-Real-Ip": "xx.xx.xx.xx",
      "X-Forwarded-Server": "traefik-564b75665b-xntdk",
      "X-Forwarded-Proto": "https,http",
      "X-Forwarded-Port": "443,80",
      "X-Forwarded-Host": "jupyterhub.my.domain.net",
      "X-Forwarded-For": "xx.xx.xx.xx",
      "Upgrade-Insecure-Requests": "1",
      "Sec-Gpc": "1",
      "Dnt": "1",
      "Cookie": "jupyterhub-user-test=[secret]; _xsrf=[secret]; jupyterhub-session-id=[secret]; CF_Authorization=[secret]",
      "Cf-Visitor": "{\"scheme\":\"https\"}",
      "Cf-Request-Id": "xxxxxxxxxxx",
      "Cf-Ray": "xxxxxxxxxx",
      "Cf-Ipcountry": "XX",
      "Cf-Connecting-Ip": "xx.xx.xx.xx",
      "Cf-Access-Jwt-Assertion": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "Cf-Access-Authenticated-User-Email": "atharex@my.domain.net",
      "Cdn-Loop": "cloudflare",
      "Accept-Language": "en-US,en;q=0.5",
      "Accept-Encoding": "gzip",
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
      "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
      "Host": "jupyterhub.my.domain.net",
      "Connection": "close"
    }
[E 2021-06-28 15:48:59.973 SingleUserNotebookApp log:189] 500 GET /user/test/nvdashboard/NVLink-Throughput 48.58ms

Other entries do not show errors in the jupyterlab logs:

[I 2021-06-28 16:02:21.152 SingleUserNotebookApp log:189] 204 PUT /user/test/lab/api/workspaces/default?1624896141010  2.52ms
[I 2021-06-28 16:02:21.737 SingleUserNotebookApp log:189] 200 GET /user/test/nvdashboard/GPU-Utilization  33.25ms
[I 2021-06-28 16:02:22.268 SingleUserNotebookApp log:189] 204 PUT /user/test/lab/api/workspaces/default?1624896142049  2.43ms
[I 2021-06-28 16:02:22.287 SingleUserNotebookApp log:189] 200 GET /user/test/nvdashboard/GPU-Memory 31.16ms
[I 2021-06-28 16:02:22.715 SingleUserNotebookApp log:189] 204 PUT /user/test/lab/api/workspaces/default?1624896142604  2.61ms
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
[I 2021-06-28 16:02:23.081 SingleUserNotebookApp log:189] 200 GET /user/test/nvdashboard/GPU-Resources 67.91ms

However, these tabs show up as empty screens.

jl1
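Note on the NVLink traceback above: the NVMLError_NotSupported is raised from pynvml.nvmlDeviceGetNvLinkUtilizationCounter on a GPU without NVLink. One way a caller could tolerate this is to treat "not supported" as "no data". The following is only a sketch of that pattern, not how jupyterlab_nvdashboard handles it. `query_or_none`, `FakeNotSupported` and `fake_nvlink_query` are hypothetical stand-ins so the example runs without a GPU; in real use the query would be the pynvml function and the exception class the one named in the traceback.

```python
def query_or_none(query, *args, not_supported):
    """Call query(*args); return None if it raises the given 'not supported' error."""
    try:
        return query(*args)
    except not_supported:
        return None


# Stand-ins for the pynvml call and error type (assumptions for illustration).
class FakeNotSupported(Exception):
    pass


def fake_nvlink_query(handle, link, counter):
    raise FakeNotSupported("Not Supported")


if __name__ == "__main__":
    result = query_or_none(fake_nvlink_query, 0, 0, 0, not_supported=FakeNotSupported)
    print("NVLink counter:", result)
```

Only the named exception is swallowed, so any other NVML error would still surface in the logs.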

jacobtomlinson commented 3 years ago

@Atharex, the issue you describe is different from the problem being discussed here. This issue covers blank pages in the GPU dashboards toolbar on the left. It sounds like you are seeing the buttons populated but the plots themselves are blank.

If you create the container as I described and run it with Docker as I showed, do you still see the same problem?

Atharex commented 3 years ago

Yes @jacobtomlinson, my installation shows the buttons, but the plots are blank. Should I create a new issue for this?

The problem persists both with Docker and with my ztjh setup on Kubernetes. The GPUs are provisioned to the pods by the GPU Operator.

Atharex commented 3 years ago

Might this also be an issue? I see a strange 431 response in the network analyzer whenever I select a tab.

jl-3
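For reference, HTTP 431 means "Request Header Fields Too Large", so it concerns the size of the request headers (cookies, forwarded auth tokens and so on) rather than request or response bodies. A rough way to gauge how large the headers are is sketched below. `header_bytes` is a hypothetical helper, and the sample values are invented; only the header names are taken from the log earlier in this thread.

```python
from http import HTTPStatus
from typing import Mapping


def header_bytes(headers: Mapping[str, str]) -> int:
    # Each header travels as "Name: value" followed by CRLF in HTTP/1.1.
    return sum(
        len(f"{name}: {value}\r\n".encode("latin-1"))
        for name, value in headers.items()
    )


if __name__ == "__main__":
    # Invented values, sized only to illustrate how cookies and JWTs add up.
    sample = {
        "Host": "jupyterhub.example.net",
        "Cookie": "jupyterhub-session-id=" + "x" * 4000,
        "Cf-Access-Jwt-Assertion": "y" * 2000,
    }
    print(HTTPStatus(431).phrase)
    print(header_bytes(sample), "bytes of request headers")
```

Comparing this number against the header size limits of each hop (proxy, ingress, Jupyter server) may help narrow down which one returns the 431.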

jacobtomlinson commented 3 years ago

@Atharex, are you using any kind of proxy? I know that for some JupyterHub deployments running behind Nginx you need to increase the request size limit to save large notebooks, although I've not seen the same for dashboards like this.

Atharex commented 3 years ago

Yes, I am using a Traefik v1.7.24 ingress controller for access to JupyterHub.

In the ztjh Helm chart I have this config (with deliberately oversized limits for request and response bodies, but it has no effect):

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/buffering:
      maxrequestbodybytes: 204857600
      memrequestbodybytes: 209715300
      maxresponsebodybytes: 204857610
      memresponsebodybytes: 209715200
  hosts:
    - jupyterhub.my.domain.net
  pathSuffix: ''
  tls:
   - hosts:
     - jupyterhub.my-domain.net
     secretName: my-ssl-certificate

jacobtomlinson commented 3 years ago

Thanks for sharing. Do you experience the same issues if you run the container locally on your machine?

My guess is that the ingress is causing the problem by not correctly proxying the websocket connection.

taylorsmithgg commented 3 years ago

Same issue on Kubeflow 1.2.

Installed via: pip install jupyterlab_nvdashboard

Failed to load resource: the server responded with a status of 404 (Not Found)
Failed to load resource: the server responded with a status of 404 (Not Found)

Trying to reach:

https://mycompany.com/notebook/kubeflow-services/thunderdome/nvdashboard/index.json?1627804823841

(base) jovyan@thunderdome-0:~$ jupyter server extension list
Config dir: /home/jovyan/.jupyter

/opt/conda/lib/python3.8/site-packages/jupyter_server/transutils.py:13: FutureWarning: The alias `_()` will be deprecated. Use `_i18n()` instead.
  warnings.warn(warn_msg, FutureWarning)
Config dir: /opt/conda/etc/jupyter
    jupyter_server_proxy enabled
    - Validating jupyter_server_proxy...
      jupyter_server_proxy  OK
    jupyter_server_mathjax enabled
    - Validating jupyter_server_mathjax...
      jupyter_server_mathjax  OK
    jupyterlab enabled
    - Validating jupyterlab...
      jupyterlab 3.0.16 OK
    jupyterlab_git enabled
    - Validating jupyterlab_git...
      jupyterlab_git 0.30.1 OK
    jupyterlab_nvdashboard enabled
    - Validating jupyterlab_nvdashboard...
       X validation failed
    nbclassic enabled
    - Validating nbclassic...
      nbclassic  OK
    nbdime enabled
    - Validating nbdime...
      nbdime 3.1.0 OK

Config dir: /usr/local/etc/jupyter

(base) jovyan@thunderdome-0:~$ jupyter serverextension list
config dir: /opt/conda/etc/jupyter
    jupyter_server_proxy  enabled 
    - Validating...
      jupyter_server_proxy  OK
    jupyterlab  enabled 
    - Validating...
      jupyterlab 3.0.16 OK
    jupyterlab_git  enabled 
    - Validating...
      jupyterlab_git 0.30.1 OK
    nbdime  enabled 
    - Validating...
      nbdime 3.1.0 OK

jacobtomlinson commented 3 years ago

Thanks @taylorsmithgg, are you able to share the logs from the Jupyter container?

taylorsmithgg commented 3 years ago

This appears to be the only relevant log:

19:14.355 ServerApp] 404 GET /notebook/kubeflow-services/thunderdome/nvdashboard/index.json?1627885153887 (127.0.0.1) 1.29ms referer=http://kubeflow.vyrl.co/notebook/kubeflow-services/thunderdome/lab/workspaces/auto-K/tree/Sequence%20Training.ipynb

jacobtomlinson commented 3 years ago

Thanks. Looking at your extension list, validation fails for jupyterlab_nvdashboard, so it seems like there is a problem with your install.

Are you able to import jupyterlab_nvdashboard within Python?
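A minimal sketch of such a check is below. It only uses the standard library to report whether the module can be found and which version is installed. `describe_install` is a made-up helper name, and passing the pip distribution name separately is an assumption, since the import name and the pip name can differ.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name=None):
    """Return where a module would be imported from and its installed version."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # Python cannot find the module in this environment.
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None  # Importable, but no matching installed distribution.
    return {"origin": spec.origin, "version": version}


if __name__ == "__main__":
    print(describe_install("jupyterlab_nvdashboard", "jupyterlab-nvdashboard"))
```

Running it with the same Python interpreter that starts Jupyter matters, since a package installed into a different environment would show up as None here.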

taylorsmithgg commented 3 years ago

I installed via pip install jupyterlab_nvdashboard on a fresh notebook instance. The import produces no output, so it appears to be working, but the GPU DASHBOARDS tab is still blank.

image

image

rsdmse commented 3 years ago

I am having the same problem in a conda environment with pip install jupyterlab_nvdashboard==0.6.0. I can import it and see "GPU Utilization" etc. in the sidebar, but nothing shows up. I found something in the log that might be useful:

[W 2021-08-30 15:15:35.855 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: module 'jupyterlab_nvdashboard' has no attribute 'load_jupyter_server_extension'

Just like for others, the extension couldn't be validated:

    jupyterlab_nvdashboard  enabled 
    - Validating...
      X is jupyterlab_nvdashboard importable?

taylorsmithgg commented 2 years ago

Any updates? Happy to help where possible.

jryanearl commented 2 years ago

Hi, I am experiencing the same issue on a freshly pulled RAPIDSAI Docker container from NVIDIA NGC.

[W 2022-04-12 16:58:21.567 ServerApp] jupyterlab_nvdashboard | extension failed loading with message: module 'jupyterlab_nvdashboard' has no attribute 'load_jupyter_server_extension'

Start JupyterLab with this docker image to reproduce: rapidsai/rapidsai-core-dev:22.02-cuda11.5-devel-ubuntu20.04-py3.9
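The warning above says the module has no attribute 'load_jupyter_server_extension'. A small sketch for inspecting which server extension entry points a module actually exposes is below. The hook names other than the one in the warning are listed from memory of Jupyter's server extension conventions and should be treated as assumptions to verify against the Jupyter Server documentation; `server_extension_hooks` is a hypothetical helper.

```python
import importlib

# 'load_jupyter_server_extension' comes from the warning in this thread;
# the other names are assumed from Jupyter's extension conventions.
HOOKS = (
    "_jupyter_server_extension_points",
    "_jupyter_server_extension_paths",
    "_load_jupyter_server_extension",
    "load_jupyter_server_extension",
)


def server_extension_hooks(module_name):
    """Map each known hook name to whether the module defines it."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # Module is not importable at all.
    return {name: hasattr(module, name) for name in HOOKS}


if __name__ == "__main__":
    print(server_extension_hooks("jupyterlab_nvdashboard"))
```

If every entry is False while the package imports fine, the installed version may simply not ship a server extension entry point for the Jupyter server in use.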

jacobtomlinson commented 2 years ago

@jryanearl I'm afraid I am still unable to reproduce with the docker image you mentioned.

$ docker run --rm -it --gpus=all -p 8888:8888 rapidsai/rapidsai-core-dev:22.02-cuda11.5-devel-ubuntu20.04-py3.9

image

Perhaps you could share the output of nvidia-smi from within the container?