silx-kit / jupyterlab-h5web

A JupyterLab extension to explore and visualize HDF5 file contents. Based on https://github.com/silx-kit/h5web.
MIT License

[jupyterlab] [esrf] [jupyter-slurm] very slow reading and broken `.h5`? #51

Closed jpcbertoldo closed 2 years ago

jpcbertoldo commented 3 years ago

Hi, I was using jupyterlab-h5web on jupyter-slurm.esrf.fr, and some files are veeeery slow to open, and I have one that doesn't open at all. I am not sure if this is somehow expected/normal, but I recorded a few of these issues so you can check them.

See: https://youtu.be/qSM2JsVVEHY

I open 4 files in the video, respectively:

1) 0:08: an .h5 from the acquisition system at id11
   path: /data/id11/3dxrd/blc12852/id11/bmg_l1/bmg_l1_bmg_dct2/bmg_l1_bmg_dct2.h5
   issue: 0:38: displaying one frame is quite slow

2) 1:15: a "small" .h5 written by my code is quite slow
   path: /data/id11/3dxrd/blc12852/id11/analysis/bmg_l1_bmg_dct2/difspots/000000.h5
   issue: isn't it slow relative to the size of the things inside?

3) 1:33: an .h5 regrouping (via external links) about 3300 .h5 files like the one above
   path: /data/id11/3dxrd/blc12852/id11/analysis/bmg_l1_bmg_dct2/difspots.h5
   issue: slow to open, then very slow to expand the group with the external links

4) 2:30: an .h5 with only 2 datasets, about the same size as the first one
   path: /data/id11/3dxrd/blc12852/id11/bmg_l1/bmg_l1_bmg_dct2/bmg_l1_bmg_dct2.preprocessed.h5
   issue: veeeeeeeery slow; I kept waiting for several minutes after the end of the video, the entire jupyterlab becomes very slow, and it ends up breaking the page (screenshot below). When I reload the page, the h5web tab (in jupyterlab) is not there anymore.

Screenshot from 2021-08-26 14-31-13

A few notes (Idk if this might have some influence):

t20100 commented 3 years ago

There is an issue with the integration of h5web in the JLab file browser that needs to be solved in JupyterLab itself (see #24): currently there is no way to avoid loading the whole HDF5 file in the browser. A newer JLab version (v3.1.x) will be needed, and jupyter-slurm runs JLab v2.

There is no such issue when opening a file from within a notebook (within JLab):

from jupyterlab_h5web import H5Web
H5Web('<path to the HDF5 file>')

See https://github.com/silx-kit/jupyterlab-h5web#in-jupyter-notebooks

This should at least avoid the time spent on the initial loading of a file.
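Since one of the files reported above did not open at all, a quick sanity check before calling `H5Web` may help tell a slow file from a broken one. The sketch below is not part of jupyterlab-h5web: `looks_like_hdf5` is a hypothetical helper that only checks for the 8-byte HDF5 format signature at the very start of the file. It assumes the file has no user block (in which case the signature would sit at a later offset), and it says nothing about whether the rest of the file is intact.

```python
from pathlib import Path

# The 8-byte signature that begins the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper, not part of jupyterlab-h5web. Files with a user
    block (signature at offset 512, 1024, ...) are not handled.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # Compare only the first 8 bytes; the rest of the file is not read.
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

In a notebook, this could guard the call shown above: only pass the path to `H5Web` when `looks_like_hdf5(path)` returns True.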

jpcbertoldo commented 3 years ago

Ok, thanks!

Just to be sure: to use it in my own jupyterlab installation, do I need to install jupyterlab-h5web in the jupyterlab's environment or in the kernel's environment (I use different kernels)? Or both?

Last but not least, is JLab v2 --> JLab v3.1.X upgrade supposed to happen anytime soon? I'm just wondering so I can organize my own priorities.

t20100 commented 3 years ago

> to use it in my own jupyterlab installation, do I need to install jupyterlab-h5web in the jupyterlab's environment or in the kernel's environment (I use different kernels)? Or both?

Both: kernel side for the jupyterlab_h5web import in the notebook, jupyterlab side for enabling the feature in JLab. On JLab side, you might also need to rebuild JLab (see https://github.com/silx-kit/jupyterlab-h5web#install)
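For the kernel side, a minimal way to check whether the package is visible to a given kernel is to run something like the following in a notebook cell. `is_installed` is a hypothetical helper built on the standard library; it only reports on the interpreter the kernel is running, not on the JupyterLab environment (for that side, `jupyter labextension list` run in JupyterLab's environment should show whether the extension is enabled).

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Run in a notebook cell: this reports on the kernel's environment only.
print("Kernel interpreter:", sys.executable)
print("jupyterlab_h5web importable:", is_installed("jupyterlab_h5web"))
```

Repeating this in each kernel shows which ones still need the package installed.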

> is JLab v2 --> JLab v3.1.X upgrade supposed to happen anytime soon?

Not soon. The upgrade first requires quite a few changes to the jupyter-slurm installation (to work around an incompatibility between the matplotlib version installed on the system and the JLab 3-compatible extension), and it's not my priority for now.

loichuder commented 3 years ago

Hello @joaopcbertoldo and thanks for the feedback! To add to @t20100's comments:

loichuder commented 3 years ago

As already mentioned, the root issue is that opening an h5web tab by double-clicking on a file fetched the whole file as a base64 blob. Not only is this quite inefficient, but pending requests of this type also slow down the whole "regular" fetching process.
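To give a rough idea of why the base64 blob is costly: on top of transferring the entire file regardless of what is displayed, base64 encoding inflates the payload by about a third (4 output bytes for every 3 input bytes). The snippet below is only an illustration of that overhead with the standard library; it does not reproduce the extension's actual fetching code, and `base64_size` is a hypothetical helper.

```python
import base64


def base64_size(n_bytes):
    """Length in bytes of the padded base64 encoding of `n_bytes` raw bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(1_000_000)  # 1 MB of zeros as a stand-in for file contents
encoded = base64.b64encode(raw)

# The measured size matches the formula; the ratio is close to 4/3.
print(len(encoded), base64_size(len(raw)), len(encoded) / len(raw))
```

For a multi-gigabyte acquisition file, this means the browser would have to receive and decode more than the file's own size before anything can be shown.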

I thought we needed to wait for a newer version of JLab to remove this blob fetching, but I was able to bypass the problematic machinery, so 0.0.10 no longer does it.

Getting back to the issue at hand: 0.0.10 is now deployed on jupyter-slurm and I tried all the operations above. Except for the external links one (3.), I found that operations were way faster than what you presented so I would invite you to retry and tell us if you get the same impression.

loichuder commented 2 years ago

The latest versions (1.1.0 and 0.0.14) fix point 3, and as already mentioned, the other points are fixed in 0.0.10 and later.

Therefore I will now close this issue. We can reopen if needed.