Closed jonwright closed 2 years ago
Hello Jon, thanks for trying the extension and for the feedback!
> I wish I could figure out how to select x and y axes for a plot?
Well, h5web is a "dumb" viewer: it will only display visualizations corresponding to the content of the file. It is not meant to be a visualization tool.
The only way to select x and y axes for a plot is to use an NXdata group with an `axes` attribute, since h5web supports the NeXus standard.
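For reference, such an NXdata group can be written with plain h5py by setting the `signal` and `axes` attributes the NeXus standard defines. This is a minimal sketch; the file name, group names and data below are invented for illustration:

```python
import h5py
import numpy as np

# Write a minimal NXdata group so that a NeXus-aware viewer like h5web
# plots "y" against "x" instead of against point number.
# All names here are illustrative, not from the original thread.
with h5py.File("example.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    plot = entry.create_group("plot")
    plot.attrs["NX_class"] = "NXdata"
    plot.attrs["signal"] = "y"    # name of the dataset to plot
    plot.attrs["axes"] = ["x"]    # name(s) of the dataset(s) to use as axes
    plot.create_dataset("x", data=np.linspace(0.0, 1.0, 100))
    plot.create_dataset("y", data=np.sin(np.linspace(0.0, 1.0, 100)))
```

Opening `example.nxs` in h5web should then show `y` versus `x` directly.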
This is due to a limitation in the Line visualisation: we have a feature (auto-scale off) where the axis limits are set to the limits of the full dataset. As a consequence, when using the Line, h5web fetches the full dataset. In this case, I believe this is around 256 GB (:scream:) making the whole Jupyter server crash. I still need to investigate the exact reason.
Note that the Heatmap does not suffer from this limitation: it only fetches the displayed slice. This is why the first display of /1.1/measurement/eiger works. It is the switch to the 1D dataset /1.1/measurement/fpico6 that makes h5web switch to the Line visualisation when coming back to /1.1/measurement/eiger.
> Is there a way to use hdf5 slice operations (maybe combined with fast histograms) so you only hold in memory what is going to be displayed on the screen (e.g. maximum data is a 2D image)?
It would indeed make sense to fetch only the slice even for a Line visualization. The auto-scale feature is a major limitation for large datasets, and we need to find a way to work around it.
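The slice-only fetching suggested above is what the HDF5 hyperslab machinery already provides through h5py: indexing a dataset reads only the requested region from disk, so a backend can serve one frame at a time and compute per-slice limits cheaply, without ever materialising the full array. A small sketch with an invented file and dataset name:

```python
import h5py
import numpy as np

# Create a small on-disk stack of "frames" standing in for a large one.
with h5py.File("stack.h5", "w") as f:
    f.create_dataset("frames", data=np.arange(24, dtype="u2").reshape(4, 2, 3))

with h5py.File("stack.h5", "r") as f:
    dset = f["frames"]
    frame = dset[1]                    # hyperslab read: one 2x3 frame only
    lo, hi = frame.min(), frame.max()  # per-slice limits, cheap to compute
    # auto-scaling over the *full* dataset would instead need dset[()],
    # which is exactly the whole-array read that crashes on 256 GB data
```

libhdf5 manages its own chunk cache underneath, so repeated slice reads of the same region are served from memory.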
We have an issue in h5web where we track ideas and improvements for fetching large datasets: https://github.com/silx-kit/h5web/issues/616. The discussion about auto-scale will surely continue there, and any fix for the crash will be mentioned in that issue.
In the meantime, use the Heatmap? :sweat_smile:
@jonwright thanks for the positive feedback.
@loichuder thanks for the explanations. It seems like we are missing a tool for flexible viewing of NeXus files, i.e. selecting what to display against what. Am I right to say that users have to build their own tool with a mixture of h5py and matplotlib for now? Does Braggy address this?
This is outside of the scope of Braggy, for sure. It's always possible to make a new GUI, but note that a solution to this problem is to generate a NeXus-compliant HDF5 file with external links to the relevant datasets, and then open this file in H5Web. Obviously not as practical as a GUI, but we could easily provide Python utilities to make generating this sort of file a breeze (perhaps these utilities already exist, even).
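For the record, such a wrapper file can be assembled with `h5py.ExternalLink`: a small NXdata group that holds only links into the raw file, so no data is copied. A sketch with invented file and dataset names:

```python
import h5py
import numpy as np

# A source file with two unrelated datasets (names are illustrative)...
with h5py.File("raw.h5", "w") as f:
    f["angle"] = np.linspace(0.0, 180.0, 50)
    f["counts"] = np.arange(50, dtype="i4")

# ...and a small NeXus wrapper that only holds external links into it.
with h5py.File("view.nxs", "w") as f:
    plot = f.create_group("entry/plot")
    plot.attrs["NX_class"] = "NXdata"
    plot.attrs["signal"] = "counts"
    plot.attrs["axes"] = ["angle"]
    plot["counts"] = h5py.ExternalLink("raw.h5", "/counts")
    plot["angle"] = h5py.ExternalLink("raw.h5", "/angle")
```

Opening `view.nxs` in H5Web should then plot `counts` against `angle`, with the data still living in `raw.h5`.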
There are already some helpers to save NXdata: nexusformat or silx.io.nxdata.save_NXdata. Otherwise, since this runs in a notebook, matplotlib or any other plotting library is probably better suited for tailored plots that are not saved as NXdata.
BTW, in silx view, there is a feature to create "virtual NXdata" by dragging and dropping datasets as signal and axes, but to me it is a bit complex since one needs to know about NeXus to use it.
Following on the crash issue, we have something in the works to solve it: https://github.com/silx-kit/h5web/issues/616#issuecomment-982734122
I will close this once this is shipped in a jupyterlab-h5web release.
https://github.com/silx-kit/h5web/issues/616#issuecomment-982734122 was integrated in v0.1.0, which is now deployed on jupyter-slurm.
I am assuming this is the project behind the wonderful thing I found yesterday that lets me browse hdf5 files in jupyterlab? It looks fantastic. I wish I could figure out how to select x and y axes for a plot? I always see data versus point number. The rest of the message is a bug report for how I seem to have broken something already (sorry!):
Describe the bug
jupyterlab crashes when reading large dataset, perhaps an out of memory error?
To Reproduce
1 - Log into jupyter-slurm.esrf.fr with one single core and the lab interface
2 - Navigate to open: /data/id11/nanoscope/blc12407/id11/CeO2_38keV/CeO2_38keV_CeO2_rotation/CeO2_38keV_CeO2_rotation.h5
3 - Open dataset /1.1/measurement/eiger: it displays
4 - Open dataset /1.1/measurement/fpico6: it displays
5 - Go back to /1.1/measurement/eiger: jupyterlab stops running
6 - All the other tabs and kernels appear to exit when jupyterlab fails
Expected behaviour
In the worst case, a plugin would crash without taking down all of the other kernels. Ideally it would not crash.
Is there a way to use hdf5 slice operations (maybe combined with fast histograms) so you only hold in memory what is going to be displayed on the screen (e.g. maximum data is a 2D image)? Then libhdf5 should manage the memory cache in some sensible way.
Context
Extension lists
This is based on a bit of guesswork as to what is actually running when I use jupyter-slurm: