silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

Support numeric value for X Dimension without Nexus attributes #1617

Closed Blackclaws closed 2 weeks ago

Blackclaws commented 2 months ago

Is your feature request related to a problem?

Right now it's not well documented how H5Web works with NeXus. In fact, there isn't even any mention of NeXus in the main README (or maybe I'm just missing it?).

As someone who has never worked with the NeXus annotations, I assumed that H5Web would take a dataset in the HDF5 file and allow me to plot X vs. Y. However, the value of the X dimension in the dataset is completely ignored and replaced by an integer counting the row number instead.

Requested solution or feature

Plot a line through the actual X/Y values as they are stored numerically.

Additional context

The fact that right now the X value is completely ignored makes scatter plots impossible by default. Would be nice to have.

axelboc commented 2 months ago

You're totally right, documentation on H5Web's NeXus capabilities is very much lacking. Thanks for reporting your interest in it.

For the time being, your best option is to look at the demo site: https://h5web.panosc.eu/. You can inspect some of the sample files like water_224.h5, which include typical NXdata groups that H5Web is able to visualize. Or better, you can look at the mock demo, which basically demonstrates all the NeXus features that are currently supported.

Blackclaws commented 2 months ago

First, thanks for even creating this project. It's very useful as is; I just feel that, especially for newer users, a bit more documentation and some things working out of the box would work wonders.

I'm currently trying to standardize our research groups on HDF5 as the default interchange format, and a web-based viewer for looking at results is a great addition to have there :)

If I understand correctly, there is currently no way to plot two columns of the same dataset against each other, as one might be used to doing with tools like gnuplot (where you'd usually have a CSV or similar as input).

The main reason I'm asking is because a lot of the time people will simply try to plug their existing datasets which are usually in csv format in as a dataset. But I guess that NeXus format is doing things differently here? I have to say that, reading their documentation, it isn't quite clear whether, for example, the "axes" attribute could also point at a column of the signal.

I should also mention that we have nothing at all to do with neutron or X-ray beams; we do a lot of optical and electrical tests instead :)

loichuder commented 2 months ago

> The main reason I'm asking is because a lot of the time people will simply try to plug their existing datasets which are usually in csv format in as a dataset. But I guess that NeXus format is doing things differently here?

Yes, indeed. If we consider a CSV file with several columns, the NeXus way of structuring it would be to have each column in its own dataset.

This is why NeXus expects signal (the actual data) and axes (the data to plot against) to be in different datasets. Again, the NeXus documentation lacks a bit of clarity but @axelboc gave a nice overview of what there is to know in his comment on your last issue.
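To make the column-per-dataset layout concrete, here is a minimal sketch that turns a two-column CSV into an NXdata group. It assumes h5py is used to write the file (the thread does not say which library is involved). The CSV content, the `csv_to_nxdata` helper and the `measurement` group name are hypothetical; the `NX_class`, `signal` and `axes` attributes are the NeXus ones discussed above.

```python
import csv
import io

import h5py
import numpy as np

# Hypothetical two-column measurement, standing in for a CSV export.
CSV_TEXT = """voltage,current
0.0,0.00
0.5,0.12
1.0,0.31
1.5,0.58
"""


def csv_to_nxdata(csv_text, h5file, group_name="measurement"):
    """Write each CSV column to its own dataset inside an NXdata group.

    Assumes exactly two numeric columns: the first is used as the axis,
    the second as the signal.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    x_name, y_name = rows[0]
    values = np.array([[float(v) for v in row] for row in rows[1:]])

    group = h5file.create_group(group_name)
    group.attrs["NX_class"] = "NXdata"
    group.attrs["signal"] = y_name
    # `axes` holds one dataset name per dimension of the signal.
    group.attrs.create("axes", [x_name], dtype=h5py.string_dtype())

    # One dataset per column, instead of a single 2D table.
    group.create_dataset(x_name, data=values[:, 0])
    group.create_dataset(y_name, data=values[:, 1])
    return group
```

Calling `csv_to_nxdata(CSV_TEXT, f)` on a file opened for writing with `h5py.File` yields a `measurement` group containing `voltage` and `current` as separate 1D datasets, which is the structure described above.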

The NeXus way of plotting a dataset against another dataset is so far the only one we support. If you really want to keep your data in a single dataset, I guess you could create an axes virtual dataset that points to a slice of the original dataset, but this is quite convoluted.
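For illustration, the virtual dataset workaround could look roughly like the sketch below. It assumes h5py and an existing 2D dataset whose first two columns are X and Y; the `add_column_views` helper and the `table` and `plot` names are hypothetical. Whether a given H5Web provider resolves virtual datasets is not stated in this thread, so treat this as a sketch rather than a confirmed recipe.

```python
import h5py


def add_column_views(path, table_name="table", group_name="plot"):
    """Expose columns 0 and 1 of a 2D dataset as virtual 1D datasets.

    The virtual datasets `x` and `y` hold no data of their own; they map
    onto slices of the original table, and are grouped under an NXdata
    group so they can act as axis and signal.
    """
    with h5py.File(path, "a") as f:
        table = f[table_name]
        n_rows = table.shape[0]

        # "." tells HDF5 that the source dataset lives in the same file.
        source = h5py.VirtualSource(
            ".", table_name, shape=table.shape, dtype=table.dtype
        )

        group = f.create_group(group_name)
        group.attrs["NX_class"] = "NXdata"
        group.attrs["signal"] = "y"
        group.attrs.create("axes", ["x"], dtype=h5py.string_dtype())

        for name, column in (("x", 0), ("y", 1)):
            layout = h5py.VirtualLayout(shape=(n_rows,), dtype=table.dtype)
            layout[:] = source[:, column]  # map one column of the table
            group.create_virtual_dataset(name, layout)
```

The original table stays untouched, which is the appeal of this approach, but every plot needs its own pair of virtual datasets, which is why it was described as convoluted.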

Blackclaws commented 2 months ago

By now I've managed to create a couple of NeXus-compliant groups that automatically plot fine. The way the data is structured takes some getting used to and isn't ideally suited to how we acquire data, but it does work.

I'm wondering whether you'd be open to an additional set of attributes/datasets that differ from the NeXus spec and might make plotting less structured data easier for newcomers, given of course that you wouldn't have to implement that yourselves but would instead treat it as a "we're taking pull requests" kind of thing.

I don't want to suggest that you have to accept any and all bloat in your codebase. I'm just wondering whether this is something you'd be open to if we do intend to diverge from what NeXus is doing internally and add the functionality to H5Web.

axelboc commented 2 months ago

We're definitely open to supporting other HDF5 data formats than NeXus. For instance, we have very basic support for NetCDF4's _FillValue attribute and we're hoping to eventually support HDF5 dimension scales, which are quite heavily used in the NetCDF4 spec.

What we don't want to do, though, is invent a new standard. That's why I reckon you should start by finding an HDF5 data format that is more suited to the way your data is structured, and then we can discuss adding support for that format in H5Web. Does that make sense?

Blackclaws commented 2 months ago

Yeah I get that we don't want a https://xkcd.com/927/ situation.

Makes perfect sense to try and find an existing data format. So far I haven't had much luck, though, as most formats seem to relate to beamlines, geographic data or molecular dynamics.

I wonder, though, whether an alternative approach would be to just add plotting hints: not really a new standard, just some information that H5Web can pick up on and use to plot data.

If you have any idea what standards other than NeXus are out there that might be interesting please do share.

loichuder commented 2 months ago

There are a lot of standards out there. Even HDF5 features could be used as "plotting hints", as we did for RGB and as @axelboc mentioned (dimension scales).

To be able to give an informed answer, I would be very interested to know more about your use case:

Sending us an example file would be grand. You can also send us an email to our feedback address if you are not willing to share all this on GitHub.

We are always keen on gathering use cases outside the neutron/X-ray community to make H5Web as useful as possible.

axelboc commented 2 months ago

> If you have any idea what standards other than NeXus are out there that might be interesting please do share.

NetCDF, CXI, Loom, Neurodata Without Borders (NWB), XDMF (eXtensible Data Model and Format), Scalable Checkpoint/Restart (SCR), Hierarchical Data Format for Earth Observing System (HDF-EOS), NIX (Neuroscience Information Exchange), LAMMPS Dump File Format, ADIOS (Adaptable Input Output System), CFD General Notation System (CGNS), H5MD (HDF5 for Molecular Dynamics)

axelboc commented 2 weeks ago

I'm closing this now, but feel free to keep the discussion going, or to open a new issue if you end up finding an HDF5 file format that fits your needs better than NeXus.