silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

Allow image display for non-opaque dataset #1623

Closed Blackclaws closed 2 months ago

Blackclaws commented 2 months ago

Is your feature request related to a problem?

Right now, JPG/PNG images are only supported by the Raw visualization, which isn't available if the data supports any other visualization. However, opaque datasets are not that easy to create with every piece of software that is able to write HDF5.

Requested solution or feature

Implement a CLASS or similar that would allow the RAW visualization for any dataset type. Alternatively, roll the image detection logic into the IMAGE class.

axelboc commented 2 months ago

Hi @Blackclaws,

Again, we should not define our own specification but try to make use of the existing. JPEG/PNG images have no built-in support in HDF5, so the opaque dtype is the most appropriate since it is meant to store arbitrary binary data.

The alternative is to use a tool like HDF View or some code to convert your JPEG/PNG images into HDF5 RGB images (i.e. datasets with CLASS="IMAGE" and SUBCLASS="IMAGE_TRUECOLOR"), which H5Web already supports (cf. /nexus_entry/rgb-image in the mock demo).
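As a rough illustration of the "some code" route, here is a minimal sketch using h5py and NumPy. It assumes the image has already been decoded into a (height, width, 3) uint8 array (the decoding step is not shown, and a synthetic gradient stands in for real pixels). The helper name `write_truecolor_image`, the dataset name and the file name are hypothetical. The attribute names follow the HDF5 image specification, which spells the subclass attribute `IMAGE_SUBCLASS`; which of these attributes a given viewer actually checks is not verified here.

```python
import h5py
import numpy as np


def write_truecolor_image(h5file, name, rgb):
    """Store a (height, width, 3) uint8 array as an HDF5 true-colour image.

    Hypothetical helper for illustration only.
    """
    rgb = np.asarray(rgb, dtype=np.uint8)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")

    dset = h5file.create_dataset(name, data=rgb)
    # Attribute names taken from the HDF5 image specification (assumption:
    # a viewer may only look at some of them).
    dset.attrs["CLASS"] = np.bytes_("IMAGE")
    dset.attrs["IMAGE_VERSION"] = np.bytes_("1.2")
    dset.attrs["IMAGE_SUBCLASS"] = np.bytes_("IMAGE_TRUECOLOR")
    dset.attrs["INTERLACE_MODE"] = np.bytes_("INTERLACE_PIXEL")
    return dset


# Synthetic 4x6 red gradient standing in for pixels decoded from a JPEG/PNG.
pixels = np.zeros((4, 6, 3), dtype=np.uint8)
pixels[..., 0] = np.linspace(0, 255, 6, dtype=np.uint8)

# In-memory file so nothing is written to disk; "demo.h5" is a placeholder.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    image = write_truecolor_image(f, "rgb_image", pixels)
    stored_shape = image.shape
    stored_class = image.attrs["CLASS"]
    stored_subclass = image.attrs["IMAGE_SUBCLASS"]
```

Note that this stores decoded pixels, so the dataset is as large as the uncompressed image unless an HDF5 compression filter is applied.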

Apollo3zehn commented 2 months ago

I have added opaque support (not yet released) but I am not sure yet if I did it right. Is it expected that the colors in H5web are different from the original?

[image: grafik]

Apollo3zehn commented 2 months ago

Oh sorry, I posted this to the wrong issue. But the question fits here, too.

Apollo3zehn commented 2 months ago

Ah, it seems to be a dark mode thing. Sorry for posting to the wrong issue.

Blackclaws commented 2 months ago

> Hi @Blackclaws,
>
> Again, we should not define our own specification but try to make use of the existing. JPEG/PNG images have no built-in support in HDF5, so the opaque dtype is the most appropriate since it is meant to store arbitrary binary data.
>
> The alternative is to use a tool like HDF View or some code to convert your JPEG/PNG images into HDF5 RGB images (i.e. datasets with CLASS="IMAGE" and SUBCLASS="IMAGE_TRUECOLOR"), which H5Web already supports (cf. /nexus_entry/rgb-image in the mock demo).

Gotcha, I think the request would then boil down to optionally allowing the RAW visualization also for datasets where other visualizations are available. The problem that I originally faced has been solved pretty quickly by the PureHDF author, but in general I think the point stands that displaying binary data with the RAW visualization should have a different trigger than just it being unreadable otherwise.

axelboc commented 2 months ago

Hmm, I see. Not sure I like the idea of always displaying the Raw tab, though. Maybe a configuration option? But this depends on how H5Web is used (myHDF5, VS Code, Jupyter, @h5web/app directly ...) Now that I mention it, how do you use H5Web?

Another problem with having the Raw visualization always available somehow is that JPEG/PNG images still won't work without the opaque dtype. That's because the opaque dtype hints at H5Web to fetch the dataset as binary instead of JSON (which is useful with nested compound datasets, for instance). Of course, we could reconsider fetching as JSON by default in the Raw visualization.

Blackclaws commented 2 months ago

> Hmm, I see. Not sure I like the idea of always displaying the Raw tab, though. Maybe a configuration option? But this depends on how H5Web is used (myHDF5, VS Code, Jupyter, @h5web/app directly ...) Now that I mention it, how do you use H5Web?
>
> Another problem with having the Raw visualization always available somehow is that JPEG/PNG images still won't work without the opaque dtype. That's because the opaque dtype hints at H5Web to fetch the dataset as binary instead of JSON (which is useful with nested compound datasets, for instance). Of course, we could reconsider fetching as JSON by default in the Raw visualization.

Right now, in two basic ways. One is the VS Code extension; the other is embedded in a web interface for a test station producing h5 results.

Hmm, I see, so opaque is always needed. The problem with storing images as just RGB is that they become prohibitively large for bigger resolutions (>20Mb per Image) while the corresponding png/jpg is just 1-3Mb.
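To make the size concern concrete, here is a back-of-the-envelope sketch. The 3000x2400 resolution is a hypothetical example, not a figure from the test station. An uncompressed 8-bit RGB image takes width x height x 3 bytes, regardless of content.

```python
def raw_rgb_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of an uncompressed image stored as one value per channel per pixel."""
    return width * height * channels * bytes_per_channel


# Hypothetical resolution chosen for illustration.
width, height = 3000, 2400
size_bytes = raw_rgb_size_bytes(width, height)
size_megabytes = size_bytes / 1_000_000

print(f"{width}x{height} RGB: {size_megabytes:.1f} MB uncompressed")
```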

Would it be an option to just probe any dataset for the magic bytes at the beginning and offer a "Raw Image" tab where it would show what the data looks like when interpreted as an image by the browser?
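For reference, the probing idea above boils down to comparing the first few bytes of a payload against well-known file signatures. The sketch below is not H5Web code. The function name `sniff_image_type` is hypothetical, and it assumes the bytes are already available locally, which says nothing about how a data provider would fetch them.

```python
# File signatures ("magic bytes") of the two formats discussed here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_type(payload):
    """Return "png", "jpeg" or None based on the leading bytes of payload."""
    if payload.startswith(PNG_SIGNATURE):
        return "png"
    if payload.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


# Placeholder payloads: only the leading bytes matter for the check.
png_like = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
jpeg_like = JPEG_SIGNATURE + b"\xe0\x00\x10JFIF"
other = b"\x00\x01\x02\x03"

detected = [sniff_image_type(p) for p in (png_like, jpeg_like, other)]
```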

axelboc commented 2 months ago

> Would it be an option to just probe any dataset for the magic bytes at the beginning and offer a "Raw Image" tab where it would show what the data looks like when interpreted as an image by the browser?

Unfortunately no, sorry, this would not be very efficient with network-based data providers and would require modifying our data provider API as well as the back-end providers themselves... Support for raw JPEG/PNG images is more of a convenience than a core feature, so we're really not keen on making big changes for it. We are quite set on the opaque dataset solution until a more standardised solution comes in, either from HDF5 itself or from one of the many HDF5-based formats.

> The problem with storing images as just RGB is that they become prohibitively large for bigger resolutions (>20Mb per Image) while the corresponding png/jpg is just 1-3Mb.

Concerning storing large images as RGB, I do understand that size is a concern, but perhaps compression can help. I think that the hdf5_plugin project includes a couple of JPEG compression filters, so, no guarantees, but you might already be able to open JPEG-compressed RGB datasets in H5Web with an h5grove back-end (for instance with jupyterlab-h5web). With an h5wasm back-end (myHDF5 or VS Code), you would need to open an issue on the h5wasm-plugins repository to request that one of the JPEG filters be added.

Of course, there's the writing part to take care of in PureHDF, and hdf5_plugin won't help you with that...


I'm closing the issue but feel free to open a discussion thread to share updates if you decide to investigate compression, or to keep the discussion going regarding alternative handling of raw binary images.

t20100 commented 2 months ago

HDF5 defines datatypes, and the HDF5 documentation states that the opaque type is for "Uninterpreted data" (https://docs.hdfgroup.org/hdf5/develop/_h5_t__u_g.html), as opposed to integers, floats, characters and bitfields. So a JPEG-encoded payload falls into this "Uninterpreted data" category.

Also, the h5py documentation uses the opaque datatype to store this kind of data.
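A minimal sketch of that approach with h5py: wrapping the encoded bytes in `np.void` makes h5py store them with an opaque datatype. The payload below is a placeholder (a PNG signature followed by arbitrary bytes), and the dataset and file names are hypothetical. In practice the bytes would come from reading a real JPEG/PNG file.

```python
import h5py
import numpy as np

# Placeholder payload standing in for the contents of a real image file.
payload = b"\x89PNG\r\n\x1a\n" + bytes(range(1, 17))

# In-memory file so nothing is written to disk; "demo.h5" is a placeholder.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # np.void marks the bytes as uninterpreted data (HDF5 opaque type).
    f.create_dataset("encoded_image", data=np.void(payload))

    stored = f["encoded_image"]
    dtype_kind = stored.dtype.kind      # NumPy kind of the stored dtype
    restored = stored[()].tobytes()     # read the scalar back as raw bytes
```

Reading the dataset back and calling `tobytes()` returns the original byte string, which can then be handed to any image decoder.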

There is however a jpeg compression filter (repository) that is part of the registered HDF5 compression filters. Unfortunately it is not yet available in h5wasm as @axelboc mentioned.

loichuder commented 1 month ago

Anyway, from what I gathered, the issue was solved on PureHDF's side. Thanks @Apollo3zehn for adding support for opaque datasets.