usnistgov / h5wasm

A WebAssembly HDF5 reader/writer library

Dimension scales, NetCDF-4 support #60

Open adamshaylor opened 1 year ago

adamshaylor commented 1 year ago

Do you have plans to add dimension scales to h5wasm? If not, would you be open to doing so?

I’ve been tinkering with h5wasm as a possible means of dynamically creating NetCDF-4 files in the browser. Given the similarities between NetCDF-4 and HDF5, h5wasm is the lightest-weight approach I’ve found. (By the way I appreciate the work you’ve done on it! The TypeScript type definitions have made it very easy for me to learn the API quickly.) The one thing I’ve found to be missing for NetCDF-4 support (thus far anyway) is dimension scales. Perhaps you’re aware of others, but this seems to be the most obvious one.

bmaranville commented 1 year ago

Sure, I would be open to doing so. What kind of interface did you have in mind? Something like the h5py approach here https://docs.h5py.org/en/stable/high/dims.html?

adamshaylor commented 1 year ago

@bmaranville, great! I read the h5py docs you linked to and shared them with a colleague who’s more familiar with Python than me. I don’t think we’re too picky about the interface. Whatever you think suits the project best will probably be fine with us.

Our primary concern is getting data out of the browser’s JavaScript runtime and into a NetCDF-4 file our users can open in Panoply or QGIS. As a proof of concept, I’ve been testing this approach with a little script built on h5wasm that traverses all the attributes and paths of an input file, copies them to a new file, and generates a Blob-based link to download that output. Then we use ncdump or h5dump to compare the headers of the input and output, and Panoply to try to create a map-based plot. Dimension scales are the only feature that sticks out to us as obviously missing to get to parity. If there’s any more information I can provide that would inform your approach, e.g. sample data and/or access to the script I mentioned, let me know.

bergmorten commented 1 year ago

We also need this option. I think the best approach is to mirror the HDF5 specification, e.g. set_scale, attach_scale and detach_scale:

https://docs.hdfgroup.org/hdf5/develop/group___h5_d_s.html

Example: https://docs.hdfgroup.org/archive/support/HDF5/Tutor/h5dimscale.html

bergmorten commented 1 year ago

@bmaranville Have you had a chance to look at this feature?

bmaranville commented 1 year ago

I looked at the underlying HDF5 libraries, and it seems straightforward. Exposing the minimal write/create/modify functions you listed from the HDF5 API would be quick. Would that be useful?

I don't have time to immediately implement a more complete solution (that would allow e.g. reading dimension scales or identifying attached dimension scales).

bergmorten commented 1 year ago

Yes, that would be great. Currently I only need to set dimensions.

Thanks

adamshaylor commented 1 year ago

For the time being, it so happens that we’re really mainly interested in generating NetCDF-4 files, so I think this should work for us, too, thanks.

bmaranville commented 1 year ago

OK, write support is in v0.6.8. Please give it a try. Documentation can be found in the CHANGELOG or the release notes.

I'll leave this open because basic read support is not yet implemented, and more convenient functions have not been added to the typescript API in hdf5_hl.ts

bmaranville commented 1 year ago

This is purely from curiosity, but what projects are you working on that will involve writing netcdf4 files with h5wasm?

bergmorten commented 1 year ago

Hi, I'll test the code tomorrow :-)

We use h5wasm to publish data to IOOS (https://ioos.us/). We do not want to use the old Node.js library for NetCDF, and we need browser support. IOOS supports both HDF5 and NetCDF, but the file must have dimensions.

adamshaylor commented 1 year ago

Thank you very much, @bmaranville. I will find some time to test next week.

In our case, we (Lobelia) are responsible for EU web applications that allow scientists and policy makers to view and download climate data from the web (for example, the Copernicus Marine MyOcean Viewer). In some of these applications, the data the user wants to download is already present in the browser. Since our users tend to use QGIS, we need to export to NetCDF. Rather than rely on a web service as we do now, we are looking into how we can export from within the browser.

bergmorten commented 1 year ago

Set and attach scale worked for us :-) Thank you very much

A suggestion is to add these functions into the dataset object so that you do not need to set file_id or dataset name
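A minimal sketch of what that suggestion could look like: a thin wrapper that remembers the file id and dataset path once, then forwards to module-level functions. The `ScaleModule` interface, the `ScaleDataset` class, and all signatures below are assumptions made for illustration, not the actual h5wasm API.

```typescript
// Hypothetical module-level functions that need a file id and dataset names
// on every call. These signatures are assumed for illustration only.
interface ScaleModule {
  set_scale(file_id: bigint, dataset_name: string, dim_name: string): number;
  attach_scale(
    file_id: bigint,
    target_dataset_name: string,
    dimscale_dataset_name: string,
    index: number
  ): number;
}

// Hypothetical wrapper: binds file_id and path once so callers do not repeat them.
class ScaleDataset {
  private readonly mod: ScaleModule;
  private readonly file_id: bigint;
  readonly path: string;

  constructor(mod: ScaleModule, file_id: bigint, path: string) {
    this.mod = mod;
    this.file_id = file_id;
    this.path = path;
  }

  // Mark this dataset as a dimension scale with the given name.
  make_scale(scale_name: string): void {
    const status = this.mod.set_scale(this.file_id, this.path, scale_name);
    if (status < 0) throw new Error("set_scale failed for " + this.path);
  }

  // Attach the scale stored at scale_path to dimension `index` of this dataset.
  attach_scale(index: number, scale_path: string): void {
    const status = this.mod.attach_scale(this.file_id, this.path, scale_path, index);
    if (status < 0) throw new Error("attach_scale failed for " + this.path);
  }
}
```

With a wrapper like this, the calling code would only name the scale and the dimension index, e.g. `time.make_scale("time")` followed by `temperature.attach_scale(0, "/time")`.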

bmaranville commented 1 year ago

I was surprised that requests for write functionality came in before requests for read functionality! My reason for getting into the hdf5-in-the-browser game was to support web-based visualization and inspection of web-based datasets. Just in case there is demand for visualizing netcdf4 files, here are the read functions that go along with the write functions (the write functions have also been added to the TypeScript API, as requested above):

v0.6.9 2023-11-06

Fixed

bmaranville commented 1 year ago

I have one further question - the HDF5 mapping spec for NetCDF4 indicates that all groups should be created with link and attribute creation order preserved, and that all datasets should be created with attribute creation order preserved (see https://www.earthdata.nasa.gov/sites/default/files/imported/ESDS-RFC-022v1.pdf)

Currently h5wasm does not have a mechanism for doing this. Is it very important for your work?

bergmorten commented 1 year ago

For me this is not an issue. The IOOS compliance checker approved the files generated with the new h5wasm. However, I think they accept both HDF5 and NetCDF as long as the files have dimensions/scales.

bergmorten commented 3 months ago

@bmaranville I recently learned that preserving the creation order and linking are necessary if NetCDF apps and libs should be able to modify the file. The NetCDF files we created with h5wasm are readable but cannot be modified.

When reviewing the code for h5py:

It seems that h5py does not do any other processing, as this is handled by the C API of HDF5.

Do you think it would be possible to implement this in h5wasm? We would greatly appreciate it if you could manage this, as we have customers who need to modify NetCDF files.

bmaranville commented 3 months ago

I implemented this in a branch, but never merged it. I'll make a PR out of it soon. https://github.com/usnistgov/h5wasm/tree/track_order

bmaranville commented 3 months ago

PR in progress: #82 Currently, there is just one flag track_order when creating new groups, and it applies to links (fields) as well as attributes. Is there a strong use case for separate flags for tracking the order of attributes and links, or can we leave it as a single flag? Will you ever really want to track the order of only attributes but not links, or only links but not attributes?

Also, there is currently no way to read back whether the track_order flag was set, in the metadata of the group or dataset. Is that important?
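To make the single-flag question concrete, here is a small sketch of how one boolean could drive both settings. The two constants carry the values of `H5P_CRT_ORDER_TRACKED` and `H5P_CRT_ORDER_INDEXED` from the HDF5 C library, which are the flags passed to `H5Pset_link_creation_order` and `H5Pset_attr_creation_order`. The helper `creationOrderFlags` and the choice to also request the creation-order index are assumptions for illustration, not a description of what the PR does.

```typescript
// Values of the HDF5 C library constants of the same names.
const H5P_CRT_ORDER_TRACKED = 0x0001;
const H5P_CRT_ORDER_INDEXED = 0x0002;

// Flags that would be handed to the link and attribute creation-order setters.
interface CreationOrderFlags {
  links: number;
  attributes: number;
}

// Hypothetical helper: a single track_order boolean sets both flags identically.
// Requesting INDEXED alongside TRACKED is an assumption made for this sketch.
function creationOrderFlags(track_order: boolean): CreationOrderFlags {
  const flags = track_order ? (H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED) : 0;
  return { links: flags, attributes: flags };
}
```

Separate flags would mean two booleans feeding the two fields independently; the single flag keeps the call site simpler at the cost of that flexibility.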

bergmorten commented 3 months ago

I have only seen them both combined. I think this is the most common way, so a single flag seems good to me.

I do not need to read back track_order, but it could be useful for some to see what the setting is.

bmaranville commented 2 months ago

The track_order flag for Dataset and Group creation has been added in the just-published h5wasm v0.7.7

bergmorten commented 2 months ago

Thank you I'll test this tomorrow :-)