adamshaylor opened this issue 1 year ago:

Do you have plans to add dimension scales to h5wasm? If not, would you be open to doing so?

I’ve been tinkering with h5wasm as a possible means of dynamically creating NetCDF-4 files in the browser. Given the similarities between NetCDF-4 and HDF5, h5wasm is the lightest-weight approach I’ve found. (By the way, I appreciate the work you’ve done on it! The TypeScript type definitions have made it very easy for me to learn the API quickly.) The one thing I’ve found to be missing for NetCDF-4 support (thus far, anyway) is dimension scales. Perhaps you’re aware of others, but this seems to be the most obvious one.
Sure, I would be open to doing so. What kind of interface did you have in mind? Something like the h5py approach here https://docs.h5py.org/en/stable/high/dims.html?
@bmaranville, great! I read the h5py docs you linked to and shared them with a colleague who’s more familiar with Python than me. I don’t think we’re too picky about the interface. Whatever you think suits the project best will probably be fine with us.
Our primary concern is getting data out of the browser’s JavaScript runtime and into a NetCDF-4 file our users can open in Panoply or QGIS. The way I’ve been testing this approach as a proof of concept is with a little script built on h5wasm that traverses all the attributes and paths of an input file, copies them to a new file, and generates a Blob-based link to download that output. Then we use ncdump or h5dump to compare the headers of the input and output, and Panoply to try to create a map-based plot. Dimension scales are the only feature that sticks out to us as obviously missing to get to parity. If there’s any more information I can provide that would inform your approach, e.g. sample data and/or access to the script I mentioned, let me know.
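For reference, here is a stripped-down sketch of that copy-and-download approach. It is illustrative only: the traversal is simplified, and the exact h5wasm signatures (especially create_dataset and create_attribute) vary between releases, so treat those calls as assumptions rather than the real script.

```ts
// Illustrative sketch of the copy-and-download proof of concept.
// The h5wasm calls follow its documented File/Group/Dataset API, but exact
// signatures differ between releases, so treat them as assumptions.
import h5wasm from "h5wasm";

async function copyAndDownload(inputBuffer: ArrayBuffer, downloadName = "output.nc") {
  const { FS } = await h5wasm.ready;

  // Stage the uploaded bytes in Emscripten's in-memory filesystem.
  FS.writeFile("input.h5", new Uint8Array(inputBuffer));
  const src = new h5wasm.File("input.h5", "r");
  const dst = new h5wasm.File("output.nc", "w");

  // Recursively copy groups, datasets, and attributes.
  const copyGroup = (srcGroup: any, dstGroup: any) => {
    for (const name of srcGroup.keys()) {
      const item = srcGroup.get(name);
      if (item instanceof h5wasm.Group) {
        copyGroup(item, dstGroup.create_group(name));
      } else if (item instanceof h5wasm.Dataset) {
        // Object-style arguments assumed; older releases use positional arguments.
        dstGroup.create_dataset({ name, data: item.value, shape: item.shape, dtype: item.dtype });
      }
    }
    for (const [attrName, attr] of Object.entries(srcGroup.attrs)) {
      dstGroup.create_attribute(attrName, (attr as any).value);
    }
  };
  copyGroup(src, dst);

  src.close();
  dst.close();

  // Read the finished file back out of the virtual filesystem and offer it
  // to the user as a Blob-based download link.
  const bytes = FS.readFile("output.nc");
  const url = URL.createObjectURL(new Blob([bytes], { type: "application/x-hdf5" }));
  const a = document.createElement("a");
  a.href = url;
  a.download = downloadName;
  a.click();
}
```

The key point is that the output file lives in Emscripten’s virtual filesystem, so it can be read back as bytes and handed to the browser as a Blob URL.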
We also need this option. I think the best approach is to make it match the HDF5 dimension scales specification, e.g. set_scale, attach_scale and detach_scale: https://docs.hdfgroup.org/hdf5/develop/group___h5_d_s.html
Example: https://docs.hdfgroup.org/archive/support/HDF5/Tutor/h5dimscale.html
@bmaranville Have you had a chance to look at this feature?
I looked at the underlying HDF5 libraries, and it seems straightforward. Exposing the minimal write/create/modify functions you listed from the HDF5 API would be quick. Would that be useful?
I don't have time to immediately implement a more complete solution (that would allow e.g. reading dimension scales or identifying attached dimension scales).
Yes, that would be great. Currently I only need to set dimensions.
Thanks
For the time being, we’re mainly interested in generating NetCDF-4 files, so I think this should work for us too. Thanks.
Ok, write support is in v0.6.8 - please give it a try. Documentation can be found in the CHANGELOG or the release notes.
I'll leave this open because basic read support is not yet implemented, and more convenient functions have not been added to the TypeScript API in hdf5_hl.ts.
This is purely out of curiosity, but what projects are you working on that will involve writing NetCDF-4 files with h5wasm?
Hi, I'll test the code tomorrow :-)
We use h5wasm to publish data to IOOS (https://ioos.us/). We do not want to use the old Node.js library for NetCDF, and we need browser support. IOOS supports both HDF5 and NetCDF, but the files must have dimensions.
Thank you very much, @bmaranville. I will find some time to test next week.
In our case, we (Lobelia) are responsible for EU web applications that allow scientists and policy makers to view and download climate data from the web (for example, the Copernicus Marine MyOcean Viewer). In some of these applications, the data the user wants to download is already present in the browser. Since our users tend to use QGIS, we need to export to NetCDF. Rather than rely on a web service as we do now, we are looking into how we can export from within the browser.
Set and attach scale worked for us :-) Thank you very much.
A suggestion: add these functions to the Dataset object so that you do not need to pass the file_id or the dataset name.
I was surprised that requests for write functionality came in before requests for read functionality! My reason for getting into the hdf5-in-the-browser game was to support web-based visualization and inspection of web-based datasets. Just in case there is demand for visualizing NetCDF-4 files, here are the read functions that go along with the write functions (the write functions have been added to the TypeScript API as requested above, and mkdirTree has been added to the Emscripten TypeScript interface):
// convert dataset to dimension scale:
Dataset.make_scale(scale_name: string)
// attach a dimension scale to the "index" dimension of this dataset:
Dataset.attach_scale(index: number, scale_dset_path: string)
// detach a dimension scale from "index" dimension
Dataset.detach_scale(index: number, scale_dset_path: string)
// get full paths to all datasets that are attached as dimension scales
// to the specified dimension (at "index") of this dataset:
Dataset.get_attached_scales(index: number)
// if this dataset is a dimension scale, returns name as string
// (returns empty string if no name defined, but it is a dimension scale)
// else returns null if it is not set as a dimension scale:
Dataset.get_scale_name()
// label dimension at "index" of this dataset with string "label":
Dataset.set_dimension_label(index: number, label: string)
// fetch labels for all dimensions of this dataset (null if label not defined):
Dataset.get_dimension_labels()
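To make the intended usage concrete, here is a short, hedged sketch combining the methods listed above to write a NetCDF-4-style coordinate variable. The create_dataset call uses object-style arguments, which may not match the signature in every h5wasm release; the dimension-scale calls follow the list above.

```ts
// Hedged sketch: writing a coordinate variable with the dimension-scale
// methods listed above. create_dataset arguments are assumed and may
// differ between h5wasm releases.
import h5wasm from "h5wasm";

async function writeWithDimensionScales() {
  await h5wasm.ready;
  const f = new h5wasm.File("example.nc", "w");

  const time = f.create_dataset({ name: "time", data: [0, 1, 2, 3], dtype: "<f8" });
  const temp = f.create_dataset({ name: "temperature", data: [280.1, 281.3, 279.9, 282.0], dtype: "<f4" });

  // Mark "time" as a dimension scale and attach it to dimension 0 of "temperature".
  time.make_scale("time");
  temp.attach_scale(0, "/time");
  temp.set_dimension_label(0, "time");

  // Read helpers from the list above:
  console.log(temp.get_attached_scales(0)); // e.g. ["/time"]
  console.log(temp.get_dimension_labels()); // e.g. ["time"]
  console.log(time.get_scale_name());       // e.g. "time"

  f.close();
}
```

The intent is that NetCDF-4 readers such as Panoply then treat time as the coordinate for the first dimension of temperature.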
I have one further question - the HDF5 mapping spec for NetCDF4 indicates that all groups should be created with link and attribute creation order preserved, and that all datasets should be created with attribute creation order preserved (see https://www.earthdata.nasa.gov/sites/default/files/imported/ESDS-RFC-022v1.pdf)
Currently h5wasm does not have a mechanism for doing this. Is it very important for your work?
For me this is not an issue. The IOOS compliance checker approved the generated files with the new h5wasm. However, I think they accept both HDF5 and NetCDF as long as they have dimensions/scales.
@bmaranville I recently learned that preserving the creation order of links and attributes is necessary if NetCDF apps and libraries are to be able to modify the file. The NetCDF files we created with h5wasm are readable but cannot be modified.
When reviewing the h5py code, it seems that h5py does not do any other processing, as this is handled by the C API of HDF5.
Do you think it would be possible to implement this in h5wasm? We would greatly appreciate it if you could manage this, as we have customers who need to modify NetCDF files.
I implemented this in a branch, but never merged it. I'll make a PR out of it soon. https://github.com/usnistgov/h5wasm/tree/track_order
PR in progress: #82
Currently, there is just one flag, track_order, when creating new groups, and it applies to links (fields) as well as attributes. Is there a strong use case for separate flags for tracking the order of attributes and links, or can we leave it as a single flag? Will you ever really want to track the order of only attributes but not links, or only links but not attributes?
Also, there is currently no way to read back whether the track_order flag was set in the metadata of the group or dataset. Is that important?
I have only seen them both combined; I think this is the most common way, so a single flag seems good to me.
I do not need to read back track_order, but it could be useful for some people to see what the setting is.
The track_order flag for Dataset and Group creation has been added in the just-published h5wasm v0.7.7.
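A minimal sketch of how the new flag might be used, assuming track_order is exposed as an extra argument to create_group and as an option to create_dataset; the exact API is documented in the v0.7.7 CHANGELOG / release notes, so treat these signatures as assumptions.

```ts
// Minimal sketch, assuming track_order is passed as an extra argument to
// create_group and as an option to create_dataset (assumed signatures).
import h5wasm from "h5wasm";

async function writeTrackedFile() {
  await h5wasm.ready;
  const f = new h5wasm.File("tracked.nc", "w");

  const grp = f.create_group("measurements", true); // track_order = true (assumed)
  grp.create_dataset({ name: "time", data: [0, 1, 2], dtype: "<f8", track_order: true });

  f.close();
}
```

With creation order tracked on both groups and datasets, the resulting file should satisfy the creation properties called for by the NetCDF-4 mapping spec referenced earlier in the thread.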
Thank you, I'll test this tomorrow :-)