silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License
165 stars 17 forks source link

Convert all bigints to number in `H5WasmProvider` without exceptions #1581

Closed axelboc closed 4 months ago

axelboc commented 4 months ago

Fix #1536

RawVis calls JSON.stringify(), which doesn't know how to serialize bigints.

The existential question is: should RawVis support bigints (perhaps by implementing BigInt.prototype.toJSON), or is RawVis not meant to receive bigints at all (like all the other visualizations)?

The ideal long-term goal is for all the visualizations to support bigints, as this would allow showing exact bingint values notably in the scalar and matrix visualizations and in the tooltip of the line and heatmap visualizations.

For the time being, however, I'm more concerned about the fact that H5WasmProvider sometimes converts bigints to numbers, and sometimes doesn't. This is what I decided to fix in this PR.

It only happens inside "non-printable" compound datasets (i.e. containing fields with non-trivial dtypes, like nested compounds). We already had a nested compound dataset in sample.h5, but it didn't contain bigints. I added bigints to it to show that in this situation, bigints were indeed not converted to numbers. (You can't see it in the diff, since that's what this PR is fixing).

This logic was implemented on purpose in H5WasmProvider: whenever we encountered a bigint array or a printable compound with at least one bigint field, we would call h5wDataset.to_array(), which returns a JSON-safe value notably by converting any bigints to numbers and any typed arrays to plain JS arrays. The rest of the time (so for nested compounds), we would just return h5wDataset.value (or slice()), which don't perform any JSON-related conversions.

Anyway, so I broadened this logic to cover all possible datasets containing bigints.

Instead of calling to_array, I now always read the raw value with h5wDataset.value (or slice()) and perform the bigint conversion myself. This has the advantage of not requiring re-flattening or manual slicing, and ensures that only bigints are converted; typed arrays, notably, remain untouched. I'm hoping that this will be more future proof. Also, my goal is to later move this bigint conversion closer to the visualizations, where needed, as the providers should return values that are as true to the content of the HDF5 as possible.

bmaranville commented 4 months ago

At the moment, h5wasm is reading datasets into the same dtype as they were stored. There is nothing requiring this - when h5wasm reads the values a different output type could be specified, e.g. read uint64 into uint32 etc.

You can see the documentation for supported conversions here: https://docs.hdfgroup.org/hdf5/develop/_h5_t__u_g.html#title10

Would this be important for performance reasons, for the h5wasm provider? If so, please submit an issue to the h5wasm repo.

loichuder commented 4 months ago

Would this be important for performance reasons, for the h5wasm provider?

I think our ultimate goal is to support BigInt in H5Web (there is no reason we should not) so that this conversion is only temporary.

But thanks for chiming in :wink:

axelboc commented 4 months ago

At the moment, h5wasm is reading datasets into the same dtype as they were stored.

This is great! I think that providers should return the data as accurately as possible; it should then be up to each visualization to convert it when needed (the heatmap's shader, for instance, works almost only with float32 arrays). So I do want to return bigints from the providers eventually and move the conversion to numbers further down, in the relevant visualizations.

This PR is just a first step to fix the linked issue, replace to_array() with value, and bring the bigint-to-number conversion logic into H5Web.