silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

Test visualisations for datasets with NaN #641

Open loichuder opened 3 years ago

loichuder commented 3 years ago

Working on #639 made me wonder if h5web is really robust for datasets containing NaN.

I think we should test the fetching and visualizations for:

loichuder commented 3 years ago

This poses an issue at fetching: NaN and Infinity are not supported by the JSON format. There are several ways to deal with such values.

The default behaviour of the json Python serializer is to replace Infinity and NaN by their JS equivalent: https://docs.python.org/3/library/json.html#json.dump. Unfortunately, this produces invalid JSON, as in HSDS: https://github.com/HDFGroup/hsds/issues/87.
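To make the problem concrete, here is a minimal sketch of what a strict JSON parser does with such a payload. The payload string and the `tryParse` helper are made up for illustration; the only assumption is that the server emits bare `NaN`/`Infinity` tokens, as Python's `json.dumps` does by default.

```typescript
// Payload shaped like what Python's json.dumps emits by default (allow_nan=True).
// The string itself is a hypothetical example, not a captured server response.
const pythonStylePayload = '{"value": [1.0, NaN, Infinity]}';

// Hypothetical helper: reports whether a string is valid JSON.
function tryParse(text: string): { ok: boolean; error?: string } {
  try {
    JSON.parse(text);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.name : String(err) };
  }
}

// Bare NaN/Infinity tokens are not part of the JSON grammar, so parsing fails.
const strictResult = tryParse(pythonStylePayload);

// The same data with null placeholders parses fine.
const validResult = tryParse('{"value": [1.0, null, null]}');

console.log(strictResult, validResult);
```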

A common workaround is to serialize Infinity and NaN as null. This is the approach currently in use in jupyterlab_hdf and orjson. While the JSON is valid, we cannot distinguish between NaN and Infinity in h5web. This also means that, strictly speaking, the typing of our ndarray<number> is not valid, as we could get ndarray<number | null> instead. I started some work in https://github.com/silx-kit/h5web/pull/642, but we would rather support NaN and Infinity explicitly and be able to distinguish the two.
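The information loss can be shown with a small sketch. JavaScript's own `JSON.stringify` happens to apply the same null substitution, so it stands in here for the server-side serializers; the `JsonNumber` alias and variable names are illustrative only.

```typescript
// What the client can actually receive once non-finite values become null.
type JsonNumber = number | null;

const originalValues = [1.5, NaN, Infinity, -Infinity];

// JSON.stringify replaces NaN and +/-Infinity with null, like the workaround above.
const wire = JSON.stringify(originalValues);

// After parsing, the three non-finite values are indistinguishable.
const received: JsonNumber[] = JSON.parse(wire);

console.log(wire, received);
```

Note that the parsed array has to be typed as `(number | null)[]`, which is the typing mismatch mentioned above.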

The only approach I see to enable this is to serialize NaN and Infinity as strings, as suggested in https://github.com/jupyterlab/jupyterlab-hdf5/issues/22 (proposed implementation in https://github.com/jupyterlab/jupyterlab-hdf5/pull/97). Note that the use of orjson (which looks very promising for performance reasons) excludes this approach (https://github.com/jupyterlab/jupyterlab-hdf5/pull/98).
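As a rough sketch of the string approach (written in TypeScript for consistency with h5web, although the real encoding would happen server-side in Python), non-finite values could be mapped to string tokens before serialization. The `encodeNonFinite` function and the exact token spellings are assumptions, not what the linked PR implements.

```typescript
// Hypothetical wire type: finite numbers stay numbers, the rest become string tokens.
type EncodedNumber = number | "NaN" | "Infinity" | "-Infinity";

function encodeNonFinite(value: number): EncodedNumber {
  if (isNaN(value)) return "NaN";
  if (value === Infinity) return "Infinity";
  if (value === -Infinity) return "-Infinity";
  return value;
}

// The result is valid JSON and keeps NaN, Infinity and -Infinity distinguishable.
const encodedPayload = JSON.stringify(
  [1.5, NaN, Infinity, -Infinity].map(encodeNonFinite)
);

console.log(encodedPayload);
```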

loichuder commented 3 years ago

A common workaround is to serialize Infinity and NaN as null. This is the approach currently in use in jupyterlab_hdf and orjson.

Update: this behaviour was implemented in HSDS when using ignore_nan: https://github.com/HDFGroup/hsds/issues/87

loichuder commented 2 years ago

Here is the current status for our three providers:

HSDS

When fetching

When displaying

The biggest issue here in my opinion is that NaN/Infinity are not explicitly handled in the domain computation. Implementing this specific handling could solve the LineVis and HeatmapVis issues.
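A minimal sketch of what explicit handling in the domain computation could look like, assuming the domain is a plain `[min, max]` pair. The `getFiniteDomain` name and its `undefined` return for all-non-finite input are hypothetical choices, not h5web's actual API.

```typescript
type Domain = [number, number];

// Hypothetical helper: computes [min, max] over finite values only.
// Returns undefined when the input contains no finite value at all.
function getFiniteDomain(values: ArrayLike<number>): Domain | undefined {
  let min = Infinity;
  let max = -Infinity;

  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (!isFinite(v)) continue; // skips NaN, Infinity and -Infinity
    if (v < min) min = v;
    if (v > max) max = v;
  }

  return min <= max ? [min, max] : undefined;
}

const sample = [2, NaN, 5, Infinity, -1];

// A naive computation is poisoned by a single NaN.
const naiveMin = Math.min(...sample); // NaN

const finiteDomain = getFiniteDomain(sample); // [-1, 5]

console.log(naiveMin, finiteDomain);
```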

Jupyter/h5grove

When fetching

When displaying

The issues stem from the fact that null is expected neither in the typings nor in numeric contexts. As a short-term fix, we could add a transformation in the Provider from null to NaN to leverage the NaN handling we will implement for HSDS. The long-term fix would be to request data in binary rather than JSON.
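The short-term fix could be sketched as follows for a flat array. The `nullsToNaN` name is hypothetical, and the sketch ignores nested arrays; it also walks and copies every value, which has a cost on large datasets.

```typescript
// Hypothetical Provider-side transformation: null placeholders become NaN,
// so downstream code only ever sees numbers.
function nullsToNaN(values: (number | null)[]): number[] {
  return values.map((v) => (v === null ? NaN : v));
}

const fromJson: (number | null)[] = JSON.parse("[1.5,null,3]");
const numeric = nullsToNaN(fromJson);

console.log(numeric);
```

Since null stands for both NaN and Infinity in this scheme, mapping it to NaN is a lossy but type-safe choice.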

loichuder commented 2 years ago

HSDS

#775 fixed

The only remaining issue is that ScalarVis shows 0 for these values but, after investigation, this is a general issue of HSDS for numeric scalar datasets: https://github.com/HDFGroup/hsds/issues/100.

Jupyter/h5grove

Issues were not fixed. The implementation in #776, which transforms null into NaN, caused a performance hit that was too large.

axelboc commented 2 years ago

Pasting this from #817:

h5grove

When fetching

When displaying

axelboc commented 2 years ago

I think the next step would be to allow fetching attribute values as binary in H5Grove.
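To illustrate why a binary response sidesteps the JSON limitation, here is a small sketch that round-trips float64 values through raw bytes. The little-endian float64 layout and both helper names are assumptions for the example, not the H5Grove wire format.

```typescript
// Hypothetical server side: pack numbers as little-endian float64 bytes.
function encodeFloat64LE(values: number[]): ArrayBuffer {
  const buffer = new ArrayBuffer(values.length * 8);
  const view = new DataView(buffer);
  values.forEach((v, i) => view.setFloat64(i * 8, v, true));
  return buffer;
}

// Hypothetical client side: read the bytes back into numbers.
function decodeFloat64LE(buffer: ArrayBuffer): number[] {
  const view = new DataView(buffer);
  const out: number[] = [];
  for (let offset = 0; offset < buffer.byteLength; offset += 8) {
    out.push(view.getFloat64(offset, true));
  }
  return out;
}

// IEEE 754 has bit patterns for NaN and +/-Infinity, so nothing is lost.
const decoded = decodeFloat64LE(encodeFloat64LE([1.5, NaN, Infinity, -Infinity]));

console.log(decoded);
```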

The current API endpoint of H5Grove returns a JSON dictionary, even when fetching a single attribute, so we would need to change this. We could of course add a new endpoint that returns the value of a single attribute as binary, but a better solution may be to keep the current endpoint and its batching capability, and use the multipart/* MIME type.

This page gives an example of a multipart/form-data response that includes both JSON and binary data: https://blog.marcinbudny.com/2014/02/sending-binary-data-along-with-rest-api.html

loichuder commented 2 years ago

Note that https://github.com/HDFGroup/hsds/issues/100 is now fixed, so NaN/Infinity support is complete in HSDS!

At the price of a replaceAll in the Provider: https://github.com/silx-kit/h5web/blob/6ea9269fa06cb84dcd864f3307958db7c63710ad/src/h5web/providers/hsds/hsds-api.ts#L65
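For readers who do not follow the link, the general idea of such a replacement can be sketched like this. It is a hypothetical reconstruction, not the code at the linked line: it assumes the raw response contains bare `NaN`/`Infinity` tokens and wraps them in quotes before parsing. The sketch uses `replace` with a global regex, which behaves the same as `replaceAll` with that regex.

```typescript
// Hypothetical helper: quote bare NaN / Infinity / -Infinity tokens, then parse.
// Caveat: a plain regex also rewrites these words inside string values.
function parseLenientJson(raw: string): unknown {
  const quoted = raw.replace(/-?Infinity|NaN/g, '"$&"');
  return JSON.parse(quoted);
}

const lenient = parseLenientJson(
  '{"value": [1.0, NaN, -Infinity, Infinity]}'
) as { value: unknown[] };

// The non-finite values arrive as the strings "NaN", "-Infinity" and "Infinity".
console.log(lenient.value);
```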

Only the support of NaN/Infinity for attributes in h5grove is left: I am curious to try multipart/form-data!

axelboc commented 2 years ago

Excellent!! Do we then have to pass a reviver to the JSON.parse() call to transform "NaN" and "(-)Infinity" strings back to JS "numbers"?

loichuder commented 2 years ago

Excellent!! Do we then have to pass a reviver to the JSON.parse() call to transform "NaN" and "(-)Infinity" strings back to JS "numbers"?

It sort of works when keeping these as strings, but a reviver would perhaps be cleaner indeed.
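A reviver along those lines could look like the sketch below. The `reviveNonFinite` name is hypothetical, and the sketch assumes the tokens are spelled exactly "NaN", "Infinity" and "-Infinity". Any genuine string value with one of those spellings would be converted too.

```typescript
// Hypothetical reviver: turns the string tokens back into JS numbers.
function reviveNonFinite(_key: string, value: unknown): unknown {
  switch (value) {
    case "NaN":
      return Number.NaN;
    case "Infinity":
      return Number.POSITIVE_INFINITY;
    case "-Infinity":
      return Number.NEGATIVE_INFINITY;
    default:
      return value;
  }
}

const revived = JSON.parse(
  '{"value": [1.5, "NaN", "-Infinity", "Infinity"], "name": "temperature"}',
  reviveNonFinite
) as { value: number[]; name: string };

console.log(revived.value, revived.name);
```

The reviver is called for every key and value in the parsed document, so it adds some overhead on large payloads.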