loichuder opened this issue 3 years ago
This poses an issue when fetching: NaN and Infinity are not supported by the JSON format. There are several ways of dealing with such values.
The default behaviour of the json Python serializer is to replace Infinity and NaN with their JS equivalents: https://docs.python.org/3/library/json.html#json.dump. Unfortunately, this produces invalid JSON, as in HSDS: https://github.com/HDFGroup/hsds/issues/87.
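To see why this breaks the client, here is a minimal TypeScript sketch (an illustration, not h5web code). The NaN, Infinity and -Infinity literals that Python's json module emits by default (allow_nan=True) are rejected by JSON.parse, which follows the JSON grammar strictly. The helper name isStrictJson is hypothetical.

```typescript
// Hypothetical helper: returns true if `text` is accepted by the strict JSON parser.
function isStrictJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    // JSON.parse throws a SyntaxError on non-standard literals such as NaN.
    return false;
  }
}

// Same text as Python's json.dumps([1.0, float("nan"), float("inf"), float("-inf")])
const pythonStyle = "[1.0, NaN, Infinity, -Infinity]";
// Same values after serializing the non-finite numbers as null
const nullStyle = "[1.0, null, null, null]";

console.log(isStrictJson(pythonStyle)); // false
console.log(isStrictJson(nullStyle)); // true
```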
A common workaround is to serialize Infinity and NaN as null. This is the approach currently in use in jupyterlab_hdf and orjson. While the JSON is valid, we cannot distinguish between NaN and Infinity in h5web. This also means that, strictly speaking, the typing of our ndarray<number> is not valid, as we could get ndarray<number | null> instead.
I started some work on https://github.com/silx-kit/h5web/pull/642, but we would rather support NaN and Infinity explicitly and be able to distinguish the two.
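A short TypeScript illustration of what the null workaround costs. JS's own JSON.stringify also maps every non-finite number to null, so after serialization NaN, Infinity and -Infinity are indistinguishable, and the parsed array has to be typed as (number | null)[] rather than number[]. This only demonstrates the encoding; the serializers mentioned above (jupyterlab_hdf, orjson) are Python-side.

```typescript
const original: number[] = [1.5, NaN, Infinity, -Infinity];

// JSON.stringify serializes non-finite numbers as null (per the ECMAScript spec).
const payload = JSON.stringify(original);

// The honest type for the parsed result now includes null.
const parsed = JSON.parse(payload) as (number | null)[];

// All three special values collapse to the same null: the distinction is lost.
const distinctSpecials = new Set(parsed.slice(1));
console.log(payload); // [1.5,null,null,null]
console.log(distinctSpecials.size); // 1
```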
The only approach I see to enable this is to serialize NaN and Infinity as strings, as suggested in https://github.com/jupyterlab/jupyterlab-hdf5/issues/22 (proposed implementation in https://github.com/jupyterlab/jupyterlab-hdf5/pull/97). Note that using orjson (which looks very promising for performance reasons) rules out this approach (https://github.com/jupyterlab/jupyterlab-hdf5/pull/98).
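A minimal sketch of the string encoding, written in TypeScript with a JSON.stringify replacer purely for illustration. It does not reproduce the Python implementation proposed in jupyterlab-hdf5#97. Each non-finite number becomes the string "NaN", "Infinity" or "-Infinity", so the payload stays valid JSON and the three values remain distinguishable. The name encodeNonFinite is hypothetical.

```typescript
// Hypothetical replacer: encode non-finite numbers as strings so they survive JSON.
function encodeNonFinite(_key: string, value: unknown): unknown {
  if (typeof value === "number" && !Number.isFinite(value)) {
    // String(NaN) is "NaN", String(-Infinity) is "-Infinity"
    return String(value);
  }
  return value;
}

const encoded = JSON.stringify([1, NaN, Infinity, -Infinity], encodeNonFinite);
console.log(encoded); // [1,"NaN","Infinity","-Infinity"]
```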
Update: serializing Infinity and NaN as null (the workaround used in jupyterlab_hdf and orjson) is now implemented in HSDS when using the ignore_nan parameter: https://github.com/HDFGroup/hsds/issues/87
Here is the current status for our three providers:
HSDS: without the ignore_nan parameter, we receive invalid JSON containing NaN and (-)Infinity.
- Provider: handled by replacing the invalid values with their string counterparts "NaN" and "(-)Infinity": https://github.com/silx-kit/h5web/blob/6ea9269fa06cb84dcd864f3307958db7c63710ad/src/h5web/providers/hsds/hsds-api.ts#L65
- MatrixVis and the display of attributes work fine
- HeatmapVis works for custom domains
- ScalarVis shows 0 for these values
- LineVis shows a blank canvas, even for slices containing only valid values
- HeatmapVis yields NaN or Infinity values

The biggest issue here, in my opinion, is that NaN/Infinity are not explicitly handled in the domain computation. Implementing this specific handling could solve the LineVis and HeatmapVis issues.
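A minimal sketch of what explicit handling in the domain computation could look like, assuming the domain is simply the [min, max] of the values. Non-finite values are skipped instead of propagating: Math.min(1, NaN) is NaN and Math.max(1, Infinity) is Infinity, which is how a single bad value can poison the whole domain. computeFiniteDomain is a hypothetical name, not h5web's actual domain utility.

```typescript
type Domain = [number, number];

// Hypothetical helper: min/max over finite values only, ignoring NaN and ±Infinity.
function computeFiniteDomain(values: ArrayLike<number>): Domain | undefined {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (!Number.isFinite(v)) {
      continue; // skip NaN, Infinity and -Infinity
    }
    if (v < min) min = v;
    if (v > max) max = v;
  }
  // No finite value at all: let the caller fall back to a default domain.
  return min <= max ? [min, max] : undefined;
}

console.log(computeFiniteDomain([1, NaN, -Infinity, 5, Infinity, -2])); // [ -2, 5 ]
```

Returning undefined for an all-NaN slice leaves the fallback decision (e.g. a placeholder domain) to the visualization.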
NaN and (-)Infinity are serialized as null:
- LineVis does not show points for these values
- A null value gets interpreted as 0 in most numeric contexts: value display in the tooltip, domain computation, colormap mapping, MatrixVis...
- ScalarVis throws with "val is null"

The issue stems from the fact that null is expected neither in the typings nor in the numeric contexts. As a short-term fix, we could add a transformation in the Provider from null to NaN to leverage the NaN handling we will implement for HSDS. The long-term fix would be to request data in binary rather than JSON.
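A sketch of the short-term fix described above, assuming the Provider receives the parsed JSON as a plain (number | null)[] array; nullsToNaN is a hypothetical name. It restores a numeric type but not the lost information: every null becomes NaN, including values that were originally ±Infinity. It also adds one full pass over the data.

```typescript
// Hypothetical Provider-side transformation: null -> NaN, into a typed array.
function nullsToNaN(values: ReadonlyArray<number | null>): Float64Array {
  const out = new Float64Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    out[i] = v === null ? NaN : v;
  }
  return out;
}

const fromJson: (number | null)[] = [0.5, null, 2];
const fixed = nullsToNaN(fromJson);

// Without the transformation, JS coerces null to 0 in numeric contexts:
console.log(Number(null)); // 0
console.log(Number.isNaN(fixed[1])); // true
```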
The only remaining issue is that ScalarVis shows 0 for these values but, after investigation, this is a general HSDS issue with numeric scalar datasets: https://github.com/HDFGroup/hsds/issues/100.
These issues were not fixed: the implementation in #776, which transformed null into NaN, had too large a performance cost.
Pasting this from #817:
NaN and (-)Infinity values are preserved in the payload and interpreted as their JS counterparts in the Provider:
- LineVis does not show points for these values
- MatrixVis, ScalarVis and the tooltip display the correct value (NaN/Infinity)
- Attribute values are still serialized as null, as the binary format is only used for dataset values

I think the next step would be to allow fetching attribute values as binary in H5Grove.
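Why the binary route preserves these values: IEEE 754 float64 has dedicated bit patterns for NaN and ±Infinity, so decoding the raw bytes gives them back as JS numbers without any special casing. A minimal sketch, assuming a little-endian float64 payload (the actual dtype and byte order depend on the dataset and on what H5Grove sends); decodeFloat64LE is a hypothetical name.

```typescript
// Hypothetical decoder for a little-endian float64 payload.
function decodeFloat64LE(buffer: ArrayBuffer): Float64Array {
  const view = new DataView(buffer);
  const out = new Float64Array(Math.floor(buffer.byteLength / 8));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat64(i * 8, true); // true = little-endian
  }
  return out;
}

// Simulate a server payload: write the values as little-endian float64 bytes.
const source = [1.5, NaN, Infinity, -Infinity];
const buffer = new ArrayBuffer(source.length * 8);
const writer = new DataView(buffer);
source.forEach((v, i) => writer.setFloat64(i * 8, v, true));

const decoded = decodeFloat64LE(buffer);
console.log(decoded[2] === Infinity, Number.isNaN(decoded[1])); // true true
```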
The current API endpoint of H5Grove returns a JSON dictionary, even when fetching a single attribute, so we would need to change this. We could of course add a new endpoint that returns the value of a single attribute as binary, but a better solution may be to keep the current endpoint and its batching capability, and use the multipart/* MIME type.
This page gives an example of a multipart/form-data response that includes both JSON and binary data: https://blog.marcinbudny.com/2014/02/sending-binary-data-along-with-rest-api.html
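To make the idea concrete, here is a TypeScript sketch of how a batched attribute response could be laid out as multipart/form-data: one JSON part for metadata and one binary part per attribute value. This is not H5Grove's API (H5Grove is a Python package); the part names, content types and boundary are made up for illustration. On the client side, the Fetch API's Response.formData() is specified to parse multipart/form-data bodies when the Content-Type header carries the matching boundary.

```typescript
// Hypothetical description of one part of a multipart/form-data body.
interface Part {
  name: string;
  contentType: string;
  body: Uint8Array;
}

// Builds a multipart/form-data body. The boundary must not occur inside any
// part body; real servers generate a random boundary to ensure this.
function buildMultipart(parts: Part[], boundary: string): Uint8Array {
  const encoder = new TextEncoder();
  const chunks: Uint8Array[] = [];
  for (const part of parts) {
    const header =
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${part.name}"\r\n` +
      `Content-Type: ${part.contentType}\r\n\r\n`;
    chunks.push(encoder.encode(header), part.body, encoder.encode("\r\n"));
  }
  chunks.push(encoder.encode(`--${boundary}--\r\n`));

  // Concatenate the chunks into a single buffer.
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Example: JSON metadata plus the raw float64 bytes (platform byte order) of one attribute.
const meta = new TextEncoder().encode(JSON.stringify({ attr: "scale" }));
const values = new Uint8Array(new Float64Array([NaN, Infinity]).buffer);
const body = buildMultipart(
  [
    { name: "meta", contentType: "application/json", body: meta },
    { name: "scale", contentType: "application/octet-stream", body: values },
  ],
  "h5grove-boundary",
);
```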
Note that https://github.com/HDFGroup/hsds/issues/100 is now fixed, so NaN/Infinity support is complete in HSDS!
At the price of a replaceAll in the Provider: https://github.com/silx-kit/h5web/blob/6ea9269fa06cb84dcd864f3307958db7c63710ad/src/h5web/providers/hsds/hsds-api.ts#L65
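For illustration, a text-level rewrite of this kind could look like the sketch below. It is an assumption about the general approach, not a copy of the linked hsds-api.ts code: the raw response text is scanned once for the NaN, Infinity and -Infinity literals, which are quoted so that JSON.parse accepts the payload. That extra pass over the full response text is the cost, and a naive regex like this one would also rewrite those words if they appeared inside string values. quoteNonFiniteLiterals is a hypothetical name.

```typescript
// Hypothetical pre-processing step: quote the non-standard literals so the
// text becomes valid JSON. The alternation tries "-?Infinity" first, so a
// leading minus sign ends up inside the quotes.
function quoteNonFiniteLiterals(text: string): string {
  return text.replace(/-?Infinity|NaN/g, (match) => `"${match}"`);
}

const raw = "[1.0, NaN, -Infinity, Infinity]";
const safe = quoteNonFiniteLiterals(raw);
console.log(safe); // [1.0, "NaN", "-Infinity", "Infinity"]
const parsedSafe = JSON.parse(safe); // no longer throws
```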
Only NaN/Infinity support for attributes in h5grove is left: I am curious to try multipart/form-data!
Excellent!! Do we then have to pass a reviver to the JSON.parse() call to transform "NaN" and "(-)Infinity" strings back to JS "numbers"?
It sort of works when keeping these as strings, but it would perhaps be cleaner indeed.
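A reviver for this could look like the following sketch, assuming the "NaN", "Infinity" and "-Infinity" string encoding discussed in this thread; reviveNonFinite is a hypothetical name. The catch is that JSON.parse calls the reviver on every value, so a genuine string attribute equal to "NaN" would also be converted. Restricting the conversion to numeric datasets and attributes (based on their dtype) would avoid that.

```typescript
// Hypothetical reviver: turn the string encodings back into JS numbers.
function reviveNonFinite(_key: string, value: unknown): unknown {
  switch (value) {
    case "NaN":
      return NaN;
    case "Infinity":
      return Infinity;
    case "-Infinity":
      return -Infinity;
    default:
      return value;
  }
}

const revived = JSON.parse('[1, "NaN", "Infinity", "-Infinity"]', reviveNonFinite) as number[];
console.log(Number.isNaN(revived[1]), revived[3] === -Infinity); // true true
```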
Working on #639 made me wonder if h5web is really robust for datasets containing NaN.
I think we should test the fetching and visualizations for: