dflatow closed this issue 1 year ago.
Hi @dflatow, thanks for the report.
The Matrix visualisation can display datasets with a Compound dtype, as long as every field has a "printable" dtype. H5Web defines printable dtypes as integer, unsigned integer, float, string, boolean and complex. For a demonstration, take a look at dataset /nD_datasets/oneD_compound on H5Web's demo site.
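As an illustration only, the printability rule can be sketched in Python against NumPy structured dtypes. H5Web itself is written in TypeScript, so this is not its code: the helper name is_printable_compound and the mapping from H5Web's categories to NumPy dtype kinds are assumptions made for this sketch.

```python
import numpy as np

# Assumed mapping of H5Web's "printable" categories onto NumPy dtype kinds:
# "i" integer, "u" unsigned integer, "f" float, "S"/"U" string,
# "b" boolean, "c" complex.
PRINTABLE_KINDS = frozenset("iufSUbc")


def is_printable_compound(dtype: np.dtype) -> bool:
    """Return True if every field of a compound (structured) dtype is printable.

    Hypothetical helper for illustration only. Limitations: array-valued
    fields are treated as not printable, and variable-length strings (which
    h5py exposes with the object kind "O") are not recognised.
    """
    if dtype.names is None:
        raise ValueError("expected a compound (structured) dtype")
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        # Sub-array fields such as ("values", "f8", (3,)) have a subdtype.
        if field_dtype.subdtype is not None:
            return False
        if field_dtype.kind not in PRINTABLE_KINDS:
            return False
    return True


dogs = np.dtype([("name", "S10"), ("age", "i8"), ("weight", "f4")])
with_array_field = np.dtype([("index", "i8"), ("values", "f8", (3,))])
print(is_printable_compound(dogs))              # True
print(is_printable_compound(with_array_field))  # False
```

Checking field by field mirrors the "every field must be printable" wording: a single non-printable field is enough to rule out the Matrix view.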
If any of the fields is not printable (which seems to be the case here), H5Web falls back to the "Raw" visualisation, which attempts to serialize the dataset to JSON. Since the dataset seems to contain big integers, JSON.stringify() throws an error.
Could you please share the raw type of the dataset (click on "Inspect" on the row labelled "Raw")? Or even better, could you share an example file?
I found the same issue. It generally fails if format='table' is passed to pandas.HDFStore.put().
@axelboc The issue derives from https://github.com/silx-kit/vscode-h5web/issues/15: some fields of the compound dataset hold BigInt values, which are not serializable.
We solved #15 by converting BigInt values to regular integers when encountering datasets with integer dtypes, but forgot that BigInt values can also show up in Compound datasets, such as the ones generated by pandas.
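H5Web's fix lives in its TypeScript code, but Python shows the same failure mode: json.dumps() rejects NumPy int64 scalars much as JSON.stringify() rejects BigInt values. The following sketch is a Python analogue only, not the fix applied in H5Web; the helper name to_builtin is hypothetical. It converts NumPy scalars (and byte strings) to built-in types while serialising each record of a compound array, rather than only handling plain integer datasets.

```python
import json

import numpy as np

# Compound array with an int64 field ("age").
dogs = np.array(
    [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
    dtype=[("name", "S10"), ("age", "i8"), ("weight", "f4")],
)

# Serialising a raw NumPy int64 scalar fails, much like JSON.stringify(1n) in JS.
try:
    json.dumps(dogs[0]["age"])
except TypeError as err:
    print(err)  # reports that int64 is not JSON serializable


def to_builtin(value):
    """json.dumps fallback: convert NumPy scalars and bytes to built-in types."""
    if isinstance(value, np.integer):
        return int(value)
    if isinstance(value, np.floating):
        return float(value)
    if isinstance(value, bytes):  # also covers np.bytes_ ("S10" fields)
        return value.decode("ascii")
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Keep each field as a NumPy scalar so the fallback has work to do.
records = [[row[name] for name in dogs.dtype.names] for row in dogs]
print(json.dumps(records, default=to_builtin))
# [["Rex", 9, 81.0], ["Fido", 3, 27.0]]
```

One design difference worth noting: Python's int is arbitrary-precision, so this conversion is lossless, whereas converting a BigInt to a JavaScript Number loses precision above Number.MAX_SAFE_INTEGER (2**53 - 1).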
It can be reproduced with a file holding a single compound dataset with a field storing int64 (in this case age):
```python
import numpy as np
import h5py

# Open the output file in write mode (path elided here).
with h5py.File(...) as h5file:
    # From https://numpy.org/doc/stable/user/basics.rec.html
    h5file.create_dataset(
        "dogs",
        data=np.array(
            [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
            # "age" is stored as a 64-bit integer ("i8")
            dtype=[("name", "S10"), ("age", "i8"), ("weight", "f4")],
        ),
    )
```
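To check whether other files are affected in the same way as the reproduction above, one could list the compound datasets that contain 64-bit integer fields. This is a sketch assuming h5py and NumPy; the function name find_int64_compound_datasets is hypothetical, and flagging unsigned 64-bit fields as well is an assumption rather than confirmed H5Web behaviour.

```python
import h5py
import numpy as np


def find_int64_compound_datasets(group):
    """List compound datasets under `group` that have 64-bit integer fields.

    Hypothetical helper for illustration. Returns a list of
    (dataset path, [field names]) tuples. Unsigned 64-bit fields are
    flagged too, which is an assumption rather than confirmed behaviour.
    """
    matches = []

    def visit(name, obj):
        # h5py calls this for every group and dataset below `group`.
        if not isinstance(obj, h5py.Dataset) or obj.dtype.names is None:
            return None  # not a compound dataset; keep visiting
        wide_fields = [
            field
            for field in obj.dtype.names
            if obj.dtype.fields[field][0].kind in "iu"
            and obj.dtype.fields[field][0].itemsize == 8
        ]
        if wide_fields:
            matches.append((name, wide_fields))
        return None  # returning a non-None value would stop the traversal

    group.visititems(visit)
    return matches


# In-memory file (core driver, no backing store) holding the reproduction dataset.
with h5py.File("dogs.h5", "w", driver="core", backing_store=False) as h5file:
    h5file.create_dataset(
        "dogs",
        data=np.array(
            [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
            dtype=[("name", "S10"), ("age", "i8"), ("weight", "f4")],
        ),
    )
    print(find_int64_compound_datasets(h5file))  # [('dogs', ['age'])]
```

For a file on disk, pass an open h5py.File(path, "r") object instead of the in-memory file.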
In the meantime, it is possible to circumvent the issue when saving with pandas: removing append=True from the call to to_hdf (so that pandas uses its default "fixed" format rather than "table") saves the columns as separate datasets rather than in a single compound dataset. H5Web should not have issues viewing these separate datasets.
Should be fixed in the next release. I'll try to get it out asap.
Is your feature request related to a problem?
I'm writing h5 files via pandas (version 2.1.0). Something simple like this:
df.to_hdf(key=key, path_or_buf=path, append=True)
When I go to view the data, I get the following:
Requested solution or feature
It would be great to be able to visualize the data. I'm not sure if this is a bug or a feature request.
Alternatives you've considered
I don't see any alternative VS Code plugins for viewing h5 files.