axelboc opened this issue 1 year ago (Open)
h5wasm@0.6.9 brings support for reading dimension scales.
Hi team, we have some applications that store a waveform (a time array + a value array), and the time array is usually not uniformly sampled, in order to reduce the waveform size. It would be nice to have this feature if we want to visualize such waveforms.
@zhqrbitee any chance you could share a sample file with us?
@axelboc: Here is code which produces an HDF5 file that uses a dimension scale to link two datasets, giving matplotlib the requisite metadata so that it understands the data structure is (times, values) and should be plotted as such:
#!/usr/bin/env python3
import h5py
import numpy
import matplotlib.pyplot as plt
from math import pi as π


def chirp(t):
    f0 = 1e4
    c = 3e8
    φ0 = 0.0
    return numpy.sin(φ0 + 2 * π * (c * t * t / 2 + f0 * t))


def create_nonuniform_timeseries():
    # N.B.: This is to emulate a more realistic goal (plotting the output of an
    # adaptive ODE stepper) without a huge amount of code:
    times = numpy.random.uniform(0.0, 1e-3 / 2, 10000)
    times = numpy.sort(times)
    values = chirp(times)

    with h5py.File('chirp.h5', 'w') as f:
        # Create the time dataset and add dimension scale and units
        time_ds = f.create_dataset('times', data=times)
        time_ds.attrs['units'] = 'seconds'
        time_ds.make_scale('times')

        # Create the values dataset, and attach the time dataset as its dimension scale
        values_ds = f.create_dataset('values', data=values)
        values_ds.attrs['units'] = 'dimensionless'
        values_ds.dims[0].attach_scale(time_ds)

    print("Created 'chirp.h5' with times and values datasets.")


def read_and_plot_timeseries(filename='chirp.h5'):
    with h5py.File(filename, 'r') as f:
        # Find the 'values' dataset and check its dimension scale
        values_ds = f['values']
        values = values_ds[:]

        # Iterate through the scales attached to the first dimension;
        # items() yields (scale name, scale dataset) pairs
        for scale_name, time_ds in values_ds.dims[0].items():
            times = time_ds[:]  # Read the time values

            # Now plot the data
            plt.figure(figsize=(10, 6))
            plt.plot(times, values, label='Chirp Signal')
            plt.xlabel(f"Time ({time_ds.attrs['units']})")
            plt.ylabel(f"Values ({values_ds.attrs['units']})")
            plt.title('Chirp Signal vs. Time')
            plt.grid(True)
            plt.legend()
            plt.tight_layout()
            plt.show()
            break
        else:
            print("No dimension scale found for the 'values' dataset.")


if __name__ == '__main__':
    create_nonuniform_timeseries()
    read_and_plot_timeseries()
Running h5dump chirp.h5 demonstrates that this file does indeed have the desired metadata:
times
   ...
   ATTRIBUTE "REFERENCE_LIST" {
      DATATYPE  H5T_COMPOUND {
         H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
         H5T_STD_U32LE "dimension";
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): {
            DATASET 105553182819040 "/values",
            0
         }
      }
   }
   ...

values
   ...
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT } }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): (DATASET 105553182834848 "/times")
      }
   }
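For reference, the same metadata can be inspected programmatically. The following is only a rough sketch (not part of the script above), assuming the chirp.h5 file it produced: it reads the low-level DIMENSION_LIST attribute with h5py and dereferences the object references, which is essentially what a viewer has to do to resolve the scales.

```python
import h5py

# Rough sketch: inspect the raw dimension-scale metadata of chirp.h5.
with h5py.File('chirp.h5', 'r') as f:
    values_ds = f['values']

    # DIMENSION_LIST is a variable-length array of object references,
    # with one entry per dimension of the dataset.
    dim_list = values_ds.attrs['DIMENSION_LIST']
    for axis, refs in enumerate(dim_list):
        for ref in refs:
            scale_ds = f[ref]  # dereference back to the scale dataset ("/times")
            print(axis, scale_ds.name, scale_ds.attrs.get('CLASS'))

    # h5py also exposes the same links through its high-level dims API:
    print(values_ds.dims[0].items())
```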
It would be nice to support HDF5's dimension scales. We've received multiple feature requests relating to those, including in a couple of recent emails. Dimension scales are apparently used quite extensively in NetCDF4 files, to describe how to plot datasets (axes, units, etc.).
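As a purely illustrative aside (not from the issue; the names demo.nc, time and values are made up, and the netCDF4 Python package is assumed), this is roughly how a NetCDF4 file ends up carrying that metadata: the coordinate variable becomes an HDF5 dimension scale, which h5py can then resolve when the file is opened as plain HDF5.

```python
# Illustration only: a minimal file written with the netCDF4 package
# ends up with HDF5 dimension scales that h5py can see.
import numpy as np
import netCDF4
import h5py

with netCDF4.Dataset('demo.nc', 'w') as nc:
    nc.createDimension('time', 10)
    time = nc.createVariable('time', 'f8', ('time',))
    time.units = 'seconds'
    time[:] = np.sort(np.random.uniform(0.0, 1.0, 10))
    values = nc.createVariable('values', 'f8', ('time',))
    values[:] = np.sin(2 * np.pi * 1e3 * time[:])

# A NetCDF4 file is an HDF5 file, so the coordinate variable shows up
# as a dimension scale attached to 'values' when read with h5py.
with h5py.File('demo.nc', 'r') as f:
    print(f['values'].dims[0].items())  # the 'time' scale should be listed here
```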
In his email, Jean-Christophe describes, for instance, how to create a time-based dimension scale (which could then be "attached" to a dataset, e.g. values.dims[0].attach_scale(time)).

More reading:
(See also the units attribute above, which can be used to indicate that a dimension scale dataset contains relative timestamps.)