silx-kit / h5web

React components for data visualization and exploration
https://h5web.panosc.eu/
MIT License

HDF5 Dimension Scales #1313

Open axelboc opened 1 year ago

axelboc commented 1 year ago

It would be nice to support HDF5's dimension scales. We've received multiple feature requests relating to those, including in a couple of recent emails. Dimension scales are apparently used quite extensively in NetCDF4 files, to describe how to plot datasets (axes, units, etc.).

In his email, Jean-Christophe describes, for instance, how to create a time-based dimension scale (which could then be "attached" to a dataset - e.g. values.dims[0].attach_scale(time)).

df_modified = pd.to_datetime(df.index.values) - time  # Offset of each timestamp from the reference date `time`
df_modified_str = df_modified.total_seconds().to_numpy()  # Offsets as float seconds (not strings, despite the name)
time_dset = group.create_dataset('time', data=df_modified_str)
time_dset.attrs["long_name"] = "UTC Time"
time_dset.attrs["description"] = ModelHDFLevel4Lumina.TIME.__doc__
time_dset.attrs["calendar"] = "standard"
time_dset.attrs["units"] = f"seconds since {time.strftime('%Y-%m-%d %H:%M:%S')}"
time_dset.make_scale('time')
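To plot such a scale as absolute times, a reader of the file would have to interpret the units string. Here is a minimal sketch of that step, assuming the attribute follows exactly the "seconds since %Y-%m-%d %H:%M:%S" format written above. Note that decode_time_offsets is a hypothetical helper for illustration, not part of h5py or H5Web.

```python
from datetime import datetime, timedelta


def decode_time_offsets(offsets, units):
    """Convert offsets in seconds to datetimes, given a 'seconds since <date>' string.

    Hypothetical helper: only the exact format used in the snippet above is handled.
    """
    unit, _, reference = units.partition(" since ")
    if unit != "seconds" or not reference:
        raise ValueError(f"unsupported units string: {units!r}")
    origin = datetime.strptime(reference, "%Y-%m-%d %H:%M:%S")
    return [origin + timedelta(seconds=float(s)) for s in offsets]


decoded = decode_time_offsets([0.0, 1.5, 3600.0], "seconds since 2024-01-01 00:00:00")
print(decoded[-1].isoformat())  # 2024-01-01T01:00:00
```

Other units ("days since ...", time zones, non-standard calendars) would need extra handling that this sketch deliberately leaves out.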


axelboc commented 12 months ago

h5wasm@0.6.9 brings support for reading dimension scales.

zhqrbitee commented 2 weeks ago

Hi team, we have some applications that store a waveform (time array + value array), and the time array is usually not uniformly sampled, in order to reduce the waveform size. It would be nice to have this feature so that we can visualize the waveform.

axelboc commented 1 week ago

@zhqrbitee any chance you could share a sample file with us?

NAThompson commented 1 week ago

@axelboc: Here is code that produces an HDF5 file in which a dimension scale links two datasets. This gives the plotting code (matplotlib here) the requisite metadata to understand that the data structure is (times, values) and should be plotted as such:

#!/usr/bin/env python3

import h5py
import numpy
import matplotlib.pyplot as plt
from math import pi as π

def chirp(t: float):
    f0 = 1e4
    c = 3e8
    φ0 = 0.0
    return numpy.sin(φ0 + 2 * π * (c * t * t / 2 + f0 * t))

def create_nonuniform_timeseries():
    # N.B.: This is to emulate a more realistic goal (plotting the output of an adaptive ODE stepper)
    # without a huge amount of code:
    times = numpy.random.uniform(0.0, 1e-3/2, 10000)
    times = numpy.sort(times)
    values = chirp(times)

    with h5py.File('chirp.h5', 'w') as f:
        # Create the time dataset and add dimension scale and units
        time_ds = f.create_dataset('times', data=times)
        time_ds.attrs['units'] = 'seconds'
        time_ds.make_scale('times')

        # Create the values dataset, and attach the time dataset as its dimension scale
        values_ds = f.create_dataset('values', data=values)
        values_ds.attrs['units'] = 'dimensionless'
        values_ds.dims[0].attach_scale(time_ds)

        print("Created 'chirp.h5' with times and values datasets.")

def read_and_plot_timeseries(filename='chirp.h5'):
    with h5py.File(filename, 'r') as f:
        # Find the 'values' dataset and check its dimension scale
        values_ds = f['values']
        values = values_ds[:]

        # Iterate over the datasets attached as scales to the first dimension.
        # (Iterating the dimension directly yields scale names, not dataset paths.)
        for time_ds in values_ds.dims[0].values():
            times = time_ds[:]  # Read the time values

            # Now plot the data
            plt.figure(figsize=(10, 6))
            plt.plot(times, values, label='Chirp Signal')
            plt.xlabel(f"Time ({time_ds.attrs['units']})")
            plt.ylabel(f"Values ({values_ds.attrs['units']})")
            plt.title('Chirp Signal vs. Time')
            plt.grid(True)
            plt.legend()
            plt.tight_layout()
            plt.show()
            break
        else:
            print("No dimension scale found for the 'values' dataset.")

if __name__ == '__main__':
    create_nonuniform_timeseries()
    read_and_plot_timeseries()

Running h5dump chirp.h5 demonstrates that this file does indeed have the desired metadata:

times:
...
...
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_U32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 105553182819040 "/values",
               0
            }
         }
      }
...
values:
...
...
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT } }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 105553182834848 "/times")
         }
      }
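
The DIMENSION_LIST attribute shown above can also be followed by hand, which is roughly what a reader without a high-level dims API would need to do. Here is a minimal sketch, assuming h5py and NumPy are installed. Note that attached_scales is a hypothetical helper, and the file is a small in-memory stand-in for chirp.h5 rather than the real one.

```python
import h5py
import numpy as np


def attached_scales(dataset):
    """Return, for each dimension, the datasets referenced by DIMENSION_LIST.

    Hypothetical helper: it reads the raw attribute instead of using `dataset.dims`.
    """
    dim_list = dataset.attrs.get("DIMENSION_LIST")
    if dim_list is None:
        return [[] for _ in range(dataset.ndim)]
    # One entry per dimension; each entry is a variable-length array of object references.
    return [[dataset.file[ref] for ref in refs] for refs in dim_list]


# In-memory file (nothing is written to disk) with the same layout as chirp.h5.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    times = f.create_dataset("times", data=np.array([0.0, 1e-4, 4e-4, 5e-4]))
    times.make_scale("times")
    values = f.create_dataset("values", data=np.zeros(4))
    values.dims[0].attach_scale(times)

    scale_names = [[scale.name for scale in dim] for dim in attached_scales(values)]

print(scale_names)  # [['/times']]
```

In the other direction, the REFERENCE_LIST attribute on the scale points back at every dataset and dimension index it is attached to.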