h2020charisma / ramanchada2

A library for Raman spectroscopy harmonization
https://h2020charisma.github.io/ramanchada2/
MIT License

reorganise .cha file for minimum NeXus compliance (to enable visualisation) #147

Open vedina opened 2 months ago

vedina commented 2 months ago
import h5py
import numpy as np

def write_cha2(x, y, metadata, dataset_name="raw", filename='nexus_compliant_spectrum_file.h5'):
    """
    Create a NeXus compliant HDF5 file from given x, y arrays and metadata.

    :param x: numpy array for the x-axis data (e.g., wavenumbers).
    :param y: numpy array for the y-axis data (e.g., counts).
    :param metadata: dictionary containing metadata information.
    :param dataset_name: name of the NXdata group, e.g. "raw" or "processed".
    :param filename: The name of the output NeXus file.
    """
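    # Open in append mode so repeated calls add NXdata groups to the same file;
    # mode 'w' would truncate the file and drop groups written by earlier calls
    with h5py.File(filename, 'a') as nexus_file:

        # Create (or reuse) the root NXentry group
        nxentry = nexus_file.require_group('entry')
        nxentry.attrs['NX_class'] = 'NXentry'

        # Replace any existing group of the same name, then add the NXdata group
        if dataset_name in nxentry:
            del nxentry[dataset_name]
        nxdata_raw = nxentry.create_group(dataset_name)
        nxdata_raw.attrs['NX_class'] = 'NXdata'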

        # Add x and y datasets
        nxdata_raw.create_dataset('x', data=x)
        nxdata_raw.create_dataset('y', data=y)
        nxdata_raw['x'].attrs['units'] = "cm-1"
        nxdata_raw['y'].attrs['units'] = "a.u."
        nxdata_raw.attrs['signal'] = 'y'
        nxdata_raw.attrs['axes'] = 'x'

        # Add metadata
        for key, value in metadata.items():
            nxentry.attrs[key] = value

# Example usage
x = np.linspace(0, 10, 100)  # Example x data 
y = np.sin(x)  # Example y data (e.g., counts)

# Metadata dictionary example
metadata = {
    'title': 'Sample Spectrum Data',
    # ...
}

# Create the NeXus file
write_cha2(x, y, metadata, "raw")
write_cha2(x, y, metadata, "processed")
vedina commented 2 months ago

@georgievgeorgi, this is a proposal for .cha v2 (in addition to the NeXus file generated using https://github.com/ideaconsult/pyambit).

The main change is that there is a top-level group ("entry"; the name could be anything), which can hold multiple NXdata groups, each storing the actual data as HDF5 datasets.

This should enable visualisation in h5web. It will also be possible to store multiple spectra in the same file (as different top-level entries).

Question: can we fit the cache within this new structure?
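
As a rough sketch of the "multiple spectra in the same file" idea, each spectrum could get its own NXentry. The function name write_spectra, the entry names and the synthetic peaks are hypothetical; the "default" attributes follow the NeXus convention for pointing a viewer at the data to plot, and whether h5web needs them here has not been checked.

```python
import os
import tempfile

import h5py
import numpy as np


def write_spectra(filename, spectra):
    """Write each (x, y) pair in `spectra` as its own top-level NXentry.

    `spectra` maps an entry name (hypothetical, any name works) to an (x, y) tuple.
    """
    with h5py.File(filename, "w") as f:
        for entry_name, (x, y) in spectra.items():
            entry = f.create_group(entry_name)
            entry.attrs["NX_class"] = "NXentry"

            data = entry.create_group("raw")
            data.attrs["NX_class"] = "NXdata"
            data.attrs["signal"] = "y"
            data.attrs["axes"] = "x"
            data.create_dataset("x", data=x).attrs["units"] = "cm-1"
            data.create_dataset("y", data=y).attrs["units"] = "a.u."

            # NeXus 'default' attribute: which NXdata group to plot for this entry
            entry.attrs["default"] = "raw"

        # Root-level 'default': which entry a viewer should open first
        if spectra:
            f.attrs["default"] = next(iter(spectra))


# Synthetic example data: two Gaussian peaks on a shared Raman shift axis
x = np.linspace(100, 3500, 200)
example_path = os.path.join(tempfile.mkdtemp(), "multi_spectrum_example.h5")
write_spectra(example_path, {
    "spectrum_1": (x, np.exp(-((x - 1000) / 50) ** 2)),
    "spectrum_2": (x, np.exp(-((x - 1600) / 80) ** 2)),
})
```

All entries are written in a single call here, so opening the file in write mode does not discard anything.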

georgievgeorgi commented 2 months ago

The cache is based on nested datasets. The name of each dataset represents the processing applied to its parent data, which is how we support branching. If the NeXus format supports nested datasets, the rest is doable, I believe.
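
One way to picture this in plain HDF5 terms: a dataset cannot contain other datasets, but a group can, so each processing step could be a group named after the step, holding its own x and y plus child groups for the steps applied on top of it. The sketch below is only an illustration of that idea, not the actual ramanchada2 cache layout; add_step, list_branches and the step names are hypothetical, and it does not check whether NeXus validators accept an NXdata group nested inside another NXdata group.

```python
import h5py
import numpy as np


def add_step(parent, step_name, x, y):
    """Store the result of one processing step as an NXdata group under `parent`."""
    node = parent.create_group(step_name)
    node.attrs["NX_class"] = "NXdata"
    node.attrs["signal"] = "y"
    node.attrs["axes"] = "x"
    node.create_dataset("x", data=x)
    node.create_dataset("y", data=y)
    return node


def list_branches(group, prefix=""):
    """Return the paths of all NXdata groups below `group`, depth first."""
    paths = []
    for name, item in group.items():
        if isinstance(item, h5py.Group) and item.attrs.get("NX_class") == "NXdata":
            path = f"{prefix}/{name}"
            paths.append(path)
            paths.extend(list_branches(item, path))
    return paths


# In-memory file (nothing is written to disk); placeholder arrays stand in
# for real processing results
with h5py.File("cache_sketch.h5", "w", driver="core", backing_store=False) as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    x = np.linspace(100, 3500, 50)

    raw = add_step(entry, "raw", x, np.ones_like(x))
    smooth = add_step(raw, "smooth", x, np.ones_like(x))    # branch 1
    add_step(smooth, "baseline", x, np.zeros_like(x))       # continues branch 1
    add_step(raw, "normalize", x, np.ones_like(x) / x.size)  # branch 2

    branches = list_branches(entry, "/entry")

print(branches)
```

A step named "x" or "y" would collide with the datasets in the same group, so a real convention would need to reserve those names or place the children in a dedicated subgroup.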

vedina commented 2 months ago

NeXus is just an HDF5 file with some conventions for the structure. There is no convention for nested datasets, but we could propose one.
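
To illustrate the "just HDF5 with conventions" point: the NeXus class of a group is an ordinary HDF5 attribute, so plain h5py can read it without any NeXus-specific library. The helper nx_classes and the group names below are hypothetical and only meant as a minimal sketch.

```python
import h5py
import numpy as np


def nx_classes(h5file):
    """Map every group path in an open HDF5 file to its NX_class attribute (or None)."""
    found = {}

    def visit(name, obj):
        # visititems passes paths relative to the root, without a leading slash
        if isinstance(obj, h5py.Group):
            found["/" + name] = obj.attrs.get("NX_class")

    h5file.visititems(visit)
    return found


# In-memory file with one NeXus-style branch and one plain HDF5 group
with h5py.File("inspect_sketch.h5", "w", driver="core", backing_store=False) as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    raw = entry.create_group("raw")
    raw.attrs["NX_class"] = "NXdata"
    raw.create_dataset("y", data=np.arange(5))
    f.create_group("scratch")  # no NX_class: still valid HDF5, ignored by NeXus tools

    classes = nx_classes(f)

print(classes)
```

Groups without an NX_class attribute show up as None, which is where a proposed convention for nested data would have to say how such groups are to be interpreted.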