If I read an encoded waveform and write it to a new file, it is not written in encoded form. (It is written as a gzip-compressed array, but only because `LH5Store.write()` now compresses by default.) However, the written dataset still carries the attributes associated with encoding (`codec`, etc.). This is not what a user would expect: reading a dataset and then writing it back to file should not change how the data is stored.
This happens because `read()` decodes the waveform and stores it in an `ArrayOfEqualSizedArrays`. When `write()` reaches this object, it does not encode it, because it is not an `ArrayOfEncodedEqualSizedArrays`.
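The mechanism can be modeled in a few lines of plain Python. This is a toy sketch, not lgdo code: `EqualSizedArrays`, `EncodedEqualSizedArrays`, `read`, `write`, the first-difference "codec" and `write_preserving` are all hypothetical stand-ins. It shows how a writer that dispatches only on type drops the encoding while the copied attributes survive, plus one possible fix (re-encoding when a `codec` attribute is still attached).

```python
from dataclasses import dataclass, field
from itertools import accumulate


@dataclass
class EqualSizedArrays:  # hypothetical stand-in for ArrayOfEqualSizedArrays
    rows: list
    attrs: dict = field(default_factory=dict)


@dataclass
class EncodedEqualSizedArrays:  # hypothetical stand-in for ArrayOfEncodedEqualSizedArrays
    encoded: list
    attrs: dict = field(default_factory=dict)


def toy_encode(rows):
    """Toy codec: keep the first sample, then store successive differences."""
    return [[r[0]] + [b - a for a, b in zip(r, r[1:])] for r in rows]


def toy_decode(encoded):
    """Invert toy_encode with a running sum."""
    return [list(accumulate(e)) for e in encoded]


def read(stored):
    """Decode on read; attributes (including 'codec') are copied verbatim."""
    if stored["layout"] == "encoded":
        return EqualSizedArrays(toy_decode(stored["data"]), dict(stored["attrs"]))
    return EqualSizedArrays(stored["data"], dict(stored["attrs"]))


def write(obj):
    """Dispatch on type only: only the encoded class is written encoded."""
    if isinstance(obj, EncodedEqualSizedArrays):
        return {"layout": "encoded", "data": obj.encoded, "attrs": dict(obj.attrs)}
    return {"layout": "plain", "data": obj.rows, "attrs": dict(obj.attrs)}


def write_preserving(obj):
    """Hypothetical fix: re-encode when a leftover 'codec' attribute is present."""
    if isinstance(obj, EqualSizedArrays) and "codec" in obj.attrs:
        obj = EncodedEqualSizedArrays(toy_encode(obj.rows), obj.attrs)
    return write(obj)


if __name__ == "__main__":
    stored = {"layout": "encoded", "data": toy_encode([[10, 12, 15]]), "attrs": {"codec": "toy_diff"}}
    obj = read(stored)
    out = write(obj)
    print(out["layout"], out["attrs"])  # plain {'codec': 'toy_diff'}
    print(write_preserving(obj) == stored)  # True
```

In the toy model, the plain `write` reproduces the reported symptom (plain layout, stale `codec` attribute), while `write_preserving` restores the original on-disk layout. Whether lgdo should re-encode on write or instead strip the encoding attributes on read is a design choice this sketch does not settle.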
```python
import lgdo
import h5py
import numpy as np

store = lgdo.lh5.LH5Store()
input_file = "/home/lv/Documents/uw/l200/l200-p06-r000-phy-20230619T034203Z-tier_raw.lh5"
output_file = "output.lh5"
ch_list = lgdo.lh5.ls(input_file)[2:]  # skip FCConfig and OrcaHeader

# copy data: read each channel's raw table and write it unchanged to output_file
for ch in ch_list:
    chobj, _ = store.read(f'{ch}/raw/', input_file)
    store.write(chobj, 'raw', output_file, f'{ch}/')

ch = 'ch1027200'

# input file via LH5Store: waveforms come back decoded
print('load input file with LH5Store')
chobj, _ = store.read(ch+'/raw/', input_file)
print(chobj['waveform_windowed']['values'].attrs)
print(chobj['waveform_windowed']['values'].nda.shape)
print(np.prod(chobj['waveform_windowed']['values'].nda.shape))

# input file via h5py: 'values' is a group holding the encoded data
print('\nload input file with h5py')
with h5py.File(input_file, mode='r') as f:
    print(f[ch]['raw']['waveform_windowed']['values'].attrs.keys())
    print(f[ch]['raw']['waveform_windowed']['values'].keys())
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data'].keys())
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data']['flattened_data'].shape)
    print(f[ch]['raw']['waveform_windowed']['values']['encoded_data']['flattened_data'].compression)

# output file via LH5Store
print('\nload output file with LH5Store')
chobj, _ = store.read(ch+'/raw/', output_file)
print(chobj['waveform_windowed']['values'].attrs)
print(chobj['waveform_windowed']['values'].nda.shape)
print(np.prod(chobj['waveform_windowed']['values'].nda.shape))

# output file via h5py: 'values' is now a plain (gzipped) dataset
print('\nload output file with h5py')
with h5py.File(output_file, mode='r') as f:
    print(f[ch]['raw']['waveform_windowed'].keys())
    print(f[ch]['raw']['waveform_windowed']['values'].attrs.keys())
    print(f[ch]['raw']['waveform_windowed']['values'].shape)
    print(f[ch]['raw']['waveform_windowed']['values'].compression)
```