graeme-winter opened 5 months ago
In the absence of that, the following script can also be used to truncate example data:
```python
import os
import sys

import h5py
import hdf5plugin
import numpy
import tqdm


def truncate(fin, fout):
    assert os.path.exists(fin)
    assert not os.path.exists(fout)

    with h5py.File(fin, "r") as i, h5py.File(fout, "w") as o:
        d = i["data"]
        e = o.create_dataset(
            "data",
            shape=d.shape,
            chunks=d.chunks,
            dtype=numpy.uint16,
            **hdf5plugin.Bitshuffle(),
        )

        # copy attributes
        for k, v in d.attrs.items():
            e.attrs[k] = v

        # copy data
        for j in tqdm.tqdm(range(d.shape[0])):
            e[j, :, :] = d[j, :, :]


if __name__ == "__main__":
    truncate(*sys.argv[1:])
```
N.B. this does not address the subtlety around trusted ranges.
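One way that subtlety could be handled is to remap out-of-range sentinel values before narrowing. This is only a sketch, assuming Eiger-style masking where bad pixels carry the all-ones value (`0xFFFFFFFF` in 32-bit data, `0xFFFF` in 16-bit data); real files may record trusted ranges differently, and the clamping policy here is an illustrative choice, not the script's actual behaviour:

```python
import numpy


def narrow_frame(frame32):
    # Hypothetical helper: map the 32-bit bad-pixel sentinel (and any
    # values that would overflow uint16) onto the 16-bit sentinel 0xFFFF
    # before the dtype cast, so masked pixels stay masked.
    out = frame32.copy()
    out[out >= 0xFFFF] = 0xFFFF
    return out.astype(numpy.uint16)


f = numpy.array([[10, 0xFFFFFFFF], [70000, 65534]], dtype=numpy.uint32)
g = narrow_frame(f)
assert g.dtype == numpy.uint16
assert int(g[0, 1]) == 0xFFFF  # sentinel preserved
assert int(g[1, 1]) == 65534   # in-range values untouched
```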
We could clobber the data down to 16 bits for the calculation, but a lot of example data are recorded with exposure times on an Eiger that enable 32-bit readout. In the cases I looked at the data would fit into 16 bits, so this is mostly a data-wrangling question.
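Whether a given file fits could be verified up front with a frame-by-frame maximum check before narrowing; a minimal sketch (the `fits_uint16` helper and the tiny in-memory frames are illustrative, not part of the script above):

```python
import numpy


def fits_uint16(frames):
    # True if every frame's maximum is representable in uint16.
    # Iterating frame by frame keeps memory use bounded for large stacks.
    limit = numpy.iinfo(numpy.uint16).max  # 65535
    return all(int(f.max()) <= limit for f in frames)


assert fits_uint16([numpy.array([[0, 65535]], dtype=numpy.uint32)])
assert not fits_uint16([numpy.array([[70000]], dtype=numpy.uint32)])
```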
Alternative: provide an implementation which correctly handles uint32_t input data, i.e. uses more bits for the accumulators (particularly ∑i²).
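A back-of-envelope check shows why the ∑i² accumulator is the pressure point: for uint16 pixels a whole-frame sum of squares fits comfortably in 64 bits, but for uint32 pixels even a single squared term nearly exhausts a 64-bit accumulator (the 4M-pixel frame size below is an illustrative assumption):

```python
import numpy

n_pixels = 4_000_000  # assumed frame size, order of a large Eiger module
max16 = numpy.iinfo(numpy.uint16).max
max32 = numpy.iinfo(numpy.uint32).max

# uint16 input: worst-case whole-frame sum of squares fits in uint64
assert n_pixels * max16 ** 2 < 2 ** 64

# uint32 input: one squared term only just fits in 64 bits, so summing
# two or more worst-case terms already overflows a uint64 accumulator;
# a 128-bit integer or double accumulator would be needed
assert max32 ** 2 < 2 ** 64
assert 2 * max32 ** 2 >= 2 ** 64
```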