ndevenish / miniapp


Support uint32_t input data #3

Status: Open. graeme-winter opened this issue 5 months ago.

graeme-winter commented 5 months ago

We could clobber the data down to 16 bits for the calculation, but a lot of the example data were recorded on an Eiger with an exposure time that allows 32-bit readout. In the cases I looked at, the data would fit into 16 bits, so this is mostly a data-wrangling question.
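A quick way to confirm that a given data set really would fit into 16 bits is to scan it frame by frame for its largest value. This is a minimal NumPy sketch, not part of miniapp: the function name `fits_in_uint16` and its `ignore_value` parameter are hypothetical, and skipping a marker value assumes masked pixels carry a single known value (for example the all-ones uint32 value 0xFFFFFFFF), which you should check against your own data.

```python
import numpy

UINT16_MAX = numpy.iinfo(numpy.uint16).max  # 65535


def fits_in_uint16(frames, ignore_value=None):
    """Return True if every pixel in every frame is <= 65535.

    frames is any iterable of arrays, so frames can be read one at a time
    and the whole stack never has to be held in memory. Pixels equal to
    ignore_value (a hypothetical masked-pixel marker) are skipped.
    """
    for frame in frames:
        frame = numpy.asarray(frame)
        if ignore_value is not None:
            # drop marker pixels; this flattens the frame, which is fine for max()
            frame = frame[frame != ignore_value]
        if frame.size and frame.max() > UINT16_MAX:
            return False
    return True
```

For an h5py dataset `d` like the one in the script, the frames could be passed lazily as `(d[j] for j in range(d.shape[0]))`.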

Alternatively, have an implementation that correctly handles uint32_t input data, i.e. one that uses more bits for the accumulators (particularly ∑i²).
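To see why the accumulators need more bits: if n values, each at most m, are summed, ∑i² can reach n·m² in the worst case, so the accumulator needs as many bits as n·m² has. The sketch below works that out with Python integers and shows NumPy silently wrapping a 32-bit accumulator. The helper names are hypothetical, and n = 1000 in the usage note is only an illustrative count, not a figure taken from miniapp.

```python
import numpy


def accumulator_bits(max_value, n):
    """Bits needed to hold sum(i^2) over n values, worst case all == max_value."""
    # Python ints are arbitrary precision, so this product never overflows
    return (n * max_value * max_value).bit_length()


def sum_of_squares(values, acc_dtype):
    """Sum i^2 with the squares and the running total both held in acc_dtype.

    NumPy integer arrays wrap around silently on overflow, so a too-narrow
    acc_dtype gives a wrong answer rather than an error.
    """
    v = numpy.asarray(values).astype(acc_dtype)
    return int((v * v).sum(dtype=acc_dtype))
```

For 1000 values, `accumulator_bits(65535, 1000)` is 42, so uint16 input leaves plenty of headroom in a uint64 accumulator, whereas `accumulator_bits(2**32 - 1, 1000)` is 74, so 32-bit input can overflow even uint64 and would need something wider or a floating-point accumulator (giving up exactness). Likewise, `sum_of_squares` over two pixels of 65535 gives 4294705154 with a uint32 accumulator (wrapped) but 8589672450 with uint64.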

graeme-winter commented 5 months ago

In the absence of that, the following script can be used to truncate the example data to uint16:

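import os
import sys

import h5py
import hdf5plugin  # registers the HDF5 compression filters (e.g. Bitshuffle) with h5py
import numpy
import tqdm


def truncate(fin, fout):
    """Copy the "data" dataset from fin into a new file fout, stored as uint16."""
    # explicit checks rather than assert, which is skipped under python -O
    if not os.path.exists(fin):
        raise FileNotFoundError(fin)
    if os.path.exists(fout):
        raise FileExistsError(fout)

    with h5py.File(fin, "r") as i, h5py.File(fout, "w") as o:
        d = i["data"]
        e = o.create_dataset(
            "data",
            shape=d.shape,
            chunks=d.chunks,
            dtype=numpy.uint16,
            **hdf5plugin.Bitshuffle(),
        )

        # copy attributes
        for k, v in d.attrs.items():
            e.attrs[k] = v

        # copy data one frame at a time; values are converted to uint16 on
        # write, so anything above 65535 cannot survive unchanged
        for j in tqdm.tqdm(range(d.shape[0])):
            e[j, :, :] = d[j, :, :]


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} input.h5 output.h5")
    truncate(*sys.argv[1:])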

N.B. this does not address the subtleties around trusted ranges, e.g. how pixel values that do not fit into 16 bits should be represented after truncation.
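One way to make the trusted-range handling explicit, rather than relying on whatever the uint32 to uint16 conversion does on write, is to convert each frame before writing it. This is a hypothetical sketch: it assumes masked pixels are marked with 0xFFFFFFFF and chooses to map them to 65535 while clipping every other value to 65534, so the marker stays distinguishable. Whether that matches the trusted range expected downstream is an assumption to check.

```python
import numpy

UINT16_MAX = numpy.iinfo(numpy.uint16).max  # 65535
UINT32_MAX = numpy.iinfo(numpy.uint32).max  # 4294967295, assumed mask marker


def to_uint16(frame, invalid_value=UINT32_MAX):
    """Hypothetical explicit uint32 -> uint16 conversion for one frame.

    Pixels equal to invalid_value become UINT16_MAX, so they stay marked as
    invalid; all other pixels are clipped to UINT16_MAX - 1 so they cannot
    be mistaken for the marker.
    """
    frame = numpy.asarray(frame)
    out = numpy.minimum(frame, UINT16_MAX - 1).astype(numpy.uint16)
    out[frame == invalid_value] = UINT16_MAX
    return out
```

In the script above, the copy loop would then write `e[j, :, :] = to_uint16(d[j, :, :])`. The cost of this choice is that a genuine count of 65535 is stored as 65534.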