ptycho / ptypy

Ptypy - main repository

Creating a .ptyd file from scratch #114

Open bjoernenders opened 6 years ago

bjoernenders commented 6 years ago

We need to add a step-by-step guide on how a .ptyd file can be created using h5py.File or h5write. This should extend the tutorial on subclassing PtyScan.

cmkewish commented 6 years ago

This is the approach I use, in abbreviated form:

# Description:
# Python script to prepare a 2D ptychography dataset from individual data files;
# the data file(s) contain 1 image numpy array in uncompressed HDF5 format. The
# output file will be a HDF5 uncompressed .ptyd file containing the data, N frames
# cut to ROI asize with the beam centred.
#
# Input file .ptyd structure. [From PtyPy documentation 2017-02-06]
#
# *.ptyd/
#
#       info/
#
#           [general parameters; optional]
#           auto_center         : binary
#           center              : array (int, int)
#           chunk_format        : string
#           dfile               : string (filename)
#           distance            : scalar (float)
#           energy              : scalar (float, keV)
#           experimentID        : string (optional)
#           label               : string (optional)
#           lam                 : scalar (float)
#           load_parallel       : string
#           min_frames          : scalar
#           misfit              : scalar
#           num_frames          : scalar (int)
#           num_frames_actual   : scalar (int)
#           orientation         : scalar (index)
#           origin              : array (int, int)
#           positions_scan      : array (int,2 array)
#           positions_theory    : array (int,2 array)
#           precedence          : string (optional)
#           propagation         : scalar (float, meters)
#           psize               : array (float, float)
#           rebin               : None (int, optional)
#           recipe              : string
#           resolution          : array (float, float)
#           save                : None
#           shape               : array (int, int)
#           version             : string
#
#       meta/
#
#           [general parameters; optional but very useful]
#           version     : str
#           num_frames  : int
#           label       : str
#
#           [geometric parameters; all optional]
#           shape       : int or (int,int)
#           energy      : float, optional
#           distance    : float, optional
#           center      : (float,float) or None, optional
#           psize       : float or (float,float), optional
#           propagation : "farfield" or "nearfield", optional
#           ...
#
#       chunks/
#
#           0/
#               data      : array(M,N,N) of float
#               indices   : array(M) of int, optional
#               positions : array(M ,2) of float
#               weights   : same shape as data or empty
#           1/
#               ...
#           2/
#               ...
#           ...
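The layout above can be sketched as a minimal writer. This is a sketch, not PtyPy code: `write_minimal_ptyd` and its argument names are hypothetical, and only a small subset of the optional `meta/` entries is filled in.

```python
import h5py
import numpy as np

def write_minimal_ptyd(path, data, positions, energy, distance, psize):
    """Write the smallest .ptyd layout shown above: meta/ plus one chunk."""
    with h5py.File(path, 'w') as f:
        meta = f.create_group('meta')
        meta.create_dataset('num_frames', data=data.shape[0])
        meta.create_dataset('shape', data=data.shape[1:])
        meta.create_dataset('energy', data=energy)        # keV
        meta.create_dataset('distance', data=distance)    # metres
        meta.create_dataset('psize', data=psize)          # detector pixel size
        chunks = f.create_group('chunks')
        c0 = chunks.create_group('0')
        c0.create_dataset('data', data=data)              # (M, N, N) frames
        c0.create_dataset('positions', data=positions)    # (M, 2) scan positions
        # all-ones weights: every pixel trusted
        c0.create_dataset('weights', data=np.ones_like(data, dtype=np.float64))
```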

Required Parameters:
        sequence_id           (integer) : unique id number for the scan
        scan_id               (integer) : user scan number for the data

Optional parameters:
        nimages               (integer) : number of frames to prepare
                                        : [default = all frames in directory]
        first_frame           (integer) : first frame in data series
                                        : [default = 1, for resampled data]
        asize                 (integer) : side-length of square ROI to prepare
                                        : [default = ROI size 128x128 pixels]
        cen_x, cen_y          (int,int) : (x,y) beam centre in detector frame
                                        : [default = centre will be estimated]
        base_dir               (string) : experiment directory
                                        : [default = current run/username_EPN]
                                        : [from: SR05ID01IOC51:saveData_subDir]
        analysis_dir           (string) : subdirectory to output HDF5 file
                                        : [default = 'analysisSXDM']
        xy_dir                 (string) : subdirectory to bluemner CSV file
                                        : [default = 'xy']
        group_name             (string) : sample group analysis subdirectory
                                        : [default = None]
        step_scan             (integer) : find positions, not bluemner/metadata
                                        : [default = 0 fly_scan, not step_scan]
        resample              (integer) : read data resampled onto square array
                                        : [default = 0, read pixirad hex data]
        force_analysis        (integer) : prepare the data even if file exists
                                        : [default = 0, skip existing data]

Additional optional parameters accepted as key=value (see PtyPy docs):
        auto_center, center, chunk_format, dfile, distance, energy,
        experimentID, exposure_time, input_suffix, label, lam, load_parallel,
        min_frames, misfit, num_frames, num_frames_actual, orientation, origin,
        output_suffix, pixels_x, pixels_y, positions_theory, precedence,
        propagation, psize, rebin, resolution, save, shape, version

Set data paths
Assemble other metadata required by reconstruction routines, if default values exist:
(
        auto_center, center, chunk_format, dfile, distance, energy,
        experimentID, exposure_time, input_suffix, label, lam, load_parallel,
        min_frames, misfit, num_frames, num_frames_actual, orientation, origin,
        output_suffix, pixels_x, pixels_y, positions_theory, precedence,
        propagation, psize, rebin, resolution, save, shape, version
) = set_defaults(pars=kwargs)
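The `set_defaults` helper itself is not shown in the snippet. One plausible shape for it (an assumption, with placeholder default values) merges user-supplied `key=value` pairs over a defaults dict rather than returning a long tuple:

```python
# Hypothetical sketch of a defaults helper; the names follow the parameter
# list above, but the default values here are placeholders.
DEFAULTS = {
    'auto_center': None, 'center': None, 'chunk_format': '.chunk%02d',
    'distance': None, 'energy': None, 'rebin': None, 'save': None,
    'shape': None, 'version': '0.1',
}

def set_defaults(pars=None):
    """Merge user-supplied key=value pairs over the defaults."""
    merged = dict(DEFAULTS)
    merged.update(pars or {})
    return merged
```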

# begin processing this scan data, checking first <N> files for format and
# centering approximation
# set the datafile naming convention

# check if the data directory exists
# set up the output file name
# check if the output filename exists (i.e., data was already prepared)

# find data/mask files, locate data, find beam centre if necessary, extract
# ROI arrays, return data and fmask

# look for pre-existing valid pixel mask, if not found, all pixels = True
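The mask lookup can be sketched as follows; `mask_file` and the `'mask'` dataset name are assumptions, not names from the original script:

```python
import h5py
import numpy as np

def load_valid_pixel_mask(mask_file, frame_shape):
    """Return a pre-existing valid-pixel mask, or all-True if none is found."""
    try:
        with h5py.File(mask_file, 'r') as mf:
            return mf['mask'][...].astype(bool)   # assumed dataset name
    except (OSError, KeyError):
        # no usable mask file: treat every pixel as valid
        return np.ones(frame_shape, dtype=bool)
```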

# READ IN DATA: separate the cases of resampled/binned images and raw data
# here the whole dataset is loaded into data and transposed, which
# assumes the resampled data and the valid_pixel_mask were reoriented
# for consistency with the raw data files, and that the frames up to
# first_frame have already been dropped.

# CHECK BEAM CENTRE:
# if necessary, estimate the beam centre using the first frame
skip_blank = 0
# while no beam is found in the current frame: skip_blank += 1 and move on
# to the next image
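One common way to implement that loop's per-frame check (an assumption; the original script's blank-frame criterion is not shown) is an intensity centre of mass that returns None for blank frames, letting the caller increment `skip_blank`:

```python
import numpy as np

def estimate_center(frame, blank_threshold=0.0):
    """Return the (row, col) intensity centroid, or None if the frame is blank."""
    total = frame.sum()
    if total <= blank_threshold:
        return None   # caller increments skip_blank and tries the next frame
    rows = np.arange(frame.shape[0])
    cols = np.arange(frame.shape[1])
    cy = (frame.sum(axis=1) * rows).sum() / total   # centroid along rows
    cx = (frame.sum(axis=0) * cols).sum() / total   # centroid along columns
    return cy, cx
```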

# EXTRACT ROI:
# region of interest index in each dimension
# cut data and fmask to ROI in place
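The cut can be sketched like this (a hypothetical helper; `ctr` is taken as (row, col) and `asize` is assumed even):

```python
def cut_roi(data, fmask, ctr, asize):
    """Cut a frame stack and its mask to a square ROI centred on ctr."""
    cy, cx = int(round(ctr[0])), int(round(ctr[1]))
    half = asize // 2
    rows = slice(cy - half, cy + half)
    cols = slice(cx - half, cx + half)
    # Ellipsis keeps this working for a 2D mask or a per-frame 3D mask
    return data[..., rows, cols], fmask[..., rows, cols]
```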

# LOAD POSITION DATA:

# WRITE PREPARED DATA FILE:
# create the HDF5 NeXus file, overwrite if exists
f = h5py.File(output_file, 'w')

# create ptypy-compliant output file (HDF5 with defined structure)
# minimally required data and scan_info

meta = f.create_group('meta')
meta.attrs['type'] = 'dict'
dset_out = meta.create_dataset('version', data='Created by '
                                   'prepare_data_2D_pixirad.py')
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('label', data=scan_label)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('experimentID', data=experimentID)
dset_out.attrs['type'] = 'None'
dset_out = meta.create_dataset('num_frames', data=nimages-skip_blank)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('date_processed', data=date_processed)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('shape', data=(asize, asize))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('psize', data=(psize, psize))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('center', data=(asize // 2 - 1, asize // 2 - 1))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('distance', data=distance)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('energy', data=energy)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('lam', data=lam)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('exposure_time', data=exposure_time)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('initial_ctr', data=initial_ctr)
dset_out.attrs['type'] = 'arraytuple'
dset_out = meta.create_dataset('data_filename', data=output_file)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('sequence_id', data=sequence_id)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('scan_number', data=scan_number)
dset_out.attrs['type'] = 'scalar'

info = f.create_group('info')
info.attrs['type'] = 'dict'
dset_out = info.create_dataset('auto_center', data=auto_center)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('center', data=center)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('chunk_format', data=chunk_format)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('dfile', data=dfile)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('distance', data=distance)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('energy', data=energy)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('experimentID', data=experimentID)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('label', data=label)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('lam', data=lam)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('load_parallel', data=load_parallel)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('min_frames', data=min_frames)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('misfit', data=misfit)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('num_frames', data=num_frames)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('num_frames_actual', data=nimages-(first_frame+skip_blank-1))
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('orientation', data=orientation)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('origin', data=origin)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('positions_scan', data=positions)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('positions_theory', data=positions_theory)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('precedence', data=precedence)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('propagation', data=propagation)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('psize', data=psize)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('rebin', data=rebin)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('resolution', data=resolution)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('save', data=save)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('shape', data=shape)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('version', data=version)
dset_out.attrs['type'] = 'string'

recipe = info.create_group('recipe')
recipe.attrs['type'] = 'dict'

chunks = f.create_group('chunks')
chunks.attrs['type'] = 'dict'

chunk00 = chunks.create_group('0')
chunk00.attrs['type'] = 'dict'
dset_out = chunk00.create_dataset('data', data=data.astype(np.int32))
dset_out.attrs['type'] = 'array'
dset_out = chunk00.create_dataset('indices', data=indices)
dset_out.attrs['type'] = 'arraylist'
dset_out = chunk00.create_dataset('positions', data=positions.astype(np.float64))
dset_out.attrs['type'] = 'array'
dset_out = chunk00.create_dataset('weights', data=fmask.astype(np.float64))
dset_out.attrs['type'] = 'array'

# close the output file
f.close()
print('{0}: wrote file: {1}'.format(program, output_file))
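After writing, it can be worth reopening the file to confirm the structure PtyPy expects is present; `verify_ptyd` below is a hypothetical helper, not part of the script above:

```python
import h5py

def verify_ptyd(path):
    """Reopen a .ptyd and check that the minimal structure is present."""
    with h5py.File(path, 'r') as f:
        assert 'meta' in f and 'chunks' in f
        c0 = f['chunks/0']
        n = c0['data'].shape[0]
        # positions must carry one (y, x) pair per frame
        assert c0['positions'].shape == (n, 2)
        return n
```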
bjoernenders commented 6 years ago

Thank you for sharing this, Cameron. You could also try subclassing PtyScan to process the raw data.