bjoernenders opened this issue 6 years ago
This is the approach I use, in abbreviated form:
# Description:
# Python script to prepare a 2D ptychography dataset from individual data files;
# each data file contains one image as a NumPy array in uncompressed HDF5
# format. The output is an uncompressed HDF5 .ptyd file containing N frames,
# each cut to a square ROI of side asize with the beam centred.
#
# Structure of a .ptyd file (PtyPy input format). [From PtyPy documentation 2017-02-06]
#
# *.ptyd/
#
# info/
#
# [general parameters; optional]
# auto_center : binary
# center : array (int, int)
# chunk_format : string
# dfile : string (filename)
# distance : scalar (float)
# energy : scalar (float, keV)
# experimentID : string (optional)
# label : string (optional)
# lam : scalar (float)
# load_parallel : string
# min_frames : scalar
# misfit : scalar
# num_frames : scalar (int)
# num_frames_actual : scalar (int)
# orientation : scalar (index)
# origin : array (int, int)
# positions_scan : array (int,2 array)
# positions_theory : array (int,2 array)
# precedence : string (optional)
# propagation : scalar (float, meters)
# psize : array (float, float)
# rebin : None (int, optional)
# recipe : string
# resolution : array (float, float)
# save : None
# shape : array (int, int)
# version : string
#
# meta/
#
# [general parameters; optional but very useful]
# version : str
# num_frames : int
# label : str
#
# [geometric parameters; all optional]
# shape : int or (int,int)
# energy : float, optional
# distance : float, optional
# center : (float,float) or None, optional
# psize : float or (float,float), optional
# propagation : "farfield" or "nearfield", optional
# ...
#
# chunks/
#
# 0/
# data : array(M,N,N) of float
# indices : array(M) of int, optional
# positions : array(M,2) of float
# weights : same shape as data or empty
# 1/
# ...
# 2/
# ...
# ...
Required Parameters:
sequence_id (integer) : unique id number for the scan
scan_id (integer) : user scan number for the data
Optional parameters:
nimages (integer) : number of frames to prepare
: [default = all frames in directory]
first_frame (integer) : first frame in data series
: [default = 1, for resampled data]
asize (integer) : side-length of square ROI to prepare
: [default = ROI size 128x128 pixels]
cen_x, cen_y (int,int) : (x,y) beam centre in detector frame
: [default = centre will be estimated]
base_dir (string) : experiment directory
: [default = current run/username_EPN]
: [from: SR05ID01IOC51:saveData_subDir]
analysis_dir (string) : subdirectory for the output HDF5 file
: [default = 'analysisSXDM']
xy_dir (string) : subdirectory containing the bluemner CSV file
: [default = 'xy']
group_name (string) : sample group analysis subdirectory
: [default = None]
step_scan (integer) : derive positions instead of reading bluemner/metadata
: [default = 0, i.e. fly_scan, not step_scan]
resample (integer) : read data resampled onto a square array
: [default = 0, read pixirad hex data]
force_analysis (integer) : prepare the data even if file exists
: [default = 0, skip existing data]
Additional optional parameters are accepted as key=value pairs (see the PtyPy docs; a parsing sketch follows this list):
auto_center, center, chunk_format, dfile, distance, energy,
experimentID, exposure_time, input_suffix, label, lam, load_parallel,
min_frames, misfit, num_frames, num_frames_actual, orientation, origin,
output_suffix, pixels_x, pixels_y, positions_theory, precedence,
propagation, psize, rebin, resolution, save, shape, version
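These could be collected from the command line with a small key=value parser, e.g. (a sketch; the script's actual argument handling is not shown in the post):

import sys

# gather any key=value arguments into a dict, passed on to set_defaults()
kwargs = dict(arg.split('=', 1) for arg in sys.argv[1:] if '=' in arg)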
Set data paths
Assemble the other metadata required by the reconstruction routines, falling back to default values where they exist:
(
auto_center, center, chunk_format, dfile, distance, energy,
experimentID, exposure_time, input_suffix, label, lam, load_parallel,
min_frames, misfit, num_frames, num_frames_actual, orientation, origin,
output_suffix, pixels_x, pixels_y, positions_theory, precedence,
propagation, psize, rebin, resolution, save, shape, version
) = set_defaults(pars=kwargs)
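The set_defaults() helper itself is not shown in the post; a minimal sketch consistent with the unpacking above (defaulting every value to None purely for illustration):

_KEYS = ('auto_center', 'center', 'chunk_format', 'dfile', 'distance',
         'energy', 'experimentID', 'exposure_time', 'input_suffix', 'label',
         'lam', 'load_parallel', 'min_frames', 'misfit', 'num_frames',
         'num_frames_actual', 'orientation', 'origin', 'output_suffix',
         'pixels_x', 'pixels_y', 'positions_theory', 'precedence',
         'propagation', 'psize', 'rebin', 'resolution', 'save', 'shape',
         'version')

def set_defaults(pars=None):
    """Merge user overrides into the defaults and return them in _KEYS order."""
    merged = dict.fromkeys(_KEYS)   # every value None unless overridden
    merged.update(pars or {})
    return tuple(merged[k] for k in _KEYS)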
# begin processing this scan's data, checking the first <N> files for the
# format and an initial centring estimate
# set the datafile naming convention
# check if the data directory exists
# set up the output file name
# check if the output filename exists (i.e., data was already prepared)
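# a sketch of these bookkeeping steps (the path layout is illustrative):
import os
import sys

if not os.path.isdir(data_dir):
    raise IOError('data directory not found: {0}'.format(data_dir))
output_file = os.path.join(base_dir, analysis_dir, '{0}.ptyd'.format(scan_label))
if os.path.exists(output_file) and not force_analysis:
    print('{0}: output exists, skipping (set force_analysis=1 to redo)'.format(program))
    sys.exit(0)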
# find data/mask files, locate data, find beam centre if necessary, extract
# ROI arrays, return data and fmask
# look for pre-existing valid pixel mask, if not found, all pixels = True
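# a sketch of the mask fallback (mask_file, the dataset name, and the raw
# frame shape are illustrative):
try:
    with h5py.File(mask_file, 'r') as mf:
        fmask = mf['mask'][...].astype(bool)
except (IOError, KeyError):
    fmask = np.ones(raw_shape, dtype=bool)   # no mask found: all pixels valid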
# READ IN DATA: separate the cases of resampled/binned images and raw data
# here the whole dataset is loaded into data and transposed, which
# assumes the resampled data and the valid_pixel_mask were reoriented
# for consistency with the raw data files, and that the frames up to
# first_frame have already been dropped.
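# a sketch of the resampled-data load (the internal dataset path and the
# transpose axes are illustrative):
with h5py.File(resampled_file, 'r') as rf:
    data = rf['data'][first_frame - 1:]    # drop frames before first_frame
data = data.transpose(0, 2, 1)             # reorient to match the raw files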
# CHECK BEAM CENTRE:
# if necessary, estimate the beam centre using the first frame
skip_blank = 0
# while no beam is found in the current frame, increment skip_blank and
# move on to the next image
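# a sketch using a centre-of-mass estimate; the actual detection criterion
# is not shown in the post (blank_threshold is illustrative):
import scipy.ndimage as ndi

while data[skip_blank].max() < blank_threshold:   # blank frame: no beam
    skip_blank += 1
cen_y, cen_x = ndi.center_of_mass(data[skip_blank] * fmask)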
# EXTRACT ROI:
# region of interest index in each dimension
# cut data and fmask to ROI in place
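# a sketch of the ROI cut around the rounded beam centre (this rebinds the
# arrays rather than modifying them strictly in place):
half = asize // 2
cy, cx = int(round(cen_y)), int(round(cen_x))
data = data[:, cy - half:cy + half, cx - half:cx + half]
fmask = fmask[cy - half:cy + half, cx - half:cx + half]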
# LOAD POSITION DATA:
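# a sketch of the fly-scan position read (the CSV path and column layout
# are illustrative):
positions = np.loadtxt(xy_file, delimiter=',', usecols=(0, 1))
positions = positions[first_frame - 1 + skip_blank:]   # align with kept frames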
# WRITE PREPARED DATA FILE:
# create the HDF5 output file, overwriting it if it exists
# (assumes `import h5py` and `import numpy as np` at the top of the script)
f = h5py.File(output_file, 'w')
# create ptypy-compliant output file (HDF5 with defined structure)
# minimally required data and scan_info
meta = f.create_group('meta')
meta.attrs['type'] = 'dict'
dset_out = meta.create_dataset('version', data='Created by '
'prepare_data_2D_pixirad.py')
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('label', data=scan_label)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('experimentID', data=experimentID)
dset_out.attrs['type'] = 'None'
dset_out = meta.create_dataset('num_frames', data=nimages-skip_blank)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('date_processed', data=date_processed)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('shape', data=(asize, asize))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('psize', data=(psize, psize))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('center', data=(asize/2-1, asize/2-1))
dset_out.attrs['type'] = 'array'
dset_out = meta.create_dataset('distance', data=distance)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('energy', data=energy)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('lam', data=lam)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('exposure_time', data=exposure_time)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('initial_ctr', data=initial_ctr)
dset_out.attrs['type'] = 'arraytuple'
dset_out = meta.create_dataset('data_filename', data=output_file)
dset_out.attrs['type'] = 'string'
dset_out = meta.create_dataset('sequence_id', data=sequence_id)
dset_out.attrs['type'] = 'scalar'
dset_out = meta.create_dataset('scan_number', data=scan_number)
dset_out.attrs['type'] = 'scalar'
info = f.create_group('info')
info.attrs['type'] = 'dict'
dset_out = info.create_dataset('auto_center', data=auto_center)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('center', data=center)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('chunk_format', data=chunk_format)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('dfile', data=dfile)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('distance', data=distance)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('energy', data=energy)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('experimentID', data=experimentID)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('label', data=label)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('lam', data=lam)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('load_parallel', data=load_parallel)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('min_frames', data=min_frames)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('misfit', data=misfit)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('num_frames', data=num_frames)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('num_frames_actual', data=nimages-(first_frame+skip_blank-1))
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('orientation', data=orientation)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('origin', data=origin)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('positions_scan', data=positions)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('positions_theory', data=positions_theory)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('precedence', data=precedence)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('propagation', data=propagation)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('psize', data=psize)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('rebin', data=rebin)
dset_out.attrs['type'] = 'scalar'
dset_out = info.create_dataset('resolution', data=resolution)
dset_out.attrs['type'] = 'None'
dset_out = info.create_dataset('save', data=save)
dset_out.attrs['type'] = 'string'
dset_out = info.create_dataset('shape', data=shape)
dset_out.attrs['type'] = 'array'
dset_out = info.create_dataset('version', data=version)
dset_out.attrs['type'] = 'string'
recipe = info.create_group('recipe')
recipe.attrs['type'] = 'dict'
chunks = f.create_group('chunks')
chunks.attrs['type'] = 'dict'
chunk00 = chunks.create_group('0')
chunk00.attrs['type'] = 'dict'
dset_out = chunk00.create_dataset('data', data=data.astype(np.int32))
dset_out.attrs['type'] = 'array'
dset_out = chunk00.create_dataset('indices', data=indices)
dset_out.attrs['type'] = 'arraylist'
dset_out = chunk00.create_dataset('positions', data=positions.astype(np.float64))
dset_out.attrs['type'] = 'array'
dset_out = chunk00.create_dataset('weights', data=fmask.astype(np.float64))
dset_out.attrs['type'] = 'array'
# close the output file
f.close()
print('{0}: wrote file: {1}'.format(program, output_file))
Thank you for sharing this, Cameron. You could also try to subclass PtyScan to process the raw data.
We need to add a step-by-step guide on how a .ptyd file can be created using h5py.File or h5write. This should extend the tutorial on subclassing PtyScan.
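As a starting point, a minimal .ptyd with the layout documented above can be written with h5py alone. A sketch, with placeholder arrays (the full set of meta/info entries PtyPy expects should be taken from the structure listing at the top of this issue):

import h5py
import numpy as np

M, N = 100, 128                                  # placeholder frame count / size
data = np.zeros((M, N, N), dtype=np.int32)       # diffraction frames
positions = np.zeros((M, 2), dtype=np.float64)   # scan positions
weights = np.ones((N, N), dtype=np.float64)      # valid-pixel weights

with h5py.File('minimal.ptyd', 'w') as f:
    meta = f.create_group('meta')                # general parameters
    meta.create_dataset('num_frames', data=M)
    meta.create_dataset('shape', data=(N, N))
    f.create_group('info')
    chunk = f.create_group('chunks/0')           # first (and only) data chunk
    chunk.create_dataset('data', data=data)
    chunk.create_dataset('positions', data=positions)
    chunk.create_dataset('weights', data=weights)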