
Python package for analyzing gas-phase ultrafast electron diffraction collected at SLAC MeV-UED Facility
MIT License

gued package

Written by Lauren F. Heald (lheald2@unl.edu)

About:

This package was created for data processing and analysis of experiments conducted at the MeV-UED facility at the Linac Coherent Light Source at the Stanford Linear Accelerator. The main modules, gued.py and gued_theory.py, contain all the relevant functions for data processing, and each Jupyter notebook serves a specific function for cleaning, processing, and saving image-based scattering data. This package is developed and maintained by Lauren F. Heald, PhD.

Current Status:

The package has been rigorously tested with multiple data sets collected at the MeV-UED facility. However, updates are posted often as lessons are learned and new tricks are implemented. Please note, this package is meant to serve as a backbone for data processing, but further noise reduction and analysis are encouraged. If you have questions or concerns, email Lauren Heald at lheald2@unl.edu with subject line "GUED Help".

Current Functionality

Different notebooks within the repository serve different purposes, but the general data processing pipeline is outlined below (as found in Fast_Analysis.ipynb).

  1. Import all images

    • The function gued.get_image_details loads all .tif files in the specified folder of interest and returns the images as a 3D data array, along with arrays of stage positions, file order, and total counts per image.
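Part of this step is recovering the stage position from each filename. A minimal sketch of the idea using the standard library (the filename pattern below is hypothetical; the real MeV-UED convention, and gued's `SEPARATORS`-based parsing, may differ):

```python
import re

# Hypothetical filenames; the actual facility naming convention may differ.
filenames = [
    "scan_0001_stage_41.250.tif",
    "scan_0002_stage_41.275.tif",
    "scan_0003_stage_41.250.tif",
]

def parse_stage_position(name):
    """Pull the last floating-point number out of a filename."""
    numbers = re.findall(r"\d+\.\d+", name)
    return float(numbers[-1])

stage_positions = [parse_stage_position(f) for f in filenames]
print(stage_positions)  # [41.25, 41.275, 41.25]
```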
  2. Average Based on Stage Positions

    • Group all images by stage position in order to speed up the data processing steps for large data sets.
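The grouping itself amounts to averaging all frames that share a stage position. A minimal numpy sketch of that idea (not the gued implementation):

```python
import numpy as np

# images: (n_images, ny, nx); stage_positions: one value per image.
rng = np.random.default_rng(0)
images = rng.random((6, 8, 8))
stage_positions = np.array([1.0, 2.0, 1.0, 2.0, 3.0, 3.0])

# Average every frame taken at the same stage position.
unique_stages = np.unique(stage_positions)
averaged = np.array(
    [images[stage_positions == s].mean(axis=0) for s in unique_stages]
)
print(averaged.shape)  # (3, 8, 8)
```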
  3. Find Centers

    • Find the diffraction center of all images using gued.find_centers_pool, which runs the find_center_alg function in parallel.
  4. Reject images with bad total counts

    • The function gued.remove_counts takes the outputs of gued.get_image_details, removes any images flagged by their total counts, and returns the inputs with the bad images removed.
  5. Subtract background

    • The function gued.remove_background_pool takes in a 3D array containing all image files and runs the hidden function _remove_background, which creates a background image based on the corners of the original image. It can return either the interpolated backgrounds or the cleaned data.
    • In cases where background images are taken as part of the experiment, use the subtract_background function with the data array and an average background image.
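The corner-based idea can be sketched in a few lines of numpy. Note the real gued routine interpolates a full 2D background from the corner regions; this simplified version just subtracts the mean of four corner patches:

```python
import numpy as np

rng = np.random.default_rng(2)
image = rng.random((100, 100)) + 5.0  # flat offset standing in for dark counts
r = 10  # corner patch size, playing the role of CORNER_RADIUS

# Collect the four corner patches, assumed to contain no diffraction signal.
corners = np.concatenate([
    image[:r, :r].ravel(), image[:r, -r:].ravel(),
    image[-r:, :r].ravel(), image[-r:, -r:].ravel(),
])
cleaned = image - corners.mean()
print(abs(cleaned.mean()) < 0.1)  # True
```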
  6. Remove outlier instances of identical pixels

    • This is generally referred to as removing x-ray hits or hot pixels. When working with large data sets, use the gued.remove_xrays_pool function. This function takes the 3D data array and runs the hidden function _remove_xrays in parallel, which looks for instances of outlier pixels with respect to the average pixel value across all data. It returns the original data array with hot-pixel values replaced with np.nan.
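A compact numpy sketch of the underlying technique, comparing each pixel across the image stack to its per-pixel mean and NaN-ing outliers (this mirrors the idea of gued.remove_xrays_pool, not its exact implementation):

```python
import numpy as np

STD_FACTOR = 3
rng = np.random.default_rng(3)
stack = rng.normal(100.0, 1.0, size=(20, 32, 32))
stack[7, 10, 10] = 10_000.0  # one "x-ray hit" in one frame

# Per-pixel statistics across the stack; flag pixels far from their mean.
mean = stack.mean(axis=0)
std = stack.std(axis=0)
outliers = np.abs(stack - mean) > STD_FACTOR * std
stack[outliers] = np.nan
print(np.isnan(stack[7, 10, 10]))  # True
```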
  7. Mask detector hole

    • The function gued.apply_mask uses gued.mask_generator_alg to create a mask of np.nan values based on a center and set radius, and returns the masked data. It can apply multiple masks.
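The mask itself is just a circular region of np.nan around the detector hole. A minimal sketch of what a MASK_CENTER / MASK_RADIUS pair produces:

```python
import numpy as np

image = np.ones((100, 100))
center, radius = (50, 50), 10  # stand-ins for MASK_CENTER, MASK_RADIUS

# NaN out every pixel inside the circle.
yy, xx = np.indices(image.shape)
hole = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 < radius ** 2
masked = image.copy()
masked[hole] = np.nan
print(np.isnan(masked[50, 50]), np.isnan(masked[0, 0]))  # True False
```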
  8. Calculate diffraction center

    • The function gued.find_center_pool runs gued.find_center_alg in parallel to find the center of all images. The pool function speeds up the process significantly, but for small data sets gued.find_center_alg can be run directly.
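As a toy illustration of center finding (the gued algorithm, seeded by CENTER_GUESS and RADIUS_GUESS, is more sophisticated): threshold the bright diffraction ring and take its center of mass.

```python
import numpy as np

# Synthetic diffraction ring of radius 30 centered at (48, 53).
yy, xx = np.indices((101, 101))
true_center = (48.0, 53.0)
r = np.hypot(yy - true_center[0], xx - true_center[1])
image = np.exp(-((r - 30) ** 2) / 8.0)

# Center of mass of the pixels on the bright ring recovers the center.
bright = image > 0.5 * image.max()
cy, cx = yy[bright].mean(), xx[bright].mean()
print(round(cy), round(cx))  # 48 53
```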
  9. Remove radial outliers

    • The function gued.remove_radial_outliers_pool uses the hidden function gued._preprocess_radial_data which converts the data to polar coordinates, creates an interpolated average image from radial averages, then looks for instances of radial outliers and replaces them with np.nan.
    • This is by far the most time-consuming part of data processing. Only do this with small data sets (i.e., after stage averaging) unless you're willing to spend a long time processing data; it takes roughly 10 minutes per 100 images running in parallel.
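A simplified sketch of the technique: compare each pixel to the statistics of its radial bin and NaN anything beyond STD_FACTOR sigma. (gued works in interpolated polar coordinates; this integer-radius binned variant is only illustrative.)

```python
import numpy as np

STD_FACTOR = 3
rng = np.random.default_rng(4)
image = rng.normal(100.0, 1.0, size=(101, 101))
image[60, 60] = 10_000.0  # a radial outlier

# Integer radius of each pixel from the (known) center.
yy, xx = np.indices(image.shape)
radii = np.hypot(yy - 50, xx - 50).astype(int)

cleaned = image.copy()
for rad in np.unique(radii):
    ring = radii == rad
    vals = image[ring]
    bad = np.abs(vals - vals.mean()) > STD_FACTOR * vals.std()
    ys, xs = np.where(ring)
    cleaned[ys[bad], xs[bad]] = np.nan
print(np.isnan(cleaned[60, 60]))  # True
```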
  10. Fill Missing Values (built into the median filter)

    • gued.fill_missing can be used to replace NaN values with the radial average for that detector position. This helps remove artifacts that could be caused by median filtering with NaN values present. This functionality is still being tested.
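The idea behind the fill can be sketched directly: replace each NaN pixel with the average of the non-NaN pixels at the same integer detector radius (a simplified stand-in for gued.fill_missing):

```python
import numpy as np

image = np.ones((51, 51)) * 7.0
image[25, 30] = np.nan  # a masked/rejected pixel

yy, xx = np.indices(image.shape)
radii = np.hypot(yy - 25, xx - 25).astype(int)

# Replace NaNs with the radial average at their radius.
filled = image.copy()
for rad in np.unique(radii):
    ring = radii == rad
    filled[ring & np.isnan(image)] = np.nanmean(image[ring])
print(filled[25, 30])  # 7.0
```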
  11. Apply median filter

    • The function gued.median_filter applies a median filter to the data. np.nan values must first be replaced with the radial average, so this function is used in concert with the radial outlier removal (it is often not necessary and occasionally buggy; still a work in progress).
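For illustration, a plain scipy median filter as a stand-in for gued.median_filter (NaNs must be filled first, e.g. with the radial average, or they propagate through the filter window):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(5)
image = rng.normal(100.0, 1.0, size=(64, 64))
image[30, 30] = 10_000.0  # isolated salt noise

# A 3x3 median filter suppresses single-pixel spikes.
filtered = ndimage.median_filter(image, size=3)
print(filtered[30, 30] < 200)  # True
```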
  12. Retrieve Azimuthal Average

    • The function gued.azimuthal_average takes the 3D data array and returns the azimuthal average for each data set.
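An azimuthal average by integer-radius binning, as a compact sketch of what gued.azimuthal_average produces for each image (gued's binning details may differ):

```python
import numpy as np

# Synthetic image whose intensity is exactly twice the integer radius.
yy, xx = np.indices((101, 101))
radii = np.hypot(yy - 50, xx - 50).astype(int)
image = radii.astype(float) * 2.0

# Mean intensity in each radial bin via bincount.
n_bins = radii.max() + 1
sums = np.bincount(radii.ravel(), weights=image.ravel(), minlength=n_bins)
counts = np.bincount(radii.ravel(), minlength=n_bins)
azimuthal = sums / np.maximum(counts, 1)  # guard against empty bins
print(azimuthal[10])  # 20.0
```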
  13. Plot Pump/Probe Results

    • Following the azimuthal average calculations, generate a plot of the time resolved data for visualization.
  14. Apply Polynomial Fit

    • The function gued.poly_fit is used to apply a polynomial fit (with adjustable order) to correct any baseline offsets.
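A sketch of the baseline-correction idea using numpy's polynomial fit (gued.poly_fit's interface is not reproduced here, only the technique): fit a low-order polynomial to a 1D curve and subtract it.

```python
import numpy as np

s = np.linspace(1, 10, 200)            # scattering coordinate
signal = np.sin(3 * s)                 # oscillatory "molecular" signal
baseline = 0.5 + 0.1 * s - 0.01 * s**2 # slow baseline drift
curve = signal + baseline

# Fit and subtract a quadratic baseline (order is adjustable).
order = 2
coeffs = np.polyfit(s, curve, order)
corrected = curve - np.polyval(coeffs, s)
print(abs(corrected.mean()) < 1e-8)  # True
```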
  15. Save Data

    • The gued.save_data function can be used to save a dictionary of important results to a .h5 for further processing.
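The saving step can be sketched with h5py, writing one dataset per dictionary key (the keys below are hypothetical and not necessarily what gued.save_data writes):

```python
import os
import tempfile
import numpy as np
import h5py

# Hypothetical results dictionary; real key names come from the analysis.
results = {
    "stage_positions": np.array([41.25, 41.275]),
    "azimuthal_avg": np.random.default_rng(6).random((2, 70)),
}

# Write one HDF5 dataset per key, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "results.h5")
with h5py.File(path, "w") as f:
    for key, value in results.items():
        f.create_dataset(key, data=value)

with h5py.File(path, "r") as f:
    loaded = {k: f[k][()] for k in f}
print(sorted(loaded))  # ['azimuthal_avg', 'stage_positions']
```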

Additional notebooks are included for other key purposes and are discussed below. Additionally, some functions in gued.py and gued_theory.py are not currently used in any notebook.

Usage:

The first step when using this package at a MeV-UED experiment is to use the Set_up_Globals.ipynb notebook to test and optimize the global variables stored in gued_globals.py. These variables need to be adjusted for each experiment. The notebook averages all of the data and plots examples so you can check that the variables are set properly. Once the global variables are set, move on to the Fast_Analysis.ipynb notebook for processing pump/probe data.

Example of a `gued_globals.py` file:

# Variable for reading files
SEPARATORS = ['-', '_']

# Variables for Center Finding Algorithm
CENTER_GUESS = (460, 460)
RADIUS_GUESS = 35
DISK_RADIUS = 3
THRESHOLD = 150 # When averaging data, set to 0

# Variable for Generating Background
CORNER_RADIUS = 20
CHECK_NUMBER = 50

# Variables for Masking
MASK_CENTER = [475, 475]
MASK_RADIUS = 40
ADDED_MASK = [[440, 435, 30], [460, 450, 30]]

# Used throughout the code as the threshold for cutting out data. This is the
# default value, but other values can be passed to individual functions via
# e.g. std_factor = 4
STD_FACTOR = 3

# Specifies the maximum number of workers to be used when running concurrent.futures
MAX_PROCESSORS = 6

# Adjust figure size 
FIGSIZE = (12,4)

# Path for Theory Package
PATH_DCS = 'gued_package\\GUED_Analysis\\packages\\dcs_repositiory\\3.7MeV\\'

An example notebook named Fast_Analysis.ipynb should be run as the second step in data processing. It applies and plots all of the above steps after averaging based on the stage position associated with the data, and takes you to the ΔI/I.

Another useful notebook is the Tracking_LabTime.ipynb notebook which allows for visualization of experimental drifts (i.e., center drifts) by grouping images based on acquisition time.

Once the global variables are set, it is possible to run all of the functions above on a large data set using the process_all.py file. This file iteratively processes images following the above steps (without averaging based on stage position) and saves the total scattering and stage positions to an .h5 file. Running 2000 images takes ~25 minutes on a personal laptop.

After processing all of the images and saving to an .h5 file, it can be useful to check drifts with respect to lab time. An example of tracking drifts in t0 with respect to lab time is given in the T0_Analysis.ipynb notebook. The data is broken into groups, and the rise time is fit to each subset to look for changes caused by drifts during data collection.

Finally, after data has been thoroughly cleaned and processed, the .h5 file can be read into the PDF_Generation.ipynb notebook to convert the ΔI/I to the pair distribution function (PDF).

Another notebook that will likely be helpful is GUED_Simulations.ipynb, which can be used to simulate scattering data from input structure files such as .xyz and .csv files. It can also simulate time-resolved diffraction patterns from trajectory files and vibrational .hess files generated with ORCA.

Citation

If you rely heavily on this package, please consider citing it following the citation style for open-source packages, as in the example below:
Heald, L. F. (2024). GUED (Version 1.0.0) [Computer software]. GitHub repository. https://github.com/lheald2/gued. See LICENSE.md for more information.

Acknowledgements:

Code was written and adapted by Lauren F. Heald with assistance from multiple sources including:
Caidan Moore (Case Western University)
Cuong Le (University of Nebraska - Lincoln)
Yusong Liu (Stanford Linear Accelerator)
Keke Chen (Tsinghua University)

Additionally, the entire Centurion group at the University of Nebraska - Lincoln and the Stanford National Accelerator Laboratory - MeV-UED Facility Staff offered advice and guidance throughout the development.

Relevant Literature

If you're interested in learning more about gas-phase ultrafast electron diffraction, consider reading the following

Dependencies:

numpy, scipy, matplotlib, pandas, tifffile, scikit-image (skimage), h5py, plus the standard-library modules concurrent.futures, glob, and functools