lcls-users / btx

BeamTime with X-rays - miscellaneous functions for aiding analysis during LCLS experiments.
https://lcls-users.github.io/btx/

Added V-tracking and visualization methods for PiPCA #334

Closed: russell-marasigan closed this 11 months ago

russell-marasigan commented 1 year ago

Changes

V-tracking in PiPCA

Algorithm used to update V

Let $U_i$, $S_i$, and $V_i^i$ be the matrices obtained from iPCA to model batch $X_i$. Also, let $U_{i+1}$ and $S_{i+1}$ be the matrices obtained from the parallel QR algorithm for the next batch $X_{i+1}$. We can find the next $V_{i+1}^{i+1}$ with the standard SVD assumption that $X_{i+1} = U_{i+1} S_{i+1} V_{i+1}^{i+1}$. And $\forall j \in [0, i]$ we can update the previous batch's $V_j^{i+1}$ by defining it to be the matrix that satisfies $X_j = U_{i+1} S_{i+1} V_j^{i+1}$. Once we have all $V_j^{i+1}$ and $V_{i+1}^{i+1}$, we can obtain the overall $V^{i+1}$ by simply concatenating the $V_j^{i+1}$ and $V_{i+1}^{i+1}$. So, $V^{i+1} = [V_0^{i+1} \; V_1^{i+1} \; \ldots \; V_i^{i+1} \; V_{i+1}^{i+1}]$.
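As a rough illustration (not necessarily how pipca.py implements it), the update amounts to re-expressing every batch in the new basis and concatenating the results. The sketch below assumes each $X_j$ is stored as a pixels-by-images array and that $U_{i+1}$ has orthonormal columns:

import numpy as np

def update_v_blocks(U_new, S_new, batches):
    # Hypothetical sketch of the V-tracking update described above.
    # Assumes X_j ~= U_new @ diag(S_new) @ V_j with U_new orthonormal,
    # so each block can be recovered as V_j = diag(1/S_new) @ U_new.T @ X_j.
    S_inv = np.diag(1.0 / S_new)
    V_blocks = [S_inv @ U_new.T @ X_j for X_j in batches]
    # The overall V^{i+1} is the horizontal concatenation of the per-batch blocks
    return np.concatenate(V_blocks, axis=1)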

Recording loadings

Added pipca_visuals.py script to btx/misc

Note: pipca.py now saves the final model to an HDF5 file, which pipca_visuals.py reads to display the dashboard and eigenimages
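For a quick look at what the saved model file contains (a minimal sketch using h5py on the pipca_model.h5 file produced in the tutorials below; the dataset layout is defined by pipca.py and not reproduced here):

import h5py

# List every group/dataset stored in the saved PiPCA model file
with h5py.File('pipca_model.h5', 'r') as f:
    f.visit(print)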

Dashboard changes

Here is the output with PC20 selected:

[screenshot: dashboard output with PC20 selected]

Here are also the updated scree plot and the reconstructed PiPCA image with PC3 selected:

[screenshot: updated scree plot and reconstructed PiPCA image with PC3 selected]

Display eigenimages

[screenshot: eigenimages display]

How to use these visualization functions

These tutorials assume access to SLAC unix servers through the pslogin portal.

Running PiPCA in a Jupyter notebook

Given a small enough number of images, we can easily run PiPCA in a Jupyter Notebook. First, we import the functions from pipca.py and pipca_visuals.py in the btx repository. Be sure to append your system's correct path to btx, instead of the one shown below.

import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *
from btx.misc.pipca_visuals import *

Next, we create our relevant parameter variables that will be used to initialize our pipca object.

exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M

q = 10 # number of principal components to compute and maintain
n = 200 # total number of images to be incorporated into model
m = 50 # size of image batch incorporated in each model update
start_offset = 24 # run index of first image to be incorporated into iPCA model

Now, we can initialize our pipca object and run PiPCA. This may take a few minutes since Jupyter will only be running on one rank with the MPI Communicator.

pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)

pipca.run()

While running, the following should be printed.

Factoring 50 samples into 0 sample, 10 component model...
Factoring 50 samples into 50 sample, 10 component model...
Factoring 50 samples into 100 sample, 10 component model...
Factoring 50 samples into 150 sample, 10 component model...
Model complete
Model saved to pipca_model.h5

Once the model is complete and saved to pipca_model.h5, we can display the dashboard. Note: You can also display the eigenimages with display_eigenimages('pipca_model.h5').

display_dashboard('pipca_model.h5')

[screenshot: dashboard displayed in the notebook]

Running PiPCA with a SLURM job

To take advantage of the parallelization of PiPCA, using the MPI Communicator to run it on multiple CPUs, we need to submit it as a SLURM job. We will need three separate scripts to run PiPCA and display the dashboard.

The first is a bash script that will allocate the nodes, tasks, and CPUs for the job. For this tutorial, my file name will be testing_pipca.sh:

#!/bin/bash
#SBATCH --account=lcls
#SBATCH --partition=milano
#SBATCH --job-name=testing-pipca-dashboard
#SBATCH --output=output-%j.txt
#SBATCH --error=output-%j.txt
#SBATCH --nodes=2
#SBATCH --ntasks=120
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4G

source /sdf/group/lcls/ds/ana/sw/conda1/manage/bin/psconda.sh
mpirun python run_pipca.py

The second is the Python script that the previous bash script will run on multiple ranks in parallel, eventually saving the final model to pipca_model.h5. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be run_pipca.py:

import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *

exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M

q = 20 # number of principal components to compute and maintain
n = 2000 # total number of images to be incorporated into model
m = 125 # size of image batch incorporated in each model update
start_offset = 0 # run index of first image to be incorporated into iPCA model

pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)

pipca.run()

The third is another Python script that will be run continuously in order to take advantage of the interactive features of the dashboard. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be serve_pipca_dashboard.py:

import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.misc.pipca_visuals import *

display_dashboard('pipca_model.h5')

Once we've created these three scripts, we can use the command line to run PiPCA. We start by ssh-ing to the psana cluster:

ssh psana

Next, we source the psana conda environment:

source /sdf/group/lcls/ds/ana/sw/conda1/manage/bin/psconda.sh

Then, we can run our bash script with sbatch:

sbatch testing_pipca.sh

Once the model is complete and saved to pipca_model.h5, we can serve the dashboard panel application. Note: You can also display the eigenimages by serving a Python script that calls display_eigenimages('pipca_model.h5') instead (see the sketch below).

panel serve serve_pipca_dashboard.py --autoreload --show

[screenshot: dashboard served with panel]
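For reference, a serving script for the eigenimages (the file name serve_pipca_eigenimages.py is just an illustrative choice) would mirror serve_pipca_dashboard.py:

import sys
sys.path.append("/sdf/home/m/marasign/btx")  # replace with your path to btx
from btx.misc.pipca_visuals import *

# Serve the eigenimage view instead of the dashboard
display_eigenimages('pipca_model.h5')

It would then be served the same way: panel serve serve_pipca_eigenimages.py --autoreload --show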

Future Actions