Changes

V-tracking in PiPCA

[ ] Initialized new pipca object attributes batch_indices, batch_number, V, U_prev, and S_prev in order to facilitate the updating algorithm for V
[ ] Added the distribute_images_over_batches function to properly format the batch indices based on the given batch sizes of a run
[ ] Added the update_V function to properly update V based on the previous model and the updated U and S
[ ] Added V to the return list of the get_model() function
[ ] Incorporated V into the prime_model() function
Algorithm used to update V
Let $U_i$, $S_i$, and $V_i^i$ be the matrices obtained from iPCA to model batch $X_i$.
Also, let $U_{i+1}$ and $S_{i+1}$ be the matrices obtained from the parallel QR algorithm for the next batch $X_{i+1}$.
We can find the next $V_{i+1}^{i+1}$ with the standard SVD assumption that $X_{i+1} = U_{i+1} S_{i+1} V_{i+1}^{i+1}$.
And $\forall j \in [0, i]$ we can update each previous batch's $V_j^{i+1}$ by defining it to be the matrix that satisfies $X_j = U_{i+1} S_{i+1} V_j^{i+1}$.
Once we have all $V_j^{i+1}$ and $V_{i+1}^{i+1}$, we can obtain the overall $V_{i+1}$ by simply concatenating them:
$V_{i+1} = [V_0^{i+1} \; V_1^{i+1} \; \cdots \; V_i^{i+1} \; V_{i+1}^{i+1}]$
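As a concrete illustration, updating each batch's V block reduces to a projection: since $U_{i+1}$ has orthonormal columns, $X_j = U_{i+1} S_{i+1} V_j^{i+1}$ gives $V_j^{i+1} = S_{i+1}^{-1} U_{i+1}^T X_j$. A minimal NumPy sketch (the function name and the features-by-samples array layout are assumptions, not btx's actual implementation):

```python
import numpy as np

def update_V_blocks(X_batches, U_new, S_new):
    """Recompute every batch's V block under the updated basis.

    Solves X_j = U_new @ diag(S_new) @ V_j for V_j via
    V_j = diag(1/S_new) @ U_new.T @ X_j, assuming U_new has
    orthonormal columns. Arrays are features x samples.
    """
    S_inv = np.diag(1.0 / S_new)
    V_blocks = [S_inv @ U_new.T @ X_j for X_j in X_batches]
    # V_{i+1} = [V_0^{i+1} V_1^{i+1} ... V_{i+1}^{i+1}]
    return np.concatenate(V_blocks, axis=1)
```

Concatenating along the sample axis yields the overall $V_{i+1}$ described above.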
Recording loadings
[ ] Implemented a new method of recording PC loadings in record_loadings which now runs after the model is updated at each image batch, rather than before the model is updated
[ ] Uses full Sj and Vj matrices from model (where j is the current batch number) to compute PC loadings, instead of Xj and Uj
[ ] Allows previously recorded loadings to be updated based on latest available model
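Under the $X = U S V$ convention above, the loadings follow directly from the model's $S$ and $V$: projecting the data onto the basis gives $U^T X = S V$, so no raw images $X_j$ or $U_j$ are needed. A hedged sketch (the helper name is hypothetical):

```python
import numpy as np

def compute_loadings(S, V):
    """PC loadings for all images seen so far.

    Row p, column k is the weight of component p in image k:
    U.T @ X = diag(S) @ V under the X = U S V convention.
    """
    return np.diag(S) @ V
```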
Added pipca_visuals.py script to btx/misc
Note: pipca.py now saves the final model to an h5 file that pipca_visuals.py pulls from to display the dashboard and eigenimages
Dashboard changes
[ ] Added a second row to the dashboard that includes a selector widget, scree plot, and the reconstructed heatmap image
[ ] The selector widget allows the user to choose a principal component cut-off which will update both the scree plot and the reconstructed PiPCA image
[ ] The scree plot will update to display only the singular values up to the component cut-off
[ ] The heatmap will recalculate and update the reconstructed PiPCA image in order to see how much variation is lost in a specific truncation
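The recalculation behind the heatmap is a rank-$r$ truncation of the model: keeping only the first $r$ components of $X = U S V$. A minimal sketch (function name and array layout are assumptions):

```python
import numpy as np

def reconstruct_image(U, S, V, img_idx, r):
    """Rank-r reconstruction of image img_idx: drop all but the
    first r terms of X = U S V, showing what a PC cut-off loses."""
    return U[:, :r] @ np.diag(S[:r]) @ V[:r, img_idx]
```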
Here is the output with PC20 selected:
As well as the updated scree plot and reconstructed PiPCA image with PC3 selected:
Display eigenimages
[ ] The display_eigenimages function displays a principal component selector widget and a heatmap displaying the eigenimage corresponding to the selected component
[ ] The color scale is symmetric centered at 0 to always show the positive and negative regions of the eigenimage
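The symmetric scale can be pinned to the eigenimage's largest magnitude; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def symmetric_color_limits(eigenimage):
    """Color limits centered at 0, so positive and negative regions
    of the eigenimage render on the same scale."""
    m = float(np.abs(eigenimage).max())
    return -m, m
```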
How to use these visualization functions
These tutorials assume access to SLAC unix servers through the pslogin portal.
Running PiPCA in Jupyter Notebook
Given a small enough number of images, we can easily run PiPCA in a Jupyter Notebook. First, we import the functions from pipca.py and pipca_visuals.py in the btx repository. Be sure to append your system's correct path to btx, instead of the one shown below.
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *
from btx.misc.pipca_visuals import *
Next, we create our relevant parameter variables that will be used to initialize our pipca object.
exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M
q = 10 # number of principal components to compute and maintain
n = 200 # total number of images to be incorporated into model
m = 50 # size of image batch incorporated in each model update
start_offset = 24 # run index of first image to be incorporated into iPCA model
Now, we can initialize our pipca object and run PiPCA. This may take a few minutes since Jupyter will only be running on one rank with the MPI Communicator.
pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)
pipca.run()
While running, the following should be printed:

Factoring 50 samples into 0 sample, 10 component model...
Factoring 50 samples into 50 sample, 10 component model...
Factoring 50 samples into 100 sample, 10 component model...
Factoring 50 samples into 150 sample, 10 component model...
Model complete
Model saved to pipca_model.h5
Once the model is complete and saved to pipca_model.h5, we can display the dashboard.
Note: You can also display the eigenimages with display_eigenimages('pipca_model.h5').
display_dashboard('pipca_model.h5')
Running PiPCA with SLURM job
To take advantage of the parallelization of PiPCA, using the MPI Communicator to run on multiple CPUs, we need to run it as a SLURM job. We will need three separate scripts to run PiPCA and display the dashboard.
The first is a bash script that will allocate the nodes, tasks, and CPUs for the job. For this tutorial, my file name will be testing_pipca.sh:
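The original script is not reproduced here; a minimal sketch of what such a submission script might look like (job name aside, the node, task, and CPU counts below are assumptions to adapt for your site):

```shell
#!/bin/bash
#SBATCH --job-name=testing_pipca
#SBATCH --nodes=1            # number of nodes to allocate
#SBATCH --ntasks=8           # number of MPI ranks
#SBATCH --cpus-per-task=1    # CPUs per rank
#SBATCH --output=pipca_%j.out

# Run the Python driver on all allocated ranks
mpirun python run_pipca.py
```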
The second is the Python script that the previous bash script will run on multiple ranks in parallel, eventually saving the final model to pipca_model.h5. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be run_pipca.py:
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *
exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M
q = 20 # number of principal components to compute and maintain
n = 2000 # total number of images to be incorporated into model
m = 125 # size of image batch incorporated in each model update
start_offset = 0 # run index of first image to be incorporated into iPCA model
pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)
pipca.run()
And the third script is another Python script that will be run continuously in order to take advantage of the interactive features of the dashboard. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be serve_pipca_dashboard.py:
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.misc.pipca_visuals import *
display_dashboard('pipca_model.h5')
Once we've created these three scripts, we can use the command line to run PiPCA. We start by ssh-ing to the psana cluster:
Next, we source the psana conda environment. Then, we can run our bash script with sbatch.
Once the model is complete and saved to pipca_model.h5, we can serve the dashboard panel application.
Note: You can also display the eigenimages by serving a python script that calls display_eigenimages('pipca_model.h5') instead.
Future Actions

[ ] pipca_model.h5
[ ] Add a log object variable to PiPCA which computes the model on a logarithmic scale.
[ ] Update pipca_visuals.py functions to be applicable to other data reduction output models.