Changes

V-tracking in PiPCA

[ ] Initialized new pipca object attributes batch_indices, batch_number, V, U_prev, and S_prev in order to facilitate the updating algorithm for V
[ ] Added the distribute_images_over_batches function to properly format the batch indices based on the given batch sizes of a run
[ ] Added the update_V function to properly update V based on the previous model and the updated U and S
[ ] Added V to the return list of the get_model() function
[ ] Incorporated V into the prime_model() function
Algorithm used to update V
Let $U_i$, $S_i$, and $V_i^i$ be the matrices obtained from iPCA to model batch $X_i$.
Also, let $U_{i+1}$ and $S_{i+1}$ be the matrices obtained from the parallel QR algorithm for the next batch $X_{i+1}$.
We can find the next $V_{i+1}^{i+1}$ with the standard SVD assumption that $X_{i+1} = U_{i+1} S_{i+1} V_{i+1}^{i+1}$.
And $\forall j \in [0, i]$ we can update each previous batch's $V_j^{i+1}$ by defining it to be the matrix that satisfies $X_j = U_{i+1} S_{i+1} V_j^{i+1}$.
Once we have all $V_j^{i+1}$ and $V_{i+1}^{i+1}$, we can obtain the overall $V_{i+1}$ by simply concatenating them:
$V_{i+1} = [V_0^{i+1} \; V_1^{i+1} \; \cdots \; V_i^{i+1} \; V_{i+1}^{i+1}]$
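As a concrete illustration, updating each batch's V block reduces to a projection: since $U_{i+1}$ has orthonormal columns, $X_j = U_{i+1} S_{i+1} V_j^{i+1}$ gives $V_j^{i+1} = S_{i+1}^{-1} U_{i+1}^T X_j$. A minimal NumPy sketch (the function name and the features-by-samples array layout are assumptions, not btx's actual implementation):

```python
import numpy as np

def update_V_blocks(X_batches, U_new, S_new):
    """Recompute every batch's V block under the updated basis.

    Solves X_j = U_new @ diag(S_new) @ V_j for V_j via
    V_j = diag(1/S_new) @ U_new.T @ X_j, assuming U_new has
    orthonormal columns. Arrays are features x samples.
    """
    S_inv = np.diag(1.0 / S_new)
    V_blocks = [S_inv @ U_new.T @ X_j for X_j in X_batches]
    # V_{i+1} = [V_0^{i+1} V_1^{i+1} ... V_{i+1}^{i+1}]
    return np.concatenate(V_blocks, axis=1)
```

Concatenating along the sample axis yields the overall $V_{i+1}$ described above.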
Recording loadings
[ ] Implemented a new method of recording PC loadings in record_loadings which now runs after the model is updated at each image batch, rather than before the model is updated
[ ] Uses full Sj and Vj matrices from model (where j is the current batch number) to compute PC loadings, instead of Xj and Uj
[ ] Allows previously recorded loadings to be updated based on latest available model
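Under the $X = U S V$ convention above, the loadings follow directly from the model's $S$ and $V$: projecting the data onto the basis gives $U^T X = S V$, so no raw images $X_j$ or $U_j$ are needed. A hedged sketch (the helper name is hypothetical):

```python
import numpy as np

def compute_loadings(S, V):
    """PC loadings for all images seen so far.

    Row p, column k is the weight of component p in image k:
    U.T @ X = diag(S) @ V under the X = U S V convention.
    """
    return np.diag(S) @ V
```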
Added pipca_visuals.py script to btx/misc
Note: pipca.py now saves the final model to an h5 file that pipca_visuals.py pulls from to display the dashboard and eigenimages
Dashboard changes
[ ] Added a second row to the dashboard that includes a selector widget, scree plot, and the reconstructed heatmap image
[ ] The selector widget allows the user to choose a principal component cut-off which will update both the scree plot and the reconstructed PiPCA image
[ ] The scree plot will update to display only the singular values up to the component cut-off
[ ] The heatmap will recalculate and update the reconstructed PiPCA image in order to see how much variation is lost in a specific truncation
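The recalculation behind the heatmap is a rank-$r$ truncation of the model: keeping only the first $r$ components of $X = U S V$. A minimal sketch (function name and array layout are assumptions):

```python
import numpy as np

def reconstruct_image(U, S, V, img_idx, r):
    """Rank-r reconstruction of image img_idx: drop all but the
    first r terms of X = U S V, showing what a PC cut-off loses."""
    return U[:, :r] @ np.diag(S[:r]) @ V[:r, img_idx]
```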
Here is the output with PC20 selected:
As well as the updated scree plot and reconstructed PiPCA image with PC3 selected:
Display eigenimages
[ ] The display_eigenimages function displays a principal component selector widget and a heatmap displaying the eigenimage corresponding to the selected component
[ ] The color scale is symmetric centered at 0 to always show the positive and negative regions of the eigenimage
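The symmetric scale can be pinned to the eigenimage's largest magnitude; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def symmetric_color_limits(eigenimage):
    """Color limits centered at 0, so positive and negative regions
    of the eigenimage render on the same scale."""
    m = float(np.abs(eigenimage).max())
    return -m, m
```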
How to use these visualization functions
These tutorials assume access to SLAC unix servers through the pslogin portal.
Running PiPCA in Jupyter Notebook
Given a small enough number of images, we can easily run PiPCA in a Jupyter Notebook. First, we import the functions from pipca.py and pipca_visuals.py in the btx repository. Be sure to append your system's correct path to btx, instead of the one shown below.
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *
from btx.misc.pipca_visuals import *
Next, we create our relevant parameter variables that will be used to initialize our pipca object.
exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M
q = 10 # number of principal components to compute and maintain
n = 200 # total number of images to be incorporated into model
m = 50 # size of image batch incorporated in each model update
start_offset = 24 # run index of first image to be incorporated into iPCA model
Now, we can initialize our pipca object and run PiPCA. This may take a few minutes since Jupyter will only be running on one rank with the MPI Communicator.
pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)
pipca.run()
While running, the following should be printed:

Factoring 50 samples into 0 sample, 10 component model...
Factoring 50 samples into 50 sample, 10 component model...
Factoring 50 samples into 100 sample, 10 component model...
Factoring 50 samples into 150 sample, 10 component model...
Model complete
Model saved to pipca_model.h5
Once the model is complete and saved to pipca_model.h5, we can display the dashboard.
Note: You can also display the eigenimages with display_eigenimages('pipca_model.h5').
display_dashboard('pipca_model.h5')
Running PiPCA with SLURM job
To take advantage of the parallelization of PiPCA, using the MPI Communicator to run on multiple CPUs, we need to run it as a SLURM job. We will need three separate scripts to run PiPCA and display the dashboard.
The first is a bash script that will allocate the nodes, tasks, and CPUs for the job. For this tutorial, my file name will be testing_pipca.sh:
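The original script is not reproduced here; a minimal sketch of what such a submission script might look like (job name aside, the node, task, and CPU counts below are assumptions to adapt for your site):

```shell
#!/bin/bash
#SBATCH --job-name=testing_pipca
#SBATCH --nodes=1            # number of nodes to allocate
#SBATCH --ntasks=8           # number of MPI ranks
#SBATCH --cpus-per-task=1    # CPUs per rank
#SBATCH --output=pipca_%j.out

# Run the Python driver on all allocated ranks
mpirun python run_pipca.py
```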
The second is the Python script that the previous bash script will run on multiple ranks in parallel, eventually saving the final model to pipca_model.h5. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be run_pipca.py:
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.processing.pipca import *
exp = 'mfxp23120' # experiment name
run = 90 # run number
det_type = 'epix10k2M' # detector name, e.g. epix10k2M or jungfrau4M
q = 20 # number of principal components to compute and maintain
n = 2000 # total number of images to be incorporated into model
m = 125 # size of image batch incorporated in each model update
start_offset = 0 # run index of first image to be incorporated into iPCA model
pipca = PiPCA(exp=exp, run=run, det_type=det_type, start_offset=start_offset, num_components=q, batch_size=m, num_images=n)
pipca.run()
And the third script is another Python script that will be run continuously in order to take advantage of the interactive features of the dashboard. Be sure to append your system's correct path to btx, instead of the one shown below. For this tutorial, my file name will be serve_pipca_dashboard.py:
import sys
sys.path.append("/sdf/home/m/marasign/btx")
from btx.misc.pipca_visuals import *
display_dashboard('pipca_model.h5')
Once we've created these three scripts, we can use the command line to run PiPCA. We start by ssh-ing to the psana cluster:
Next, we source the psana conda environment. Then, we can run our bash script with sbatch.
Once the model is complete and saved to pipca_model.h5, we can serve the dashboard panel application.
Note: You can also display the eigenimages by serving a python script that calls display_eigenimages('pipca_model.h5') instead.
Future Actions

[ ] pipca_model.h5
[ ] Add a log object variable to PiPCA which computes the model on a logarithmic scale.
[ ] Update pipca_visuals.py functions to be applicable to other data reduction output models.