Closed gwaybio closed 6 years ago
This pull request adds the framework to loop over different latent space (z) dimensionalities.
The most important script in this pull request is `train_models_given_z.py`. This script ingests configuration files and a given dimensionality, fits `pca`, `ica`, `nmf`, `ADAGE`, and `Tybalt` models on pancancer RNAseq data for a given number of iterations, and outputs the results of several evaluations along with data.
Note that the evaluations presented in this script are immediate, i.e. they are contingent upon and measure fitting iterations across seeds. Therefore, these evaluations measure stability of solutions across iterations. These evaluations include:
The determinant of a correlation matrix indicates the density of correlation across either latent space components or weight matrix features.
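As a rough illustration of the determinant evaluation, here is a minimal sketch using NumPy on synthetic data. It is not the code in `train_models_given_z.py`; the function name `correlation_determinant` and the toy matrices are assumptions made for this example. The determinant of a correlation matrix lies between 0 and 1: values near 1 mean the columns are close to uncorrelated, and values near 0 mean they are highly correlated (redundant).

```python
import numpy as np


def correlation_determinant(matrix):
    """Determinant of the column-wise Pearson correlation matrix.

    Hypothetical helper for illustration only: rows are samples and
    columns are latent components (or weight matrix features).
    """
    corr = np.corrcoef(matrix, rowvar=False)
    return float(np.linalg.det(corr))


rng = np.random.default_rng(0)

# Four independent columns: correlation matrix is close to the identity.
independent = rng.normal(size=(500, 4))

# Four columns that all share one signal plus a little noise:
# off-diagonal correlations are close to 1.
shared = rng.normal(size=(500, 1))
correlated = shared + 0.1 * rng.normal(size=(500, 4))

det_independent = correlation_determinant(independent)
det_correlated = correlation_determinant(correlated)

print(f"independent columns: {det_independent:.4f}")
print(f"correlated columns:  {det_correlated:.6f}")
```

In this toy setup the independent columns should give a determinant close to 1 and the correlated columns a determinant close to 0, which is the sense in which the determinant summarizes the "density" of correlation.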
The script also outputs data for additional post-hoc analyses. @jaclyn-taroni and I have already discussed some of these post-hoc analyses. The data include:
Sorry for the influx of PR review requests... They should slow down a bit after this one
This is contingent upon the results of #116, but it contains the logic that will submit several jobs to PMACS, training various compression algorithms with a decreasingly constrained bottleneck. Several intermediate results will be saved for post-hoc analyses.
~Note that several methods in `train_models_given_z.py` have not yet been implemented. These are the scripts that process and aggregate results across models.~ :point_up: Update - these are now implemented