greenelab / tybalt

Training and evaluating a variational autoencoder for pan-cancer gene expression data
BSD 3-Clause "New" or "Revised" License

Pathway Coverage Evaluation #90

Closed gwaybio closed 6 years ago

gwaybio commented 6 years ago

A pathway coverage test is a good indicator of how much biology the model learned. I need to see how this was done in the eADAGE paper and then run the same pathway coverage test on Pan-Cancer data for each dimensionality reduction algorithm: PCA, ICA, NMF, ADAGE, and Tybalt.

Is there a better pathway database for cancer? Perhaps something like KEGG cancer?

gwaybio commented 6 years ago

Steps:

  1. Start from predefined pathway gene sets (KEGG, GO-BP, etc.)
  2. Assign features to gene sets based on high-weight genes
  3. Do this with and without cross-talk correction (which assigns each gene to a single pathway, per feature, before the overrepresentation analysis)
  4. Determine the coverage percentage for the full model
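
The steps above (without cross-talk correction) can be sketched roughly as follows. This is a minimal illustration, not the eADAGE implementation: the function names, the hypergeometric overrepresentation test, the Benjamini-Hochberg adjustment, and the `alpha=0.05` cutoff are all assumptions made for the example. Coverage here means the fraction of pathways significantly associated with at least one feature.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M total, n of which are in the pathway."""
    upper = min(n, N)
    total = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return total / comb(M, N)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pathway_coverage(feature_genes, pathways, background, alpha=0.05):
    """Fraction of pathways enriched in at least one feature's high-weight genes.

    feature_genes: dict of feature name -> iterable of high-weight genes
    pathways:      dict of pathway name -> iterable of member genes
    background:    iterable of all genes measured (hypothetical input)
    """
    background = set(background)
    tests, pvals = [], []
    for feature, genes in feature_genes.items():
        genes = set(genes) & background
        for name, members in pathways.items():
            members = set(members) & background
            overlap = len(genes & members)
            tests.append((feature, name))
            pvals.append(
                hypergeom_sf(overlap, len(background), len(members), len(genes))
            )
    qvals = benjamini_hochberg(pvals)
    covered = {name for (_, name), q in zip(tests, qvals) if q <= alpha}
    return len(covered) / len(pathways), covered
```

Running this once per algorithm (PCA, ICA, NMF, ADAGE, Tybalt) on the same pathway sets and background would give directly comparable coverage values.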

This will require:

  1. Determining pathway sets
  2. Extracting high-weight genes for each dimensionality reduction algorithm listed above (see #69)
  3. Performing cross-talk correction
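
For the second requirement, one simple way to extract high-weight genes is a standard-deviation cutoff on each feature's weight distribution. The sketch below is only an illustration: the function name, the nested-dict input layout, and the `n_std=2.5` default are assumptions for the example, not necessarily the procedure settled on in #69.

```python
from statistics import mean, pstdev


def high_weight_genes(weights, n_std=2.5):
    """Select genes in the tails of each feature's weight distribution.

    weights: dict of feature name -> dict of gene -> weight (hypothetical layout)
    n_std:   number of standard deviations from the mean (assumed cutoff)
    Returns a dict of feature name -> {"positive": set, "negative": set}.
    """
    result = {}
    for feature, gene_weights in weights.items():
        values = list(gene_weights.values())
        mu, sigma = mean(values), pstdev(values)
        upper, lower = mu + n_std * sigma, mu - n_std * sigma
        result[feature] = {
            # Genes far above the mean weight for this feature.
            "positive": {g for g, w in gene_weights.items() if w > upper},
            # Genes far below the mean weight for this feature.
            "negative": {g for g, w in gene_weights.items() if w < lower},
        }
    return result
```

Keeping the positive and negative tails separate lets each tail be tested against the pathway sets on its own, or merged into one gene set per feature if that is preferred.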

gwaybio commented 6 years ago

As discovered in #95, I will need to perform pathway coverage tests for high-weight genes defined by both the standard and the dynamic procedures.
