greenelab / multi-plier

An unsupervised transfer learning approach for rare disease transcriptomics
BSD 3-Clause "New" or "Revised" License

Are MSigDB oncogenic pathways captured by the recount2 model? #42

Closed jaclyn-taroni closed 6 years ago

jaclyn-taroni commented 6 years ago

Related issue: #38

The latent variables learned by PLIER can capture variability related to biological signals or technical noise. We've framed the former as latent variables that are significantly associated with pathways that we supplied during model training. The remaining latent variables may or may not be representative of a coherent biological process.

We didn't supply any of the models with MSigDB oncogenic pathways (they come with PLIER: data("oncogenicPathways")), so they've been held out and we can essentially think of these as "novel to the model." We can check whether the LV loadings align with these pathways.

Here I add:

Results: ~76% of the oncogenic pathways are significantly associated with an LV (using the same FDR < 0.05 cutoff), but I'll need to repeat this with a variety of models (see #39).
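For anyone who wants to see how a percentage like this can be tallied, here is a rough sketch in Python. It is not the notebook's code. It assumes we already have one p-value per (pathway, LV) pair, that the FDR is a Benjamini-Hochberg adjustment over all pairs, and that a pathway counts as "associated" if any LV passes the cutoff. The function names and the toy pathway/LV labels are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fraction_associated(p_by_pair, cutoff=0.05):
    """Fraction of pathways with at least one LV below the FDR cutoff.

    p_by_pair maps (pathway, lv) -> p-value (hypothetical input format).
    """
    pairs = list(p_by_pair)
    fdr = benjamini_hochberg([p_by_pair[pair] for pair in pairs])
    pathways = {pathway for pathway, _ in pairs}
    hits = {pathway for (pathway, _), q in zip(pairs, fdr) if q < cutoff}
    return len(hits) / len(pathways)


# Toy input: two made-up pathways tested against two made-up LVs.
toy = {
    ("PATHWAY_A", "LV1"): 0.001,
    ("PATHWAY_A", "LV2"): 0.8,
    ("PATHWAY_B", "LV1"): 0.6,
    ("PATHWAY_B", "LV2"): 0.9,
}
print(fraction_associated(toy))
```

In the toy input only PATHWAY_A has an LV that survives the adjustment, so half of the pathways count as associated. Adjusting over all pairs at once is one choice; adjusting per LV or per pathway would give different numbers.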

HTML notebook for easy viewing: 27-oncogenic_pathway_recount2_model.nb.zip