greenelab / mpmp

Multimodal Pan-cancer Mutation Prediction
BSD 3-Clause "New" or "Revised" License

COSMIC CGC gene set analysis, part 1 #75

Closed: jjc2718 closed this pull request 2 years ago.

jjc2718 commented 2 years ago

PR description:

For our paper under review, both reviewers asked to see results for a larger set of cancer genes (greenelab/mpmp-manuscript#43). One of them pointed to the COSMIC Cancer Gene Census (CGC), which contains a larger set of DNA damage repair genes than the Vogelstein et al. gene set we've been using.

This PR lays the foundation for a more in-depth analysis of this gene set. So far I've only run the "all data types" comparison for these genes. Depending on the results here, I'll likely rerun all of our analyses in a future PR, combining this gene set with the Vogelstein genes.

With the Vogelstein gene set, which is considerably smaller, we saw generally similar performance for expression and methylation. When we expand to the COSMIC gene set, the comparison seems to favor gene expression:

[Figure: performance comparison across data types for the COSMIC gene set]

Here is the number of "well-predicted" genes for each data type at a p-value cutoff of 0.001 (for the Vogelstein genes, these counts were fairly similar across data types):

[Figure: number of well-predicted genes per data type at p < 0.001 (screenshot dated 2022-02-19)]
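To make the "well-predicted" count concrete, here is a minimal sketch of tallying genes below a p-value cutoff per data type. It assumes the per-gene, per-data-type p-values have already been computed and are available as `(gene, data_type, p_value)` tuples; that input format, the function name `count_well_predicted`, and the use of a strict `<` comparison are hypothetical choices for illustration, not the mpmp code. The PR text does not say how the p-values are derived or whether they are corrected for multiple testing, so the sketch makes no assumption about either.

```python
def count_well_predicted(p_values, cutoff=0.001):
    """Count distinct genes per data type whose p-value is below `cutoff`.

    Hypothetical helper: `p_values` is assumed to be an iterable of
    (gene, data_type, p_value) tuples. A p-value exactly equal to the
    cutoff is NOT counted (strict "<" is an assumption).
    """
    well_predicted = {}
    for gene, data_type, p in p_values:
        # Register every data type so that types with zero hits still appear.
        genes = well_predicted.setdefault(data_type, set())
        if p < cutoff:
            # Use a set so a gene listed twice is only counted once.
            genes.add(gene)
    # Return counts keyed by data type, in sorted order for stable output.
    return {dt: len(genes) for dt, genes in sorted(well_predicted.items())}


if __name__ == "__main__":
    # Made-up toy values for illustration only; these are NOT results from the PR.
    toy = [
        ("gene_a", "expression", 1e-5),
        ("gene_a", "methylation", 0.01),
        ("gene_b", "expression", 0.0005),
        ("gene_b", "methylation", 0.0002),
        ("gene_c", "expression", 0.2),
        ("gene_c", "methylation", 0.5),
    ]
    print(count_well_predicted(toy))  # {'expression': 2, 'methylation': 1}
```

Keeping every data type in the output, even with a count of zero, makes it easier to compare data types side by side, as in the figure above.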

Code changes: