greenelab / mpmp

Multimodal Pan-cancer Mutation Prediction
BSD 3-Clause "New" or "Revised" License

Survival prediction using mutation predictions #70

Closed jjc2718 closed 2 years ago

jjc2718 commented 2 years ago

Following #69, we wanted to compare true mutation status as a survival predictor with predicted mutation status as a survival predictor. We generated matrices of logistic regression scores for all genes in the Vogelstein dataset and all samples, and used these as real-valued features in our survival models.
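For anyone reading along, here is a minimal sketch of how a binary mutation feature and a real-valued predicted score can be compared as survival predictors. It is not the code used in this repo: the thread does not state the metric or model, so Harrell's concordance index is an assumption, and the toy data and the names `concordance_index`, `true_status`, and `pred_score` are all hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the sample with the
    higher risk score has the shorter observed survival time.
    Pairs with tied times are skipped here for simplicity."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Not comparable if times tie or the earlier sample is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival time, event indicator (1 = death observed),
# true mutation status, and a classifier score for the same gene.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 1, 0, 1, 0]
true_status = [1, 0, 1, 0, 0, 0]
pred_score = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

c_true = concordance_index(times, events, true_status)
c_pred = concordance_index(times, events, pred_score)
print(f"true status C-index: {c_true:.3f}, predicted score C-index: {c_pred:.3f}")
```

The binary feature produces many tied pairs, each worth only half credit, while a real-valued score can rank samples within the mutated and non-mutated groups. That is one reason a continuous feature can score higher without capturing any new biology.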

For pan-cancer survival prediction, the predicted mutation status features clearly outperform the true mutation status features:

image

However, for survival prediction in individual cancer types, the story isn't as clear. There are a few cancer types where the predicted mutation features help, but by and large they don't seem to provide a huge improvement.

image

jjc2718 commented 2 years ago

> Do you have a hypothesis for the difference between pan-cancer and individual cancer performance? Is it that the model is better able to predict mutations given more data, so the better mutation predictions are more useful across the board? Or something else?

Good question. I'm not completely sure, but the models don't seem to differ much in the pan-cancer case overall (the x-axis in the plot above is pretty compressed). So maybe the mutation predictions are picking up on something small but predictive (like tumor stage/grade, tumor type, etc.) that doesn't help as much within individual cancer types.

What we're hoping the mutation predictions will add is the ability to identify tumors that don't have a mutation, but "act like" they have it. There are some well-known examples of this in cancer biology, including "BRCAness" for tumors that have other mutations that emulate BRCA mutations, and "Ph-like" leukemias that don't have the Philadelphia chromosome translocation but share functional features and drug susceptibilities. Unfortunately, if we were really capturing this, we probably would have seen larger performance improvements in the individual cancer types too, so it's not what we were hoping for.
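To make the "acts like it's mutated" idea concrete, here is a rough sketch of how such samples could be flagged from a classifier's scores. It is illustrative only: the function name `mutation_like_samples`, the 0.5 threshold, and the sample IDs and scores are all hypothetical, and it assumes scores are probabilities in [0, 1].

```python
def mutation_like_samples(sample_ids, true_status, predicted_scores, threshold=0.5):
    """Return IDs of samples that lack the mutation (status 0) but receive
    a predicted score at or above `threshold` (an arbitrary cutoff)."""
    return [
        sample_id
        for sample_id, status, score in zip(sample_ids, true_status, predicted_scores)
        if status == 0 and score >= threshold
    ]


# Hypothetical data for a single gene.
sample_ids = ["s1", "s2", "s3", "s4"]
true_status = [1, 0, 0, 0]
predicted_scores = [0.92, 0.81, 0.12, 0.40]

flagged = mutation_like_samples(sample_ids, true_status, predicted_scores)
print(flagged)
```

Samples flagged this way are only candidates. A high score in an unmutated sample can also reflect classifier error or a confounder such as tumor type, so it doesn't show a shared phenotype on its own.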