greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

Microsatellite instability prediction across cancer types #67

Closed: jjc2718 closed this 1 year ago

jjc2718 commented 1 year ago

This PR implements microsatellite instability (MSI) status prediction across cancer types, adding another prediction problem to the LASSO parameter range experiments I've been running. TCGA provides MSI classifications for COAD, READ, STAD, and UCEC, so these experiments use those four cancer types.

In general, lower LASSO penalties (less regularization) tend to work better for this problem, both for performance on the training cancer types and for generalization to unseen cancer types.
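For readers unfamiliar with the setup, here is a rough sketch of how a "held-out cancer type by LASSO penalty" experiment grid could be laid out. It is not the code in this PR: `holdout_splits`, `experiment_grid`, the penalty values, and the sample IDs are all hypothetical, and holding out one cancer type at a time is an assumption about how "unseen cancer types" is evaluated.

```python
from collections import defaultdict

# Hypothetical penalty grid; in this sketch a larger value means a stronger
# LASSO penalty (more regularization). The real range is not given in the PR.
LASSO_PENALTIES = [0.001, 0.01, 0.1, 1.0, 10.0]


def holdout_splits(sample_cancer_types):
    """Yield (held_out, train_ids, test_ids), holding out one cancer type at a time.

    sample_cancer_types maps a sample ID to its cancer type label.
    """
    by_type = defaultdict(list)
    for sample_id, cancer_type in sample_cancer_types.items():
        by_type[cancer_type].append(sample_id)
    for held_out in sorted(by_type):
        test_ids = sorted(by_type[held_out])
        train_ids = sorted(
            s for t, ids in by_type.items() if t != held_out for s in ids
        )
        yield held_out, train_ids, test_ids


def experiment_grid(sample_cancer_types, penalties=LASSO_PENALTIES):
    """List every (held-out cancer type, penalty) pair with its split sizes."""
    grid = []
    for held_out, train_ids, test_ids in holdout_splits(sample_cancer_types):
        for penalty in penalties:
            # A real run would fit an L1-penalized classifier on train_ids here
            # and score it on both the training types and the held-out type.
            grid.append((held_out, penalty, len(train_ids), len(test_ids)))
    return grid


if __name__ == "__main__":
    # Made-up sample IDs, for illustration only.
    toy = {"s1": "COAD", "s2": "COAD", "s3": "READ", "s4": "STAD", "s5": "UCEC"}
    for row in experiment_grid(toy):
        print(row)
```

Each row of the grid corresponds to one model fit. Comparing scores on the training cancer types against scores on the held-out type, across the penalty range, is the kind of comparison the observation above summarizes.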


The notebook at 10_msi_prediction/download_msi.ipynb was already reviewed as part of the https://github.com/greenelab/mpmp repo, so you don't need to look too closely at it. The other scripts are new, but they are based heavily on existing scripts used for the mutation prediction experiments.
