greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

Fix cancer type covariate when single cancer is held out #5

Closed by jjc2718 3 years ago

jjc2718 commented 3 years ago

See the code in pancancer_utilities/tcga_utilities.py, specifically the align_matrices function. The current approach won't work if the test set has a different cancer type composition than the training set (for example, when a single cancer type is held out), so there needs to be a way to keep the cancer type covariate consistent across datasets.
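
One possible way to do this, sketched below, is to one-hot encode cancer type against a fixed list of all cancer types rather than only the types present in each split. This is a minimal illustration and not the code in align_matrices: the helper name encode_cancer_type, the all_cancer_types argument, and the sample data are hypothetical, and it assumes the covariate is a pandas Series of cancer type labels indexed by sample ID.

```python
import pandas as pd


def encode_cancer_type(cancer_types, all_cancer_types):
    """One-hot encode cancer type labels against a fixed set of columns.

    Hypothetical helper (not part of the repository): `cancer_types` is
    assumed to be a pandas Series of labels indexed by sample ID, and
    `all_cancer_types` the full list of cancer types in the dataset.
    """
    columns = sorted(all_cancer_types)
    # Encode only the types observed in this split...
    dummies = pd.get_dummies(cancer_types, dtype=int)
    # ...then add all-zero columns for types missing from the split, so
    # every split ends up with the same columns in the same order.
    # Labels not listed in `all_cancer_types` are dropped by reindex.
    return dummies.reindex(columns=columns, fill_value=0)


# Illustrative data: COAD is held out of training and makes up the test set.
all_types = ["BRCA", "COAD", "LUAD"]
train_types = pd.Series(["BRCA", "LUAD", "BRCA"], index=["s1", "s2", "s3"])
test_types = pd.Series(["COAD", "COAD"], index=["s4", "s5"])

train_cov = encode_cancer_type(train_types, all_types)
test_cov = encode_cancer_type(test_types, all_types)
```

With this encoding, train_cov and test_cov share the same columns even though COAD never appears in the training samples, so a model fit on the training covariates can be applied to the held-out cancer type without a column mismatch. Note that the held-out type's column is all zeros during training, so the model cannot learn a coefficient for it; whether that is acceptable depends on the experiment.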