greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

Refactor classification script so data only gets loaded once #7

Closed jjc2718 closed 3 years ago

jjc2718 commented 3 years ago

Currently, experiments work by calling classify_cancer_type.py for each gene/cancer type combination. This script reloads all of the data into memory each time, which takes 5-10 minutes.

Creating classifiers for many genes/cancer types would be much faster if the code were refactored to load the data once and then run all gene/cancer type combinations against it (i.e., in a single script).
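
A rough sketch of the proposed structure, to make the idea concrete. Everything here is a placeholder: `load_all_data`, `classify`, and `run_all` are hypothetical names, not functions in this repo, and the gene and cancer type identifiers are just illustrative inputs. The call counter only exists to show that the load step runs once.

```python
import itertools


def load_all_data():
    """Hypothetical stand-in for the expensive load step (5-10 minutes today)."""
    load_all_data.calls += 1
    # Placeholder data; the real script would read its input files here.
    return {"features": [[0.1, 0.2], [0.3, 0.4]], "labels": [0, 1]}


load_all_data.calls = 0  # counts how many times the load step runs


def classify(data, gene, cancer_type):
    """Hypothetical stand-in for fitting one classifier on already-loaded data."""
    return {
        "gene": gene,
        "cancer_type": cancer_type,
        "n_samples": len(data["labels"]),
    }


def run_all(genes, cancer_types):
    """Load the data once, then reuse it for every gene/cancer type combination."""
    data = load_all_data()
    return [
        classify(data, gene, cancer_type)
        for gene, cancer_type in itertools.product(genes, cancer_types)
    ]


# Illustrative inputs only; the real lists would come from the experiment config.
results = run_all(["TP53", "KRAS"], ["BRCA", "LUAD"])
```

With this layout the load cost is paid once per run instead of once per combination, so the total time should scale with the number of classifiers fit rather than with the number of reloads.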