greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

CCLE data download script #51

Closed by jjc2718 1 year ago

jjc2718 commented 1 year ago

As the next step in the feature selection work, we want to see whether our conclusions generalize to a different dataset. The notebook 08_cell_line_prediction/download_data.ipynb downloads cell line expression and mutation data from CCLE, along with metadata for each cell line such as its cancer type of origin and tissue of origin.

The download notebook also visualizes the number and proportion of cell lines in each cancer type that have a mutation in a given gene. This will help us set thresholds for including cancer types that make sense for CCLE, since CCLE has far fewer cell lines than TCGA has tumor samples. Thresholds of 5 mutated samples and 10% of samples mutated seem to make sense for most genes we've been looking at, so we'll probably go with those.
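
To make the thresholding concrete, here is a minimal sketch of how a count and proportion filter could pick cancer types for a single gene. It is not the code in the notebook: the function name `select_cancer_types`, the `(cancer_type, is_mutated)` input format, and the toy data are all hypothetical. It also assumes both thresholds are minimums that must hold at the same time, which the description above does not state explicitly.

```python
from collections import defaultdict

# Thresholds from the discussion above; treating both as minimums that
# must hold together is an assumption of this sketch.
MIN_COUNT = 5
MIN_PROPORTION = 0.1


def select_cancer_types(cell_lines, min_count=MIN_COUNT,
                        min_proportion=MIN_PROPORTION):
    """Return the cancer types that pass both thresholds for one gene.

    cell_lines: iterable of (cancer_type, is_mutated) pairs, one pair per
    cell line (a hypothetical input format chosen for illustration).
    """
    totals = defaultdict(int)   # cell lines per cancer type
    mutated = defaultdict(int)  # mutated cell lines per cancer type
    for cancer_type, is_mutated in cell_lines:
        totals[cancer_type] += 1
        if is_mutated:
            mutated[cancer_type] += 1

    selected = []
    for cancer_type in sorted(totals):
        count = mutated[cancer_type]
        proportion = count / totals[cancer_type]
        if count >= min_count and proportion >= min_proportion:
            selected.append(cancer_type)
    return selected


# Toy data, made up for illustration (not real CCLE counts).
toy = (
    [("lung", True)] * 6 + [("lung", False)] * 14        # 6 of 20 mutated
    + [("skin", True)] * 3 + [("skin", False)] * 2       # 3 of 5 mutated
    + [("breast", True)] * 5 + [("breast", False)] * 95  # 5 of 100 mutated
)
print(select_cancer_types(toy))
```

In the toy data, only "lung" passes both checks: "skin" has a high mutated proportion but too few mutated cell lines, and "breast" has enough mutated cell lines but too low a proportion. This is the situation the count and proportion plots are meant to surface when choosing thresholds.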
