Synthetic Lethality Data and Code

Datasets:

final_X_tcga_processed.hkl: Expression and mutation features for each cell line, built from DepMap 22Q4's OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv datasets. Expression features are z-scored, and each cell line's feature vector is L2-normalized to unit norm.
final_X_tcga_raw_unnormalized.hkl: Raw, unnormalized expression and mutation features for each cell line, built from DepMap 22Q4's OmicsExpressionProteinCodingGenesTPMLogp1.csv, OmicsSomaticMutationsMatrixHotspot.csv, and OmicsSomaticMutationsMatrixDamaging.csv datasets.
CRISPRGeneEffect_processed.hkl: CRISPRGeneEffect.csv from DepMap 22Q4, filtered to the cell lines for which we have both mutation and expression features.
Chronos_Combined_predictability_results.csv: Predictability data from DepMap
cancerGeneList.tsv: OncoKB cancer genes (https://www.oncokb.org/cancer-genes)
sample_info.csv: DepMap metadata for cell lines
datasets/tcga_data_processed_figures.hkl: TCGA data downloaded from Xena
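
The processing described for final_X_tcga_processed.hkl (z-scoring expression features, then scaling each cell line's feature vector to unit L2 norm) can be sketched roughly as follows. This is an illustration only, not the code used to build the file. The function name process_features, the n_expr argument, the assumption that expression columns come first, and the order of the two steps are all assumptions made for the example.

```python
import numpy as np

def process_features(X, n_expr):
    """Hypothetical helper: z-score the first n_expr columns (assumed to be
    expression features) across cell lines, then L2-normalize each row."""
    X = np.asarray(X, dtype=float).copy()

    # Z-score each expression feature across cell lines (rows).
    expr = X[:, :n_expr]
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    X[:, :n_expr] = (expr - mean) / std

    # Scale each cell line's full feature vector (expression + mutation) to unit L2 norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # keep all-zero rows unchanged
    return X / norms
```

Mutation columns are left untouched before the row normalization here, which is one plausible reading of the description above.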

Files:

train_and_get_grads.ipynb: Train one kernel regression model per knockout (KO) and compute feature importances for each KO.
demo.py: Use calculated feature importances to visualize feature importance distributions for a given KO.
generate_figures.ipynb: Generate main text figures
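
For orientation, the general idea of fitting a kernel regression model for one KO and deriving feature importances from gradients can be sketched as below. This is not the notebook's implementation. The Gaussian kernel, the ridge term lam, the bandwidth gamma, and the choice of mean squared gradient as the importance score are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(X, y, gamma=0.1, lam=1e-2):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def input_gradients(X_train, alpha, X_eval, gamma=0.1):
    """Gradient of f(x) = sum_i alpha_i k(x, x_i) with respect to x,
    evaluated at each row of X_eval."""
    W = gaussian_kernel(X_eval, X_train, gamma) * alpha[None, :]
    # grad f(x) = -2 * gamma * sum_i W_i * (x - x_i)
    return -2.0 * gamma * (X_eval * W.sum(1, keepdims=True) - W @ X_train)

def feature_importance(grads):
    """Assumed importance score: mean squared gradient per feature."""
    return (grads**2).mean(axis=0)
```

In this sketch, X would hold the features of the cell lines (one row each) and y the gene effect scores for a single KO; repeating the fit for each KO gives one importance vector per KO.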

Feel free to direct any questions about the code to caic@mit.edu.
