theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

Create 3 ood splits that exclude drugs of 3 different pathways #81

Closed MxMstrmn closed 2 years ago

MxMstrmn commented 2 years ago

Ref. #77: The gene alignment is also updated to match both all genes and the gene subset restricted to lincs genes. For the ood split, an additional notebook is added that excludes drugs based on their Tanimoto similarity. The output file is called lincs_complete.h5ad.

Ref. #78: The larger gene set is now the default, 'trapnell_cpa.h5ad', and the subset to lincs genes is still called 'trapnell_cpa_lincs_genes.h5ad'.

@siboehm, I excluded fewer drugs in the end (4 or 3 per pathway), as the sciplex dataset does not contain that many drugs in each pathway. These are, however, true ood drugs, as opposed to including them when they were applied in small doses.
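
To illustrate what "true ood drugs" means here, the following is a minimal sketch, not the notebook's actual code. It assumes each cell is described by a (drug, dose) pair and that a held-out drug is excluded from training at every dose. The function name `split_by_held_out_drugs` and the drug names are hypothetical placeholders.

```python
def split_by_held_out_drugs(cells, ood_drugs):
    """Label each cell 'ood' or 'train' based only on its drug.

    cells: list of (drug, dose) tuples, one per cell (hypothetical layout).
    ood_drugs: drugs that must never appear in training, at any dose.
    """
    ood = set(ood_drugs)
    # The dose is deliberately ignored: a held-out drug is ood at every dose.
    return ["ood" if drug in ood else "train" for drug, _dose in cells]


# Toy data with placeholder drug names and doses.
cells = [("drug_a", 10), ("drug_a", 10000), ("drug_b", 100), ("drug_c", 1000)]
labels = split_by_held_out_drugs(cells, ["drug_a"])

# Drugs that remain visible during training.
train_drugs = {drug for (drug, _), label in zip(cells, labels) if label == "train"}
```

With this kind of split, low-dose cells of a held-out drug cannot leak into training, which is the difference from a split that keeps small doses of the same drug in the training set.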

MxMstrmn commented 2 years ago

Re excluding the 2 most similar drugs: I also checked the third and fourth most similar drugs, and these had much lower similarity.
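
As a rough sketch of the idea of ranking drugs by Tanimoto similarity and excluding the most similar ones, assuming each drug is represented by the set of "on" bits of a binary fingerprint. How the notebook actually computes fingerprints, and which drugs are compared against which, is not stated in this thread. The names `tanimoto`, `most_similar`, and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints share nothing; define their similarity as 0.
    return len(a & b) / union if union else 0.0


def most_similar(query, candidates, k=2):
    """Return the k candidates most similar to `query` as (name, score) pairs."""
    scored = [(name, tanimoto(query, fp)) for name, fp in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy fingerprints (sets of on-bit indices), for illustration only.
ood_drug = {1, 2, 3, 4}
candidates = {
    "drug_x": {1, 2, 3, 4, 5},
    "drug_y": {1, 2, 3},
    "drug_z": {1, 9},
    "drug_w": {7, 8},
}

ranking = most_similar(ood_drug, candidates, k=4)
to_exclude = [name for name, _score in ranking[:2]]
```

Inspecting the full ranking rather than only the top two makes the check described above visible: if the scores drop sharply after the second entry, excluding only the two most similar drugs is a reasonable cutoff.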

siboehm commented 2 years ago

I think this is good. Since the gene ordering in lincs_full_smiles_sciplex_genes.h5ad hasn't been changed (I assume), all currently pretrained models are still good.
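
The assumption about unchanged gene ordering could be checked with something like the sketch below. It compares two plain lists of gene names; with AnnData files these would typically come from `adata.var_names`. The helper `same_gene_order` and the gene lists are hypothetical and not part of the repository.

```python
def same_gene_order(old_genes, new_genes):
    """Return (True, None) if both lists match element by element,
    otherwise (False, index_of_first_difference)."""
    for i, (old, new) in enumerate(zip(old_genes, new_genes)):
        if old != new:
            return False, i
    if len(old_genes) != len(new_genes):
        # One list is a strict prefix of the other.
        return False, min(len(old_genes), len(new_genes))
    return True, None


# Toy gene lists, for illustration only.
before = ["GENE1", "GENE2", "GENE3"]
after_same = ["GENE1", "GENE2", "GENE3"]
after_swapped = ["GENE1", "GENE3", "GENE2"]

unchanged = same_gene_order(before, after_same)
changed = same_gene_order(before, after_swapped)
```

A pretrained model that indexes genes by position would only remain valid in the first case, where the order is identical and not merely the set of genes.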