Create 3 ood splits that exclude drugs of 3 different pathways

MxMstrmn commented 2 years ago

Ref. #77: The gene alignment has also been updated, both for matching all genes and for the subset to LINCS genes. For the OOD split, an additional notebook was added that excludes drugs based on their Tanimoto similarity. The output file is called lincs_complete.h5ad.
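The notebook itself is not reproduced here. As a hedged illustration of what excluding drugs "based on their Tanimoto similarity" involves, the sketch below computes Tanimoto similarity on binary fingerprints and ranks the drugs most similar to a query drug. Assumptions: fingerprints are represented as sets of "on" bit indices, the fingerprint values are toy data, and the helper names `tanimoto` and `most_similar` are hypothetical; a real pipeline would typically compute fingerprints with a cheminformatics library such as RDKit.

```python
from __future__ import annotations

# Hypothetical sketch: fingerprints are sets of "on" bit indices (toy values,
# not real molecular fingerprints).


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # define similarity of two empty fingerprints as 0
    return len(a & b) / len(union)


def most_similar(query: str, fingerprints: dict[str, set[int]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k drugs most similar to `query`, excluding `query` itself."""
    scores = [
        (name, tanimoto(fingerprints[query], fp))
        for name, fp in fingerprints.items()
        if name != query
    ]
    # Highest similarity first; ties broken by name so the result is deterministic.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


# Hypothetical toy fingerprints for four drugs.
fps = {
    "drug_A": {1, 2, 3, 4},
    "drug_B": {1, 2, 3, 5},
    "drug_C": {1, 2, 6, 7},
    "drug_D": {8, 9},
}

print(most_similar("drug_A", fps, k=2))
# [('drug_B', 0.6), ('drug_C', 0.3333333333333333)]
```

Printing a larger `k` is a quick way to see how sharply the similarity drops off beyond the top few neighbours before deciding how many drugs to exclude.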

Ref. #78: The larger gene set is now the default, 'trapnell_cpa.h5ad', and the subset to LINCS genes is still called 'trapnell_cpa_lincs_genes.h5ad'.
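To make the distinction between the full gene set and the LINCS subset concrete, here is a minimal sketch of restricting a gene list to the genes that also appear in a reference list. Assumptions: the gene names are placeholders, the helper `subset_genes` is hypothetical, and it keeps the dataset's own gene order, which may differ from how the actual notebook orders genes.

```python
from __future__ import annotations

# Hypothetical sketch: restrict a dataset's genes to those present in a
# reference (e.g. LINCS) gene list, preserving the dataset's original order.


def subset_genes(dataset_genes: list[str], reference_genes: list[str]) -> list[str]:
    reference = set(reference_genes)  # set for O(1) membership checks
    return [g for g in dataset_genes if g in reference]


sciplex_genes = ["TP53", "MYC", "GAPDH", "EGFR", "ACTB"]  # placeholder names
lincs_genes = ["EGFR", "TP53", "MYC"]                      # placeholder names

print(subset_genes(sciplex_genes, lincs_genes))  # ['TP53', 'MYC', 'EGFR']
```

With an AnnData object, the equivalent filter would typically be a boolean mask such as `adata.var_names.isin(lincs_genes)` used to index the gene axis.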

@siboehm, in the end I excluded fewer drugs (four or three per pathway), as the sciplex dataset does not contain that many drugs in each pathway. These are, however, truly OOD drugs, as opposed to drugs that stay in training when they are applied at small doses.
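A minimal sketch of what "truly OOD" means for the split labels: every cell treated with a held-out drug is marked OOD regardless of dose, so none of that drug's cells reach training. Assumptions: drug and pathway names are hypothetical, the split column names (`split_ood_<pathway>`) are illustrative, and a real split would additionally carve validation/test cells out of the non-OOD part.

```python
from __future__ import annotations

# Hypothetical sketch: one OOD split per pathway. A cell is 'ood' if its drug
# is held out for that pathway (all doses), otherwise 'train'.


def make_ood_split(cell_drugs: list[str], ood_drugs: set[str]) -> list[str]:
    """Label each cell 'ood' if its drug is held out, else 'train'."""
    return ["ood" if drug in ood_drugs else "train" for drug in cell_drugs]


# Four or three held-out drugs per pathway (names are placeholders).
held_out_per_pathway = {
    "pathway_1": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "pathway_2": {"drug_e", "drug_f", "drug_g"},
    "pathway_3": {"drug_h", "drug_i", "drug_j"},
}

# Drug applied to each cell (placeholder values).
cells = ["drug_a", "drug_e", "drug_x", "drug_b", "control"]

splits = {
    f"split_ood_{pathway}": make_ood_split(cells, drugs)
    for pathway, drugs in held_out_per_pathway.items()
}
for name, labels in splits.items():
    print(name, labels)
```

Because the split is keyed on the drug alone, a held-out drug cannot leak into training through its low-dose observations.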

MxMstrmn commented 2 years ago

Regarding excluding the 2 most similar drugs: I also checked the third and fourth most similar drugs, and they had much lower similarity.

siboehm commented 2 years ago

I think this is good. Since the gene ordering in lincs_full_smiles_sciplex_genes.h5ad hasn't changed (I assume), all currently pretrained models should still be valid.
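The reasoning above relies on a pretrained model's input dimensions being tied to a fixed gene order. A minimal sketch of verifying that assumption, rather than assuming it, is to compare the old and new gene lists position by position. Assumptions: the helper `first_order_mismatch` is hypothetical and the gene lists are placeholders; with AnnData one would compare `list(adata.var_names)` loaded from the old and new files.

```python
# Hypothetical sketch: report where two gene orderings first diverge.


def first_order_mismatch(old_genes, new_genes):
    """Return the index of the first position where the gene order differs,
    or None if the two lists are identical."""
    for i, (old, new) in enumerate(zip(old_genes, new_genes)):
        if old != new:
            return i
    if len(old_genes) != len(new_genes):
        # One list is a strict prefix of the other.
        return min(len(old_genes), len(new_genes))
    return None


old = ["TP53", "MYC", "EGFR"]
print(first_order_mismatch(old, ["TP53", "MYC", "EGFR"]))  # None
print(first_order_mismatch(old, ["MYC", "TP53", "EGFR"]))  # 0
print(first_order_mismatch(old, ["TP53", "MYC"]))          # 2
```

A `None` result means pretrained weights indexed by gene position can be reused unchanged.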

theislab / chemCPA

Create 3 ood splits that exclude drugs of 3 different pathways #81