Closed MxMstrmn closed 2 years ago
Re excluding the 2 most similar: I also checked the third and fourth most similar drugs, and these were a lot less similar.
I think this is good. Since the gene ordering in lincs_full_smiles_sciplex_genes.h5ad hasn't been changed (I assume), all currently pretrained models are still good.
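The assumption about unchanged gene ordering could be checked with something like the sketch below. It uses plain Python lists as stand-ins; the gene names and the helper `same_gene_order` are hypothetical, and with the real files one would compare the `var_names` of the old and new AnnData objects instead.

```python
from typing import Sequence


def same_gene_order(old_genes: Sequence[str], new_genes: Sequence[str]) -> bool:
    """Return True only if both gene lists have identical names in identical order."""
    return list(old_genes) == list(new_genes)


# Hypothetical gene lists standing in for adata.var_names of the old and new file.
old = ["GENE_A", "GENE_B", "GENE_C"]
unchanged = ["GENE_A", "GENE_B", "GENE_C"]
reordered = ["GENE_B", "GENE_A", "GENE_C"]

print(same_gene_order(old, unchanged))
print(same_gene_order(old, reordered))
```

A model whose input layer expects a fixed gene order would only stay valid in the first case.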
Ref. #77 The gene alignment is also updated to match both all genes and the gene subset to lincs genes. For the ood split, an additional notebook is added that excludes drugs based on their Tanimoto similarity. The output file is called lincs_complete.h5ad.
Ref. #78 The larger gene set is now the default
'trapnell_cpa.h5ad', and the subset to lincs genes is still called 'trapnell_cpa_lincs_genes.h5ad'.
@siboehm, I excluded fewer genes in the end (4/3 per pathway), as the sciplex dataset does not contain that many drugs in each pathway. These are, however, true ood drugs, as opposed to including them when they were applied in small doses.
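For readers unfamiliar with the similarity-based exclusion discussed in this thread, here is a minimal sketch of the idea, not the notebook's actual code. It assumes each drug is represented by a set of "on" fingerprint bits (the notebook presumably derives fingerprints from SMILES with a cheminformatics library, which is an assumption here). The drug names, bit sets, and the helpers `tanimoto` and `most_similar` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def most_similar(
    query: str, fingerprints: Dict[str, Set[int]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k drugs most similar to `query`, highest similarity first."""
    scores = [
        (name, tanimoto(fingerprints[query], fp))
        for name, fp in fingerprints.items()
        if name != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Hypothetical toy fingerprints (sets of "on" bit indices).
fingerprints = {
    "drug_A": {1, 2, 3, 4},
    "drug_B": {1, 2, 3, 5},
    "drug_C": {1, 2, 3, 4, 6},
    "drug_D": {7, 8, 9},
    "drug_E": {1, 9},
}

# Drugs to drop from training alongside the ood drug "drug_A".
excluded = [name for name, _ in most_similar("drug_A", fingerprints, k=2)]
print(excluded)
```

Raising `k` to 4 and inspecting the scores mirrors the check described above, where the third and fourth most similar drugs turned out to be much less similar.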