theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

how was lincs_trapnell.smiles generated? #141

Closed: bhomass closed this issue 8 months ago

bhomass commented 1 year ago

Shouldn't lincs_trapnell.smiles be derived from lincs_full_smiles_sciplex_genes.h5ad?

We look up the RDKit embeddings by matching on the SMILES index, but there are discrepancies between lincs_trapnell.smiles and lincs_full_smiles_sciplex_genes.h5ad, and we don't know why.
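
For anyone hitting the same mismatch, here is a minimal sketch of how the two SMILES lists could be compared. The function name `diff_smiles` is hypothetical and not part of chemCPA, and how the strings are loaded from the .smiles file and from the .h5ad file is left out because the file layouts are not described in this thread.

```python
from typing import Iterable, List, Tuple


def diff_smiles(
    embedding_index: Iterable[str], dataset_smiles: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare two collections of SMILES strings by exact string match.

    Returns (missing, unused):
      missing: SMILES that occur in the dataset but have no embedding entry
      unused:  SMILES that have an embedding entry but never occur in the dataset
    """
    emb = set(embedding_index)
    data = set(dataset_smiles)
    return sorted(data - emb), sorted(emb - data)


# Toy strings standing in for the two real lists (illustrative only).
missing, unused = diff_smiles(["CCO", "CCN"], ["CCO", "c1ccccc1"])
print("missing from embedding index:", missing)
print("unused embedding entries:", unused)
```

Because the comparison is on raw strings, two different spellings of the same molecule will show up as a mismatch, which may be what is happening here.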

bhomass commented 1 year ago

I recreated lincs_trapnell.smiles using the drug_names_to_once_canon_smiles() method in data.py. With that, exp.init_drug_embedding and exp.train finally run through, and the SMILES lists line up.
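
As a rough illustration of why regenerating the file can help: if both sides are passed through the same canonicalization, different spellings of one molecule collapse to a single string. The sketch below is not the chemCPA implementation of drug_names_to_once_canon_smiles(); `canonicalize_all` and `rdkit_canonicalize` are hypothetical helper names, and the default canonicalizer assumes RDKit is installed.

```python
from typing import Callable, Dict, Iterable, List


def rdkit_canonicalize(smiles: str) -> str:
    """Canonicalize one SMILES string with RDKit (imported lazily)."""
    from rdkit import Chem  # assumption: RDKit is available in the environment

    return Chem.CanonSmiles(smiles)


def canonicalize_all(
    smiles_list: Iterable[str],
    canon: Callable[[str], str] = rdkit_canonicalize,
) -> List[str]:
    """Apply one canonicalizer to every SMILES string, preserving order.

    Each distinct input string is canonicalized only once and then cached.
    """
    cache: Dict[str, str] = {}
    result: List[str] = []
    for smiles in smiles_list:
        if smiles not in cache:
            cache[smiles] = canon(smiles)
        result.append(cache[smiles])
    return result
```

Running the same function over the SMILES used as the embedding index and over the SMILES stored with the dataset, then comparing the results, should show whether the remaining differences are real or only a matter of notation.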