Closed by MxMstrmn 3 years ago
@MxMstrmn I have uploaded the LINCS data set here:
/home/icb/mohammad.lotfollahi/datasets
There are three files:
- lincs_full.h5ad: the full data set, with ~21k compounds and 1.3M samples
- lincs.h5ad: the small one, which we prepared for CPA training according to the lincs.ipynb notebook (also in the same folder)
remove them.
@M0hammadL can you please provide the model seed used in the lincs.ipynb notebook:
import torch

# Load the checkpoint on the CPU; it unpacks into three objects.
state, args, history = torch.load(
    'sweep_lincs_logsigm_model_seed=61_epoch=180.pt',
    map_location=torch.device('cpu'))
This pretrained model is not part of the tarball provided in the FAIR repo.
For reference:
Finally, we obtained 17,051 valid molecules, and these data were split into training (14,051), validation (1,500) and test (1,500) sets
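The split sizes quoted above add up (14,051 + 1,500 + 1,500 = 17,051). As a minimal sketch of how such a split could be reproduced, assuming a plain random shuffle with a fixed seed (the quoted text does not say how the split was actually drawn, and `split_indices` is a hypothetical helper):

```python
import random


def split_indices(n_total, n_valid, n_test, seed=0):
    """Shuffle range(n_total) and cut it into train/valid/test index lists.

    Assumption: a uniform random split; the original split may have been
    made differently (e.g. by scaffold or by compound).
    """
    if n_valid + n_test > n_total:
        raise ValueError("validation + test sizes exceed the total")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    n_train = n_total - n_valid - n_test
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_indices(17_051, 1_500, 1_500)
print(len(train), len(valid), len(test))  # 14051 1500 1500
```

The index lists can then be used to select rows from whatever table holds the molecules.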
See this file for SMILES addition and rdkit check: https://github.com/theislab/chemical_CPA/blob/chemical-lincs/notebooks/lincs_SMILES.ipynb
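For readers without access to that notebook, an RDKit validity check usually comes down to testing whether `Chem.MolFromSmiles` returns `None`. The sketch below is an assumption about the general approach, not a copy of the notebook's code; `rdkit_is_valid` and `filter_valid` are hypothetical names.

```python
def rdkit_is_valid(smiles):
    """Return True if RDKit can parse the SMILES string (requires rdkit)."""
    # Imported lazily so filter_valid can be used with another validator.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles) is not None


def filter_valid(smiles_list, is_valid=rdkit_is_valid):
    """Split entries into (kept, dropped) according to the validator.

    Missing or non-string entries (None, NaN, empty strings) are dropped
    without being passed to the validator.
    """
    kept, dropped = [], []
    for s in smiles_list:
        ok = isinstance(s, str) and bool(s) and is_valid(s)
        (kept if ok else dropped).append(s)
    return kept, dropped
```

With RDKit installed, `filter_valid(list_of_smiles)` returns the parsable strings and the rejected entries as two lists.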
That notebook code asks for GSE92742_Broad_LINCS_pert_info.txt. Where could I find that csv file, please?
Added! Found on the NIH site.
That is the data directory:
/home/icb/leon.hetzel/git/CPA_graphs/datasets/