Upload LINCS

theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.

https://arxiv.org/abs/2204.13545

MIT License

Closed MxMstrmn closed 3 years ago

MxMstrmn commented 3 years ago

That is the data directory: home/icb/leon.hetzel/git/CPA_graphs/datasets/

M0hammadL commented 3 years ago

@MxMstrmn I have uploaded the LINCS dataset here:

/home/icb/mohammad.lotfollahi/datasets

There are three files:

- lincs_full.h5ad: the full data, with ~21k compounds and 1.3M samples
- lincs.h5ad: the small one, which we prepared for CPA training according to the lincs.ipynb notebook
- lincs.ipynb: the notebook itself (also in the same folder)

M0hammadL commented 3 years ago

[x] Prepare the data for CPA by adding SMILES vectors, similar to the preprocessing in here (see Methods/preprocessing). They average over all cell lines, which we don't do. Let's use a split similar to theirs for train, validation, and test.
[x] For dose, some drugs and doses have the value -666, which means NA; remove them.
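A minimal sketch of the dose cleanup in the second item, assuming the per-sample metadata is a pandas DataFrame (for example `adata.obs`) and that the dose lives in a column called `dose`. The column name, the helper name, and the toy data are assumptions for illustration, not taken from lincs.ipynb.

```python
import pandas as pd

NA_SENTINEL = -666  # value that marks a missing (NA) dose, per the item above


def drop_missing_dose(obs: pd.DataFrame, dose_col: str = "dose") -> pd.DataFrame:
    """Return the rows of `obs` whose dose is known (neither -666 nor NaN)."""
    # Coerce to numbers so that the sentinel is caught even if stored as text.
    dose = pd.to_numeric(obs[dose_col], errors="coerce")
    keep = dose.notna() & (dose != NA_SENTINEL)
    return obs.loc[keep].copy()


# Toy metadata table; the drug names and doses are made up.
obs = pd.DataFrame(
    {
        "drug": ["A", "B", "C", "D"],
        "dose": [10.0, -666, "-666", 0.5],
    }
)
clean = drop_missing_dose(obs)
print(clean["drug"].tolist())
```

For an AnnData object, the same boolean mask could presumably be applied to the object itself (e.g. `adata[keep.values].copy()`) so that the expression matrix and the metadata stay aligned.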

MxMstrmn commented 3 years ago

@M0hammadL can you please provide the model seed used in lincs.ipynb:

import torch

# Load the checkpoint onto the CPU; it unpacks into state, args, and history.
state, args, history = torch.load(
    'sweep_lincs_logsigm_model_seed=61_epoch=180.pt',
    map_location=torch.device('cpu'))

This pretrained model is not part of the tarball provided in the FAIR repo.

MxMstrmn commented 3 years ago

For reference:

Finally, we obtained 17,051 valid molecules, and these data were split into training (14,051), validation (1,500) and test (1,500) sets
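The split sizes in the quote can be reproduced with a short sketch like the one below. It assumes a uniformly random split over molecule indices with a fixed seed, which the quoted sentence does not specify; the function name and the seed are hypothetical.

```python
import random


def split_indices(n_items, n_valid, n_test, seed=0):
    """Shuffle range(n_items) and cut it into train/validation/test index lists."""
    if n_valid + n_test > n_items:
        raise ValueError("validation and test sets exceed the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    test = indices[:n_test]
    valid = indices[n_test:n_test + n_valid]
    train = indices[n_test + n_valid:]
    return train, valid, test


# Sizes from the quote: 17,051 molecules, 1,500 validation, 1,500 test.
train, valid, test = split_indices(17051, n_valid=1500, n_test=1500)
print(len(train), len(valid), len(test))
```

Splitting by molecule index rather than by sample keeps every compound in exactly one of the three sets, which is what a held-out-drug evaluation would need.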

bhomass commented 1 year ago

That notebook code asks for GSE92742_Broad_LINCS_pert_info.txt. Where could I find that file, please? Added: found it on the NIH site.