mims-harvard / TDC

Therapeutics Commons (TDC-2): Multimodal Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

bug in loading scPerturb datasets #256

Open abearab opened 6 months ago

abearab commented 6 months ago

Hi @kexinhuang12345, as you know, the ReplogleWeissman2022 study has three datasets.


As I understand it, the ReplogleWeissman2022_K562_gwps data is not uploaded yet. However, I noticed some odd behavior when I tried to load it. I already had ReplogleWeissman2022_k562_essential downloaded in the path folder, and when I tried loading scperturb_gene_ReplogleWeissman2022_K562_gwps, it reported Found local copy...!

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets')
Found local copy...
Loading...

Judging by the number of perturbations, this is not the _gwps dataset. It should have 9867 perturbations, but it has 2058 (the same number as the _essential dataset).

>>> test_load.adata.obs.perturbation.unique()

Length: 2058

Looking more carefully, I tried an empty folder and noticed that, for some reason, it downloads the wrong file for _gwps.

>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets/new/')
Downloading...
█████████████████████████████████████████████| 1.55G/1.55G [01:09<00:00, 22.2MiB/s]
Loading...
~: ls Datasets/new/

scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad
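As a user-side guard until this is fixed, one option is to check that the file on disk actually matches the requested name before trusting the loaded data. This is only a minimal sketch: `find_dataset_file` is a hypothetical helper, not part of TDC, and it assumes files are stored as `<name>.h5ad` with possibly different letter case (as in `K562` vs `k562` above).

```python
from pathlib import Path
from typing import Optional


def find_dataset_file(name: str, path: str, suffix: str = ".h5ad") -> Optional[Path]:
    """Return the file in `path` whose name equals `name + suffix`
    (compared case-insensitively), or None if there is no such file.

    Hypothetical helper for illustration; not a TDC function.
    """
    folder = Path(path)
    if not folder.is_dir():
        return None
    wanted = (name + suffix).lower()
    for entry in folder.iterdir():
        if entry.name.lower() == wanted:
            return entry
    return None


# Example usage (names taken from the session above):
# requested = "scperturb_gene_ReplogleWeissman2022_K562_gwps"
# if find_dataset_file(requested, "Datasets/new/") is None:
#     print("No file matching the requested name; the loaded data may be a different dataset.")
```

With the folder listing above, this check would return None for the _gwps name, because the only file present is the _essential one.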

cc @amva13

Originally posted by @abearab in https://github.com/mims-harvard/TDC/issues/239#issuecomment-2082088585

abearab commented 6 months ago

@kexinhuang12345 – hi Kexin, I was wondering if you could check this issue. Thanks

kexinhuang12345 commented 6 months ago

Hi! Sorry for the delay - I think it is due to some name catching bugs, currently we do not have the gwps version uploaded to dataverse. Will fix it after the NeurIPS deadline!
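For readers following along, here is a rough illustration of how a name-matching bug of this kind can arise. It is an assumption for illustration, not TDC's actual code: if a requested name is resolved by approximate string matching against the list of uploaded datasets, a name that is not uploaded can silently resolve to its nearest neighbor. The sketch uses the standard-library `difflib`; the list `AVAILABLE` and both function names are hypothetical.

```python
import difflib

# Hypothetical registry of uploaded dataset names (lowercased).
# Only the first entry comes from this thread; the second is a placeholder.
AVAILABLE = [
    "scperturb_gene_replogleweissman2022_k562_essential",
    "some_other_dataset",
]


def fuzzy_resolve(name, available, cutoff=0.6):
    """Resolve a name by approximate matching; returns the closest entry or None."""
    matches = difflib.get_close_matches(name.lower(), available, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def strict_resolve(name, available):
    """Resolve a name by exact (case-insensitive) match; raise if it is unknown."""
    key = name.lower()
    if key not in available:
        raise ValueError(f"Dataset {name!r} is not available")
    return key


requested = "scperturb_gene_ReplogleWeissman2022_K562_gwps"
# Approximate matching picks the _essential entry because the long prefix is shared.
print(fuzzy_resolve(requested, AVAILABLE))
# Exact matching reports the missing dataset instead of substituting another one.
try:
    strict_resolve(requested, AVAILABLE)
except ValueError as err:
    print(err)
```

Under this assumption, an exact lookup that fails loudly would have reported the missing _gwps upload instead of downloading the _essential file.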

abearab commented 6 months ago

> Hi! Sorry for the delay - I think it is due to some name catching bugs, currently we do not have the gwps version uploaded to dataverse.

I see, that makes sense.

> Will fix it after the NeurIPS deadline!

Thanks!