Closed abearab closed 5 months ago
Regarding the first question: I can now see that some of the scPerturb files have been uploaded to the TDC Dataverse.
Closed with https://github.com/mims-harvard/TDC/pull/252. Thanks, @kexinhuang12345!
Awesome! Thanks @kexinhuang12345
Hi @kexinhuang12345, as you know, the `ReplogleWeissman2022` study has three datasets. Currently, as I understand it, the `ReplogleWeissman2022_K562_gwps` data is not uploaded. However, I noticed some odd behavior when I tried to load it. I already had `ReplogleWeissman2022_k562_essential` downloaded in a `path` folder, and when I then tried loading `scperturb_gene_ReplogleWeissman2022_K562_gwps`, it reported `Found local copy...`!
>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets')
Found local copy...
Loading...
Looking at the number of perturbations, the result is not correct for the `_gwps` dataset. It should be 9867, but it's 2058 (the same number as in the `_essential` dataset).
>>> test_load.adata.obs.perturbation.unique()
Length: 2058
Looking more carefully, I tried an empty folder and noticed that, for some reason, the loader downloads the wrong file for `_gwps`.
>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets/new/')
Downloading...
█████████████████████████████████████████████| 1.55G/1.55G [01:09<00:00, 22.2MiB/s]
Loading...
~: ls Datasets/new/
scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad
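One quick way to confirm this kind of mismatch is to compare the requested dataset name against the files that actually landed in the download folder. This is only a minimal sketch: `unexpected_files` is a hypothetical helper (not part of TDC), and it assumes the loader saves each dataset as `<name>.h5ad`, which matches the listing above but is not confirmed against the TDC source.

```python
from pathlib import Path


def unexpected_files(requested_name, folder):
    """Return .h5ad files in `folder` whose stem differs from `requested_name`.

    Hypothetical helper, not part of TDC. Assumes datasets are saved as
    `<name>.h5ad`; the comparison is exact (case-sensitive).
    """
    return sorted(
        p.name for p in Path(folder).glob("*.h5ad") if p.stem != requested_name
    )


if __name__ == "__main__":
    requested = "scperturb_gene_ReplogleWeissman2022_K562_gwps"
    # Any file listed here was saved under a name other than the one requested.
    print(unexpected_files(requested, "Datasets/new/"))
```

With the folder contents shown above, this would list the `_essential` file, since its name does not match the requested `_gwps` dataset.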
cc @amva13
Describe the bug
I'm interested in using single-cell CRISPR perturbation datasets such as the `NormanWeissman2019` and `ReplogleWeissman2022` datasets.
Full list of scPerturb datasets
Questions
`TDC` modules to load the scPerturb datasets in Python?
Suggestion
For many reasons, it would be nice if the data loader function let users load `h5ad` files as `AnnData` objects (at least as an option).

_Originally posted in https://github.com/mims-harvard/TDC/pull/236#discussion_r1554845208_
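As a stopgap, a file that has already been downloaded can be opened directly with the `anndata` package. This is a rough sketch, assuming `anndata` is installed and that the file sits in the download folder as `<name>.h5ad`; `h5ad_path` and `load_as_anndata` are hypothetical helpers, not TDC functions.

```python
from pathlib import Path


def h5ad_path(name, folder):
    """Build the expected local path (assumed layout: <folder>/<name>.h5ad)."""
    return Path(folder) / f"{name}.h5ad"


def load_as_anndata(name, folder):
    """Read a local .h5ad file into an AnnData object."""
    import anndata  # imported lazily so the path helper works without it

    return anndata.read_h5ad(h5ad_path(name, folder))


if __name__ == "__main__":
    path = h5ad_path("scperturb_gene_ReplogleWeissman2022_k562_essential", "Datasets")
    if path.exists():
        adata = load_as_anndata(path.stem, path.parent)
        # Count distinct perturbations; assumes an obs column named "perturbation".
        print(adata.obs["perturbation"].nunique())
```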
cc @amva13 @kexinhuang12345