snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

Custom PertData ``new_data_process`` error #62

Open Yonggie opened 3 months ago

Yonggie commented 3 months ago

According to the custom data tutorial:

(2) Create your own Perturb-Seq data

Prepare a scanpy adata object with adata.obs dataframe has condition and cell_type columns, where condition is the perturbation name for each cell. Control cells have condition format of ctrl, single perturbation has condition format of A+ctrl or ctrl+A, combination perturbation has condition format of A+B. adata.var dataframe has gene_name column, where each gene name is the gene symbol. adata.X stores the post-perturbed gene expression.
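The label format quoted above can be checked before any processing. The following is a minimal sketch in plain Python, not part of GEARS: `classify_condition` and `summarize_conditions` are hypothetical helper names, and the rules encode only what the tutorial text states (`ctrl`, `A+ctrl` or `ctrl+A`, `A+B`).

```python
def classify_condition(label):
    """Classify one condition label using the format from the tutorial text."""
    parts = str(label).split("+")
    if parts == ["ctrl"]:
        return "control"
    # Exactly two non-empty tokens are required for a perturbation label.
    if len(parts) == 2 and all(parts):
        if parts.count("ctrl") == 1:
            return "single"
        if "ctrl" not in parts:
            return "combination"
    return "invalid"


def summarize_conditions(labels):
    """Count how many labels fall into each category."""
    counts = {}
    for label in labels:
        kind = classify_condition(label)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# With an AnnData object this could be called as
#   summarize_conditions(adata.obs["condition"].astype(str).unique())
print(summarize_conditions(["ctrl", "A+ctrl", "ctrl+A", "A+B", "nan", "*"]))
```

The check is purely syntactic: a label such as `X+Y` is counted as a combination whether or not both tokens are gene symbols, so the unique labels are still worth inspecting by eye.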

custom data

dataset download: https://zenodo.org/records/7041849/files/AdamsonWeissman2016_GSM2406675_10X001.h5ad?download=1

data

The condition column must be of type str:

adata.obs['condition']=adata.obs['condition'].astype(str)

pert_data.new_data_process(dataset_name = 'AdW1', adata = adata)


# error:

ValueError: reference = lymphoblasts_ctrl_1 needs to be one of groupby = ['lymphoblasts_62(mod)+pBA5811+1', 'lymphoblasts*_1', 'lymphoblasts_BHLHE40+pDS258_1+1', 'lymphoblasts_CREB1+pDS269_1+1', 'lymphoblasts_DDIT3+pDS263_1+1', 'lymphoblasts_EP300+pDS268_1+1', 'lymphoblasts_SNAI1+pDS266_1+1', 'lymphoblasts_SPI1+pDS255_1+1', 'lymphoblasts_ZNF326+pDS262_1+1', 'lymphoblasts_nan_1']
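The wording of this message matches the check in scanpy's `rank_genes_groups`, which requires `reference` to be one of the `groupby` categories. None of the listed groups contains `ctrl`, so the likely cause is that no cell in `adata.obs['condition']` carries the bare `ctrl` label the tutorial asks for. The sketch below shows one way to confirm that and relabel control cells. It is an assumption-laden illustration: the helper names are hypothetical, the sample labels are inferred from the group names in the error, and `HYPOTHETICAL_CONTROL` is a placeholder, since which label marks control cells in this dataset has to come from the dataset's own documentation.

```python
def control_is_missing(labels, control="ctrl"):
    """Return True when no label equals the bare control label."""
    return control not in {str(label) for label in labels}


def relabel_controls(labels, control_aliases, control="ctrl"):
    """Rename every label listed in control_aliases to the control label."""
    aliases = set(control_aliases)
    return [control if str(label) in aliases else str(label) for label in labels]


# Condition values inferred from the group names in the error message.
observed = ["BHLHE40+pDS258", "CREB1+pDS269", "nan"]
print(control_is_missing(observed))  # True for these labels

# HYPOTHETICAL_CONTROL stands in for whatever label the dataset uses for
# unperturbed cells; it is not a value taken from this dataset.
fixed = relabel_controls(observed + ["HYPOTHETICAL_CONTROL"], ["HYPOTHETICAL_CONTROL"])
print(control_is_missing(fixed))  # False once a control label exists
```

Cells whose condition is the string `nan` came from missing values converted by `astype(str)`. They would probably need to be dropped or assigned a real label as well, and the remaining labels rewritten into the `A+ctrl` or `A+B` form.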



Besides ``condition``, ``cell_type``, ``gene_name``, and ``X``, what other preprocessing is required?

bboyrush117 commented 1 month ago

I am also having a similar issue with this dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE216595

Screenshot 2024-06-06 at 1 18 26 PM