(2) Create your own Perturb-Seq data
Prepare a scanpy adata object with the following:
The adata.obs dataframe has condition and cell_type columns, where condition is the perturbation name for each cell. Control cells use the condition ctrl, single perturbations use A+ctrl or ctrl+A, and combination perturbations use A+B.
The adata.var dataframe has a gene_name column, where each entry is the gene symbol.
adata.X stores the post-perturbation gene expression.
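The condition format described above can be checked before handing the data to the library. The following is a minimal sketch, not part of GEARS itself; the helper name `classify_condition` is hypothetical, and the gene names in the usage lines are only examples.

```python
def classify_condition(condition: str) -> str:
    """Classify a condition string as 'control', 'single', or 'combination'.

    Hypothetical helper: it only encodes the format rules stated above
    (ctrl, A+ctrl or ctrl+A, A+B) and raises on anything else.
    """
    parts = condition.split("+")
    if parts == ["ctrl"]:
        return "control"
    if len(parts) == 2 and all(parts):
        n_ctrl = parts.count("ctrl")
        if n_ctrl == 1:
            return "single"       # A+ctrl or ctrl+A
        if n_ctrl == 0:
            return "combination"  # A+B
    raise ValueError(f"unexpected condition format: {condition!r}")


# Example usage with illustrative gene names.
for cond in ["ctrl", "KLF1+ctrl", "ctrl+KLF1", "KLF1+MAP2K6"]:
    print(cond, "->", classify_condition(cond))
```

Running the helper over the unique values of adata.obs['condition'] would surface labels such as * or nan before processing starts.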
Following the custom data tutorial:
custom data
dataset download: https://zenodo.org/records/7041849/files/AdamsonWeissman2016_GSM2406675_10X001.h5ad?download=1
data
adata.obs columns: ['perturbation', 'read count', 'UMI count', 'tissue_type', 'cell_line', 'cancer', 'disease', 'perturbation_type', 'celltype', 'organism', 'ncounts', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts']
adata.var columns: ['ensembl_id', 'ncounts', 'ncells']
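Comparing the column lists above with the required columns (condition and cell_type in adata.obs, gene_name in adata.var) shows what still has to be created. A small sketch of that comparison follows; `missing_columns` is a hypothetical helper, and the lists are copied from the dataset as shown above.

```python
# Columns the adata object must provide, per the requirements above.
REQUIRED_OBS = {"condition", "cell_type"}
REQUIRED_VAR = {"gene_name"}


def missing_columns(obs_columns, var_columns):
    """Return the required columns that are absent, sorted by name."""
    return {
        "obs": sorted(REQUIRED_OBS - set(obs_columns)),
        "var": sorted(REQUIRED_VAR - set(var_columns)),
    }


obs_columns = [
    "perturbation", "read count", "UMI count", "tissue_type", "cell_line",
    "cancer", "disease", "perturbation_type", "celltype", "organism",
    "ncounts", "ngenes", "percent_mito", "percent_ribo", "nperts",
]
var_columns = ["ensembl_id", "ncounts", "ncells"]

report = missing_columns(obs_columns, var_columns)
print(report)  # {'obs': ['cell_type', 'condition'], 'var': ['gene_name']}
```

In this dataset, perturbation and celltype look like the natural sources for condition and cell_type, but that mapping is an assumption; where the gene symbols are stored would also need to be checked before filling gene_name.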
processing code
from gears import PertData
pert_data = PertData('./data')  # assumed data directory
# condition must be of type str
adata.obs['condition'] = adata.obs['condition'].astype(str)
pert_data.new_data_process(dataset_name='AdW1', adata=adata)
ValueError: reference = lymphoblasts_ctrl_1 needs to be one of groupby = ['lymphoblasts_62(mod)+pBA5811+1', 'lymphoblasts*_1', 'lymphoblasts_BHLHE40+pDS258_1+1', 'lymphoblasts_CREB1+pDS269_1+1', 'lymphoblasts_DDIT3+pDS263_1+1', 'lymphoblasts_EP300+pDS268_1+1', 'lymphoblasts_SNAI1+pDS266_1+1', 'lymphoblasts_SPI1+pDS255_1+1', 'lymphoblasts_ZNF326+pDS262_1+1', 'lymphoblasts_nan_1']
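The error suggests that no cell carries the condition ctrl: the reference group lymphoblasts_ctrl_1 is not among the listed groups, and none of them contains ctrl. One possible fix is to relabel the raw perturbation values into the required format before processing. The sketch below is an assumption-laden illustration, not the tutorial's method: `to_gears_condition` is a hypothetical helper, the control label must be supplied by the user (CONTROL_GUIDE is a placeholder, since the source does not say which label marks control cells), and the target gene is assumed to be the leading token before the first _ or +, as in BHLHE40+pDS258.

```python
import re


def to_gears_condition(raw, control_labels, unassigned_labels=("*", "nan")):
    """Map a raw perturbation label to 'ctrl', 'GENE+ctrl', or None.

    Hypothetical helper. Assumptions:
    - control_labels is chosen by the user for their dataset;
    - labels in unassigned_labels (here '*' and 'nan') should be dropped;
    - the target gene is the token before the first '_' or '+'.
    """
    label = str(raw)
    if label in unassigned_labels:
        return None  # caller should drop these cells
    if label in control_labels:
        return "ctrl"
    gene = re.split(r"[_+]", label, maxsplit=1)[0]
    return f"{gene}+ctrl"


controls = {"CONTROL_GUIDE"}  # placeholder, replace with the real control label(s)
for raw in ["BHLHE40+pDS258", "CONTROL_GUIDE", "*", float("nan")]:
    print(repr(raw), "->", to_gears_condition(raw, controls))
```

Applied to adata.obs, the result could be written to the condition column (for example with the pandas Series map method), and cells mapped to None removed before calling new_data_process.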