noamteyssier / adpbulk

pseudobulking on an AnnData object
MIT License

Turn a pseudobulk matrix from scRNA-seq with logCPM values back into a Seurat object with count values? #8

Status: Open. jiangzh-coder opened this issue 12 months ago

jiangzh-coder commented 12 months ago

May I ask one question?

If my data is already a pseudobulk object derived from scRNA-seq data with logCPM values, how can I convert it back into a Seurat object with count values? Can I still use the method above to turn the data back into a Seurat object?

My data were normalized into pseudobulk data as follows: "Normalizing count data. After excluding poor-quality cells, we normalized the sequencing depth of each cell by dividing each cell’s counts by the total counts in that cell, resulting in a matrix where the entries represent the proportion of a cell’s reads allocated to each gene (i.e., values in the range [0, 1]). To estimate a library size for each dataset, we summed the total counts in each cell and then took the median as the library size for the dataset. Next, we multiplied the proportions by the library size to get a count matrix that was normalized for sequencing depth. Finally, we transformed the normalized count matrix with log2(1 + count). We referred to this log-transformed quantity in the figures as log2CPM.

We created pseudobulk expression (L. Lun, Bach, and Marioni 2016) for the cells in a cluster for each donor, such that the pseudobulk matrix had one row for each gene and one column for each cluster from each patient. We normalized the pseudobulk counts to log2CPM as described in the previous section. Then we used limma::lmFit() to test for differential gene expression with the log2CPM pseudobulk matrix (Ritchie et al. 2015). We also used presto::wilcoxauc() to compute the area under the receiver operating characteristic curve (AUROC or AUC) for the log2CPM value of each gene as a predictor of cluster membership for each cluster (Korsunsky, Nathan, et al. 2019)." Thanks, best wishes, J.
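The quoted pseudobulk and normalization steps can be sketched in Python with pandas and NumPy. This is a minimal sketch under stated assumptions, not the quoted authors' code: the function names (pseudobulk_sum, log2cpm, undo_log, recover_raw) and the metadata column names "donor" and "cluster" are hypothetical, the input is assumed to be raw counts in a cells x genes table, and samples with zero total counts are assumed to have been removed already. The same log2cpm step applies per cell if you transpose a cells x genes table first. The last two functions speak to the original question: computing 2^x - 1 undoes the log, but that only yields depth-normalized values; raw counts can be recovered only if the original per-sample totals were kept.

```python
import numpy as np
import pandas as pd


def pseudobulk_sum(counts: pd.DataFrame, cell_meta: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Sum raw counts over all cells sharing the same values of `keys`.

    counts:    cells x genes, index = cell IDs, columns = gene names (raw counts)
    cell_meta: same index as `counts`, with hypothetical columns like "donor", "cluster"
    Returns a genes x samples table with one column per observed key combination.
    """
    groups = [cell_meta[k] for k in keys]
    summed = counts.groupby(groups, observed=True).sum()
    # Flatten (donor, cluster) tuples into names such as "d1_A".
    summed.index = [
        "_".join(map(str, idx)) if isinstance(idx, tuple) else str(idx)
        for idx in summed.index
    ]
    return summed.T  # genes as rows, pseudobulk samples as columns


def log2cpm(pb: pd.DataFrame) -> pd.DataFrame:
    """Normalize a genes x samples count table as the quoted methods describe.

    Each column is scaled to proportions, multiplied by the median column
    total (the "library size"), then transformed with log2(1 + x).
    """
    totals = pb.sum(axis=0)        # total counts per sample (column)
    proportions = pb / totals      # each column now sums to 1
    return np.log2(1.0 + proportions * totals.median())


def undo_log(log_values: pd.DataFrame) -> pd.DataFrame:
    """Invert only the log step: returns depth-normalized values, not raw counts."""
    return np.power(2.0, log_values) - 1.0


def recover_raw(log_values: pd.DataFrame, totals: pd.Series) -> pd.DataFrame:
    """Recover raw counts; only possible when the original per-sample totals are known."""
    normalized = undo_log(log_values)                 # proportion * median(totals)
    return normalized / totals.median() * totals      # proportion * each sample's total
```

Because every sample is rescaled to the same median library size, samples with very different depths end up on one scale, and that per-sample information is exactly what is lost. limma::lmFit() and presto::wilcoxauc() are R functions and are not reproduced here.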

jiangzh-coder commented 12 months ago

How can I assign cell subtypes to the clusters defined by this pseudobulk matrix with logCPM values, after subsetting the major cell types defined by this pseudobulk matrix? Could I use marker expression profiles in different clusters to redefine cell subtypes? Thanks a lot! Best wishes, Jiang

noamteyssier commented 12 months ago


Hey J,

This package is meant to be used in the scanpy ecosystem within Python, not with Seurat objects in R. I'm not sure exactly how to do this, but you can look at existing tutorials for converting between AnnData and Seurat:

https://satijalab.org/seurat/articles/conversion_vignette.html
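If you already have a count matrix in Python (for example, a pseudobulk table of raw counts rather than log2CPM values), one hedged alternative to a dedicated converter is to write it as a Matrix Market file plus plain-text name files and read those in R. This sketch assumes a cells (or samples) x genes pandas DataFrame; the function name export_counts_for_r and the output file names are hypothetical choices. In R, Matrix::readMM() can read matrix.mtx, and after attaching the row and column names the matrix can be passed to Seurat's CreateSeuratObject(counts = ...).

```python
import os

import pandas as pd
from scipy import sparse
from scipy.io import mmwrite


def export_counts_for_r(counts: pd.DataFrame, out_dir: str) -> None:
    """Write a cells (or samples) x genes count table for loading in R.

    Seurat expects genes as rows and cells as columns, so the matrix is
    transposed before writing. Files written (names are arbitrary):
      matrix.mtx - sparse Matrix Market file, genes x cells
      genes.txt  - one gene name per line (row names in R)
      cells.txt  - one cell/sample ID per line (column names in R)
    """
    os.makedirs(out_dir, exist_ok=True)
    genes_by_cells = sparse.csr_matrix(counts.to_numpy().T)
    mmwrite(os.path.join(out_dir, "matrix.mtx"), genes_by_cells)
    pd.Series(counts.columns).to_csv(os.path.join(out_dir, "genes.txt"), index=False, header=False)
    pd.Series(counts.index).to_csv(os.path.join(out_dir, "cells.txt"), index=False, header=False)
```

Plain files keep the hand-off independent of any particular conversion package version, at the cost of carrying over only the counts and names, not the metadata.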

noamteyssier commented 12 months ago


Hi, sorry, I wish I could help with this, but I'm not really sure what you're asking. If you have a cluster label for your cell subtype in your adata.obs DataFrame, then you can just pass in the name of that column to pseudobulk on it.
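The suggestion above amounts to grouping cells by a label column in adata.obs. Below is a minimal pandas sketch of that idea, standing in for what a pseudobulk tool does; it is not adpbulk's implementation. It assumes a cells x genes count DataFrame whose index matches an obs-like DataFrame, a clustering column named "leiden", and a cluster-to-subtype mapping (for example, chosen from marker genes); the mapping, subtype names, and function names are all hypothetical. With adpbulk itself, the project README shows passing the column name when constructing the object, along the lines of ADPBulk(adata, "subtype") followed by fit_transform(); check the README for the exact arguments.

```python
import pandas as pd

# Hypothetical mapping from cluster IDs to subtype names, e.g. chosen after
# inspecting marker-gene expression per cluster. Replace with your own labels.
SUBTYPE_BY_CLUSTER = {"0": "T_naive", "1": "T_effector", "2": "B_cell"}


def add_subtype_column(obs: pd.DataFrame, cluster_key: str = "leiden",
                       subtype_key: str = "subtype") -> pd.DataFrame:
    """Return a copy of `obs` with a subtype label column derived from clusters.

    Clusters missing from the mapping get NaN and are dropped when grouping.
    """
    obs = obs.copy()
    obs[subtype_key] = obs[cluster_key].astype(str).map(SUBTYPE_BY_CLUSTER)
    return obs


def pseudobulk_by_column(counts: pd.DataFrame, obs: pd.DataFrame, key: str) -> pd.DataFrame:
    """Sum counts per label in obs[key]; returns a labels x genes table.

    `counts` is cells x genes and must share its index with `obs`.
    """
    return counts.groupby(obs[key], observed=True).sum()
```

Keeping the label assignment separate from the aggregation means the subtype column can be revised (for example, after re-checking markers) without touching the pseudobulk step.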