stemangiola / tidybulk

Brings bulk and pseudobulk transcriptomics to the tidyverse
https://stemangiola.github.io/tidybulk/

Turn pseudobulk from scRNA-seq with logCPM values back to a Seurat object with count values? #285

Closed: jiangzh-coder closed this issue 1 year ago

jiangzh-coder commented 1 year ago

May I ask one question:

If the data are already a pseudobulk object derived from scRNA-seq data with logCPM values, how can I change them back to a Seurat object with count values? Can I still use the method above to turn the data back into a Seurat object?

My data were normalized into pseudobulk data as follows: "Normalizing count data. After excluding poor quality cells, we normalized the sequencing depth of each cell by dividing each cell’s counts by the total counts in that cell, resulting in a matrix where the entries represent the proportion of a cell’s reads allocated to each gene (i.e. values in the range [0,1]). To estimate a library size for each dataset, we summed the total counts in each cell, and then we took the median as the library size for the dataset. Next, we multiplied the proportions by the library size to get a count matrix that was normalized for sequencing depth. Finally, we transformed the normalized count matrix with log2(1 + count). We referred to this log-transformed quantity in the figures as log2CPM.
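The normalization quoted above can be written out as a short sketch. This is a minimal Python/NumPy illustration, assuming a genes x cells matrix of raw counts from which low-quality (and therefore zero-total) cells have already been removed; the function name `log2_depth_normalize` is hypothetical. Note that, despite the label log2CPM, the scale factor described is the dataset's median library size rather than one million.

```python
import numpy as np


def log2_depth_normalize(counts: np.ndarray) -> np.ndarray:
    """Depth-normalize a genes x cells count matrix, then apply log2(1 + x).

    Assumes every cell (column) has a non-zero total, i.e. empty and
    low-quality cells were already filtered out.
    """
    totals = counts.sum(axis=0)               # total counts per cell
    proportions = counts / totals             # each column now sums to 1
    library_size = np.median(totals)          # dataset-wide library size
    normalized = proportions * library_size   # depth-normalized counts
    return np.log2(1 + normalized)            # the quantity labelled "log2CPM"
```

For example, `log2_depth_normalize(np.array([[1, 3], [3, 1]]))` returns `[[1, 2], [2, 1]]`: both cells have 4 counts, so the median library size is 4 and the normalized counts equal the raw counts before the log is taken.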

We created pseudobulk expression (L. Lun, Bach, and Marioni 2016) for the cells in a cluster for each donor such that the pseudobulk matrix had one row for each gene and one column for each cluster from each patient. We normalized the pseudobulk counts to log2CPM as described in the previous section. Then we use limma::lmFit() to test for differential gene expression with the log2CPM pseudobulk matrix (Ritchie et al. 2015). We also use presto::wilcoxauc() to compute the area under receiver operator curve (AUROC or AUC) for the log2CPM value of each gene as a predictor of the cluster membership for each cluster (Korsunsky, Nathan, et al. 2019)."

Thanks, best wishes, J.
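The pseudobulk step described above sums raw counts over all cells that share a donor and a cluster, giving one column per (donor, cluster) pair. Below is a minimal sketch, assuming per-cell donor and cluster labels are available as string sequences; the function name `pseudobulk_sum` is hypothetical. The resulting matrix would then be normalized in the same way as the single-cell matrix.

```python
from typing import List, Sequence, Tuple

import numpy as np


def pseudobulk_sum(counts: np.ndarray,
                   donors: Sequence[str],
                   clusters: Sequence[str]) -> Tuple[np.ndarray, List[Tuple[str, str]]]:
    """Sum raw counts (genes x cells) over cells sharing a (donor, cluster) pair.

    Returns a genes x groups matrix and the (donor, cluster) label of each column.
    """
    keys = list(zip(donors, clusters))
    groups = sorted(set(keys))                      # one column per (donor, cluster)
    column = {g: j for j, g in enumerate(groups)}   # group label -> column index
    out = np.zeros((counts.shape[0], len(groups)), dtype=counts.dtype)
    for cell, key in enumerate(keys):
        out[:, column[key]] += counts[:, cell]      # add this cell to its group
    return out, groups
```

With counts `[[1, 2, 3], [4, 5, 6]]`, donors `["d1", "d1", "d2"]` and a single cluster `"A"`, the result has columns `("d1", "A")` and `("d2", "A")` with values `[[3, 3], [9, 6]]`.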

jiangzh-coder commented 1 year ago

How can I assign cell subtypes to the clusters defined by this pseudobulk matrix with logCPM values, after subsetting the major cell types defined by this matrix? Could I use marker expression profiles across clusters to redefine cell subtypes? Thanks a lot! Best wishes, Jiang

stemangiola commented 1 year ago

> If the data are already a pseudobulk object derived from scRNA-seq data with logCPM values, how can I change them back to a Seurat object with count values? Can I still use the method above to turn the data back into a Seurat object?

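Unfortunately, both cell aggregation (pseudobulk) and scaling (CPM) discard information from the data: aggregation loses the per-cell counts, and scaling loses each sample's sequencing depth. So it is not possible to go back to a count matrix.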
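A toy example makes the information loss concrete. In this hypothetical sketch, two different sets of cells aggregate to the same pseudobulk profile, and a profile sequenced ten times deeper normalizes to the same values. The toy matrices, the helper name `log2_normalize`, and the library size of 1000 are made up for illustration.

```python
import numpy as np


def log2_normalize(bulk: np.ndarray, library_size: float) -> np.ndarray:
    """Scale a pseudobulk profile to a fixed library size, then apply log2(1 + x)."""
    return np.log2(1 + bulk / bulk.sum() * library_size)


# Hypothetical toy data: 2 genes x 2 cells from one cluster of one donor
cells_a = np.array([[2, 0],
                    [0, 2]])
cells_b = np.array([[1, 1],
                    [1, 1]])

# Aggregation loses the per-cell structure: both sum to the profile [2, 2]
bulk_a = cells_a.sum(axis=1)
bulk_b = cells_b.sum(axis=1)

# Scaling loses sequencing depth: a 10x deeper profile normalizes identically
bulk_c = bulk_a * 10

library_size = 1000.0  # hypothetical dataset-wide median library size
norm_a = log2_normalize(bulk_a, library_size)
norm_b = log2_normalize(bulk_b, library_size)
norm_c = log2_normalize(bulk_c, library_size)
```

All three normalized profiles are identical even though the underlying cells and depths differ. Inverting the log (computing `2 ** x - 1`) therefore recovers at most the rescaled pseudobulk profile, never the per-cell counts that a Seurat object of counts would need.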

> How can I assign cell subtypes to the clusters defined by this pseudobulk matrix with logCPM values, after subsetting the major cell types defined by this matrix? Could I use marker expression profiles across clusters to redefine cell subtypes? Thanks a lot! Best wishes, Jiang

You could use CIBERSORT on the cell-type clusters to estimate the most likely cell type. Have a look at the deconvolution section of the tidybulk README.
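CIBERSORT itself estimates cell-type fractions with ν-support vector regression against a reference signature matrix. As a simpler, swapped-in illustration of the same reference-based idea, the sketch below uses non-negative least squares (`scipy.optimize.nnls`); it is not CIBERSORT. It assumes you have a reference signature matrix (genes x cell types) on a linear scale, and that the log2CPM pseudobulk values are first returned to a linear scale with `2 ** x - 1`. The function name `estimate_proportions` and the toy data are hypothetical. The cell type with the largest estimated fraction would be the most likely identity of a cluster.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(signature: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Estimate cell-type fractions for one pseudobulk profile.

    signature: genes x cell-types reference expression (linear scale).
    mixture:   pseudobulk expression for the same genes, same order (linear scale).
    """
    # Non-negative least squares: minimize ||signature @ w - mixture|| with w >= 0
    weights, _residual_norm = nnls(signature, mixture)
    total = weights.sum()
    # Rescale to fractions that sum to 1 (leave all-zero weights untouched)
    return weights / total if total > 0 else weights


# Hypothetical reference: 3 genes x 2 cell types
signature = np.array([[10.0, 0.0],
                      [0.0, 10.0],
                      [5.0, 5.0]])
# Hypothetical log2-scale pseudobulk values, converted back to a linear scale
log_profile = np.log2(1 + signature @ np.array([0.7, 0.3]))
mixture = 2 ** log_profile - 1
fractions = estimate_proportions(signature, mixture)
most_likely = int(np.argmax(fractions))  # index of the dominant cell type
```

Here the toy mixture was built as 70% of the first cell type and 30% of the second, so the estimated fractions come out close to `[0.7, 0.3]` and the first cell type is reported as the most likely identity.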