Closed yamajackr closed 2 years ago
Hi @yamamotoryo, this is expected since the correction step uses median sequencing depth for generating counts (effectively downdepths, genes that are lowly expressed overall will tend to have more zeroes (this is independent of PrepSCTFindMarkers which will minimum of the median sequencing depth across models to generate corrected counts). But as we show in the paper, using corrected counts results in overall higher sensitivity & specificity.
library(Seurat)
library(SeuratData)
data("pbmc3k")
pbmc3k <- SCTransform(pbmc3k)
> sum(GetAssayData(pbmc3k, assay = "RNA", slot="counts")["CD4",]==0)
[1] 2386
> sum(GetAssayData(pbmc3k, assay = "SCT", slot="counts")["CD4",]==0)
[1] 2390
Thank you so much @saketkc !
Hi Christoph, I appreciate all your work. I integrated two datasets using RPCA-based method after scTransform normalization. One dataset is UMI-based data, and the other is data obtained by Smart-seq2. For Smart-seq2 data, I changed the data distribution using a quantile normalization method (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02078-0). I read the paper for scTransform v2, and tried DE analysis using prepSCTFindMarkers function. I noticed that the fraction of zero values had increased a lot for some genes after prepSCTFindMarkers(). Are these expected things? Shouldn't I care about it? Many thanks in advance.