satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Calculate counts-per-million from depth-corrected SCTransformed `v2` counts. #142

Closed fjrossello closed 1 year ago

fjrossello commented 1 year ago

Hi Team,

I am trying to use a mixed effects model (treatment as a fixed effect and donor as a random effect) to identify differentially expressed genes between 3 conditions with MAST. According to one of MAST's vignettes, "MAST performs best with log-transformed, scale-normalized data that has been thresholded, such as log2(transcripts per million+1)" (See here for details). My question is whether it is sound to use log2 CPMs calculated from counts (the counts slot of an SCT assay in a Seurat object) that have been depth-corrected using SCTransform v2 (recorrected using PrepSCTFindMarkers).

Thanks in advance.

Fernando

saketkc commented 1 year ago

I would recommend not scaling the corrected counts to counts per million: in my internal tests this seems to perform poorly (a higher number of false positives, at least with the Wilcoxon test). But running MAST on the data slot (log(corrected counts), where the corrected counts use the minimum of the median sequencing depths across the datasets) seems reasonable.
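
To make the two options concrete, here is a minimal arithmetic sketch in plain Python (not Seurat or MAST code). It assumes the data slot holds log(1 + corrected count); the toy matrix and the function names `log1p_counts` and `log2_cpm` are made up for illustration only.

```python
import math

# Toy corrected counts: rows are cells, columns are genes (made-up numbers).
corrected = [
    [0, 3, 7],
    [1, 0, 9],
]

def log1p_counts(cells):
    """log(1 + corrected count): no per-cell rescaling is applied."""
    return [[math.log1p(c) for c in cell] for cell in cells]

def log2_cpm(cells):
    """log2(CPM + 1): each cell is first rescaled to a total of one million."""
    out = []
    for cell in cells:
        total = sum(cell)  # assumes every cell has a non-zero total
        out.append([math.log2(c / total * 1e6 + 1) for c in cell])
    return out

data_slot = log1p_counts(corrected)   # the option suggested above
cpm_values = log2_cpm(corrected)      # the option advised against above
```

The only difference is the per-cell rescaling step in `log2_cpm`: the corrected counts have already been adjusted for sequencing depth, so scaling them to a million applies a second normalization on top of the first.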

fjrossello commented 1 year ago

Thanks for your prompt reply and advice. Cheers, Fernando