Confused about using "sum" for pseudobulk DESeq2

hbctraining / scRNA-seq_online

Hi, thanks for your online lessons. In the `pseudobulk_DESeq2_scrnaseq.md` lesson, I noticed that the `sum` function is used to aggregate counts within each cluster-sample group.

I am quite confused about this step. Why not use `mean` instead? The number of cells per sample can be very unbalanced, for sampling or experimental reasons. For example, sample A may have 2,000 CD8 T cells out of 5,000 cells, while sample B may have 1,000 CD8 T cells out of 2,500. Wouldn't summing inflate the differences between the pseudobulk groups?
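
Below is a minimal base-R sketch of the scenario above. It uses made-up per-cell counts and assumes every CD8 T cell expresses each gene identically, so the summed pseudobulk counts for sample A are exactly twice those of sample B. DESeq2 normalizes each sample by a size factor, estimated by default with the median-of-ratios method. The function `median_of_ratios()` is a hypothetical hand-written version of that calculation, not the DESeq2 API. It shows how a cell-number difference that scales every gene equally is absorbed by the size factors.

```r
# Hypothetical per-cell counts for four genes; every CD8 T cell is identical,
# so the only difference between samples A and B is the number of cells.
per_cell <- c(GeneW = 1L, GeneX = 3L, GeneY = 10L, GeneZ = 0L)
n_cells  <- c(A = 2000L, B = 1000L)  # CD8 T cells per sample, as in the example

# Pseudobulk by summing: integer counts that scale with the number of cells
pb_sum <- sapply(n_cells, function(n) per_cell * n)

# Median-of-ratios size factors, written out by hand (not a call to DESeq2).
# Genes with a zero count in any sample are left out of the calculation.
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))  # -Inf for genes with any zero
  use <- is.finite(log_geo_mean)
  log_ratios <- sweep(log(counts[use, , drop = FALSE]), 1, log_geo_mean[use])
  exp(apply(log_ratios, 2, median))
}

sf <- median_of_ratios(pb_sum)
print(round(sf, 3))  # A = 1.414, B = 0.707, i.e. sqrt(2) and 1/sqrt(2)

# Dividing each sample by its size factor removes the 2x cell-number effect
normalized <- sweep(pb_sum, 2, sf, "/")
print(normalized)
```

This toy case only covers a uniform scaling of all genes. Averaging would also give identical profiles here, but with real data per-cell means are generally not integers, and DESeq2 expects raw integer counts.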

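```r
library(SingleCellExperiment)  # counts()
library(Matrix.utils)          # aggregate.Matrix()

# Aggregate across cluster-sample groups (`groups`: per-cell cluster and sample IDs)
pb <- aggregate.Matrix(t(counts(sce)), groupings = groups, fun = "sum")
```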
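
`aggregate.Matrix()` from the Matrix.utils package returns one row per group, with genes as columns. As a hedged stand-in that needs no extra packages, base R's `rowsum()` computes the same kind of group-wise sums on a small dense matrix. The genes, cells and group labels below are hypothetical.

```r
# Toy genes x cells count matrix (hypothetical values): 3 genes, 5 cells
counts_toy <- matrix(c(0L, 2L, 1L,   # cell1
                       1L, 0L, 3L,   # cell2
                       4L, 1L, 0L,   # cell3
                       2L, 2L, 2L,   # cell4
                       0L, 5L, 1L),  # cell5
                     nrow = 3,
                     dimnames = list(c("GeneX", "GeneY", "GeneZ"),
                                     paste0("cell", 1:5)))

# Cluster-sample group of each cell (hypothetical labels)
groups_toy <- c("CD8T_A", "CD8T_A", "CD8T_A", "CD8T_B", "CD8T_B")

# Transpose to cells x genes, then sum the rows within each group
pb_toy <- rowsum(t(counts_toy), group = groups_toy)
print(pb_toy)  # CD8T_A: 5 3 4; CD8T_B: 2 7 3

# The per-group mean, for comparison: no longer integer counts
pb_mean_toy <- pb_toy / as.vector(table(groups_toy))
print(pb_mean_toy)
```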

Confused about using "sum" for pseudobulk DESeq2 #74