Closed Famingzhao closed 2 years ago
Hi @Famingzhao,
We followed this tutorial (http://biocworkshops2019.bioconductor.org.s3-website-us-east-1.amazonaws.com/page/muscWorkshop__vignette/, code under the "Aggregation of single-cell to pseudo-bulk data" section), and used the sum function as the tutorial did. However, you could also use other summary statistics, like mean or median.
There are studies that explore the difference in pseudobulk performance when using mean versus sum, and generally results have been comparable. We encourage you to peruse the literature and identify what would work best for you.
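To make the choice of summary statistic concrete, here is a minimal sketch in plain Python rather than the R code from the tutorial. The toy counts and the `pseudobulk` helper are hypothetical and only illustrate how swapping the summary function changes the aggregated value for each sample-cluster group.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy data: (sample, cluster, count of one gene in one cell).
cells = [
    ("A", "CD8T", 2), ("A", "CD8T", 0), ("A", "CD8T", 4), ("A", "CD8T", 2),
    ("B", "CD8T", 1), ("B", "CD8T", 3),
]

def pseudobulk(cells, summarize):
    """Collapse per-cell counts into one value per (sample, cluster) group."""
    groups = defaultdict(list)
    for sample, cluster, count in cells:
        groups[(sample, cluster)].append(count)
    return {key: summarize(values) for key, values in groups.items()}

by_sum = pseudobulk(cells, sum)        # scales with the number of cells
by_mean = pseudobulk(cells, mean)      # per-cell average
by_median = pseudobulk(cells, median)  # per-cell median

for key in sorted(by_sum):
    print(key, by_sum[key], by_mean[key], by_median[key])
```

In this toy case sample A has twice as many cells as sample B, so the sums differ while the means and medians agree. Note that the sum keeps integer counts, whereas the mean generally does not.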
Hi, thanks for your online lessons. In the pseudobulk_DESeq2_scrnaseq.md lesson, I noticed that the "sum" function is used to aggregate counts for each cluster-sample group.
I am very confused about this step. Why not use the "mean" function? I think the number of cells per sample could be very unbalanced for sampling or experimental reasons. For example, sample A may have 2000/5000 CD8T cells, while sample B may have 1000/2500 CD8T cells. So would the "sum" function inflate the differences between pseudobulk groups?
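The arithmetic behind this concern can be sketched with the cell numbers from the example (2000 versus 1000 CD8T cells). The per-cell averages and gene names below are hypothetical, and plain total-count scaling is used as a simplified stand-in for library-size normalization. DESeq2 itself estimates per-sample size factors with a median-of-ratios method, so this is an illustration of the idea under those assumptions, not the DESeq2 procedure.

```python
# Hypothetical numbers: identical per-cell expression, different cell counts.
per_cell_mean = {"geneX": 0.5, "geneY": 1.5}  # assumed average counts per cell
n_cells = {"A": 2000, "B": 1000}              # CD8T cells per sample

# Sum aggregation: pseudobulk count = per-cell average * number of cells.
pseudobulk_sum = {
    sample: {gene: avg * n for gene, avg in per_cell_mean.items()}
    for sample, n in n_cells.items()
}

def scale_by_library_size(counts):
    """Divide each gene's count by the sample's total count (simplified normalization)."""
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}

normalized = {
    sample: scale_by_library_size(counts)
    for sample, counts in pseudobulk_sum.items()
}

print(pseudobulk_sum)  # raw sums: sample A is twice sample B for every gene
print(normalized)      # after scaling: both samples have the same proportions
```

Under these assumptions the raw sums differ by the ratio of cell numbers, but that difference is shared by all genes and disappears once each sample is scaled by its library size.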