satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

diff_mean_test() and group sizes #98

Open jhb1980 opened 3 years ago

jhb1980 commented 3 years ago

Hi Christoph,

Absolutely phenomenal toolset; I'm really excited about the possibility of running DE testing on the output of sctransform! Could you comment on how many cells per group would ideally be needed, at minimum, for "robust" DE calling between two groups with diff_mean_test()?

ChristophH commented 3 years ago

That's a good question, and one I've always wanted to test formally. More cells are always better, but a more nuanced answer takes two additional aspects into account:

  1. The expression level of the gene, i.e. the mean UMI count in group 1
  2. The fold change, i.e. log2(mean_in_group2/mean_in_group1), so that an increase from group 1 to group 2 is positive
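As a quick check on the sign convention, the snippet below computes log2FC for the transitions discussed in this thread. It assumes log2FC = log2(mean in group 2 / mean in group 1), the convention implied by the worked examples (going from 0.1 to 0.4 is a log2FC of 2). `log2_fold_change` is a hypothetical helper for illustration, not part of sctransform.

```python
import math


def log2_fold_change(mean_group1: float, mean_group2: float) -> float:
    """log2 ratio of the group 2 mean over the group 1 mean.

    Positive values mean higher expression in group 2 (assumed convention).
    """
    return math.log2(mean_group2 / mean_group1)


print(log2_fold_change(0.1, 0.4))              # 2.0  (fourfold increase)
print(log2_fold_change(0.1, 0.025))            # -2.0 (fourfold decrease)
print(round(log2_fold_change(0.001, 1.0), 2))  # 9.97 (absent to medium-high)
```

A fourfold increase and a fourfold decrease have the same |log2FC|, yet they are not equally easy to detect.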

I've run simulations to get a better idea of how exactly these two factors affect the number of cells needed. This figure sums up the results:

[Figure: simulated DE recovery versus number of cells per group, with panels by gene mean and log2FC]

For example, to detect a log2FC of 2 for a gene with mean 0.1 (i.e. going from 0.1 to 0.4; bottom row, third panel), you would need about 100 cells per group. A decrease of the same magnitude (log2FC of -2, panel above) is much harder to detect: going from 0.1 to 0.025 gives only ca. 80% recovery with 200 cells per group.
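One rough way to see why decreases are harder: at low expression, the evidence is limited by how many UMIs are actually observed, and the expected total per group is simply cells × mean. The sketch below treats each group's total as Poisson and computes a crude z-like ratio (difference in expected totals over the square root of their sum). This is a back-of-envelope heuristic that ignores overdispersion; it is not the test used by diff_mean_test() nor the simulations behind the figure, and `expected_z` is a hypothetical name.

```python
import math


def expected_z(mean1: float, mean2: float, n_cells: int) -> float:
    """Crude signal-to-noise for a change in mean between two groups of
    n_cells each, treating each group's total UMI count as Poisson.
    Ignores overdispersion, so real data will look noisier."""
    total1 = n_cells * mean1  # expected UMIs observed in group 1
    total2 = n_cells * mean2  # expected UMIs observed in group 2
    return (total2 - total1) / math.sqrt(total1 + total2)


for m1, m2, n in [(0.1, 0.4, 100), (0.1, 0.025, 100), (0.1, 0.025, 200)]:
    print(f"{m1} -> {m2}, {n} cells/group: z ~ {expected_z(m1, m2, n):.2f}")
```

This prints z of roughly 4.24 for the increase at 100 cells per group, but only about -2.12 for the decrease at 100 cells and -3.00 at 200 cells: shrinking toward zero leaves few UMIs to compare, which is consistent with the asymmetry described above.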

On the other hand, if a gene is essentially absent from one group and goes up to medium-high expression in the other (say from 0.001 to 1), even 20 cells will be sufficient.
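The thread does not spell out the simulation setup. For readers who want to experiment with the idea, here is a minimal sketch under stated assumptions: counts are drawn from a negative binomial (gamma-Poisson mixture) with an arbitrary dispersion `theta=10`, and detection uses a plain two-sided permutation test on the difference in group means. This is a stand-in, not diff_mean_test() itself or the author's simulation code, and all names and defaults (`nb_sample`, `perm_pvalue`, `estimate_recovery`, `n_reps`, `n_perm`, `alpha`) are hypothetical.

```python
import math
import random


def nb_sample(rng: random.Random, mean: float, theta: float) -> int:
    """Draw one negative binomial count with the given mean and dispersion theta,
    via the gamma-Poisson mixture (variance = mean + mean**2 / theta)."""
    lam = rng.gammavariate(theta, mean / theta)
    # Knuth's multiplication method for Poisson(lam); fine for the small means here.
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def perm_pvalue(rng: random.Random, x: list, y: list, n_perm: int) -> float:
    """Two-sided permutation p-value for the difference in mean counts."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    total = sum(pooled)

    def mean_diff(first_sum: float) -> float:
        return first_sum / n1 - (total - first_sum) / n2

    observed = abs(mean_diff(sum(x)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cells into two groups
        if abs(mean_diff(sum(pooled[:n1]))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_recovery(mean1: float, mean2: float, n_cells: int, theta: float = 10.0,
                      n_reps: int = 50, n_perm: int = 99, alpha: float = 0.05,
                      seed: int = 0) -> float:
    """Fraction of simulated data sets in which the change is called at level alpha."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_reps):
        x = [nb_sample(rng, mean1, theta) for _ in range(n_cells)]
        y = [nb_sample(rng, mean2, theta) for _ in range(n_cells)]
        if perm_pvalue(rng, x, y, n_perm) < alpha:
            detected += 1
    return detected / n_reps


print(estimate_recovery(0.001, 1.0, n_cells=20))  # absent -> medium-high
print(estimate_recovery(0.1, 0.025, n_cells=20))  # fourfold decrease at low expression
```

Under these assumptions the absent-to-present case (0.001 to 1) should be called in nearly every replicate with 20 cells per group, while the fourfold decrease from 0.1 rarely is at that size; exact numbers depend on `theta`, the test, and the seed.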

Notes regarding these results

R notebook

jhb1980 commented 3 years ago

Brilliant, thank you Christoph!