Open jhb1980 opened 3 years ago
That's a good question, and I've always wanted to formally test this. More cells is always better, but a more nuanced answer takes two more aspects into account.
I've run simulations to get a better idea of how exactly these two factors affect the number of cells needed. This figure sums up the results:
For example, to detect a log2FC of 2 for a gene with mean 0.1 (so going from 0.1 to 0.4; bottom row, third panel), you would need about 100 cells per group. A decrease (negative log2FC, panel above) would be much harder to detect (ca. 80% recovery with 200 cells per group when going from 0.1 to 0.025).
On the other hand, if a gene is absent from one group and then goes up to medium-high (say from 0.001 to 1), even 20 cells will be sufficient.
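To make the arithmetic above concrete, here is a minimal sketch of this kind of recovery simulation in plain Python. It is not the simulation behind the figure and it does not call sctransform. It assumes Poisson counts for a single gene and uses a simple permutation test on the difference in group means. The function names (`shifted_mean`, `recovery_rate`) and all parameters are made up for illustration. Real data are overdispersed and a real analysis corrects for testing many genes, so the numbers it prints should not be expected to match the figure.

```python
import math
import random


def shifted_mean(base_mean, log2fc):
    """Mean in the second group, given the first group's mean and a log2 fold change."""
    return base_mean * 2.0 ** log2fc


def poisson(rng, lam):
    """Draw one Poisson count (Knuth's method; adequate for the small means used here)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def permutation_p_value(rng, a, b, n_perm):
    """Two-sided permutation p-value for the difference in means of two groups."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    total = sum(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cells
        sum_a = sum(pooled[:len(a)])
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def recovery_rate(base_mean, log2fc, cells_per_group,
                  n_sim=50, n_perm=99, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which the change is detected at level alpha."""
    rng = random.Random(seed)
    other_mean = shifted_mean(base_mean, log2fc)
    detected = 0
    for _ in range(n_sim):
        a = [poisson(rng, base_mean) for _ in range(cells_per_group)]
        b = [poisson(rng, other_mean) for _ in range(cells_per_group)]
        if permutation_p_value(rng, a, b, n_perm) < alpha:
            detected += 1
    return detected / n_sim


# The gene from the example: mean 0.1, log2FC of +2 (0.1 -> 0.4) or -2 (0.1 -> 0.025).
for n_cells in (20, 100):
    for lfc in (2, -2):
        rate = recovery_rate(0.1, lfc, n_cells)
        print(f"cells/group={n_cells:4d}  log2FC={lfc:+d}  recovery={rate:.2f}")
```

Raising `n_sim` and `n_perm` gives smoother estimates at the cost of run time. Swapping the Poisson draw for a negative binomial would be a natural next step if you want something closer to real UMI counts.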
Notes regarding these results
Brilliant, thank you Christoph!
Hi Christoph,
Absolutely phenomenal tool set, I'm really excited by the possibility of running DE testing on the output of sctransform! I was wondering if you could comment on what a sufficient minimum number of cells would be for "robust" DE calling between two groups when working with the implementation of diff_mean_test()?