dkobak closed this issue 2 years ago.
Okay, I see now that this has been answered by Rahul in https://github.com/satijalab/seurat/issues/2414:
How much to clip is an empirical determination. When originally writing vst, we used a simple default of sqrt(N). As we tested more datasets in Seurat, we felt it was helpful to impose a more stringent cutoff.
I am closing this issue.
The default clipping of residuals in Seurat::SCTransform appears to be sqrt(n_cells/30) (see https://satijalab.org/seurat/reference/sctransform) and not sqrt(n_cells), as in sctransform::vst and as described in the Methods section of Hafemeister & Satija 2019 (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1874-1).

The consequence is that running Seurat::SCTransform on the pbmc33k dataset used in the original paper does not reproduce the same residuals, and the difference is rather large: a very different set of genes is selected as most variable compared to what we see in Figure 4C.

What is the preferred default value of clipping, and why? Was the default at some point changed from sqrt(n_cells) to sqrt(n_cells/30)?
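To make the size of the difference concrete, here is a minimal Python sketch (not the sctransform or Seurat code). It assumes the negative-binomial Pearson residual form from the paper's Methods, (x - mu) / sqrt(mu + mu^2 / theta), and clips to plus or minus sqrt(n_cells / divisor). The function names, the `divisor` parameter, and the 30,000-cell count are hypothetical and used only for illustration.

```python
import numpy as np


def pearson_residuals(counts, mu, theta):
    """Negative-binomial Pearson residuals: (x - mu) / sqrt(mu + mu^2 / theta)."""
    sigma = np.sqrt(mu + mu**2 / theta)
    return (counts - mu) / sigma


def clip_residuals(residuals, n_cells, divisor=1.0):
    """Clip residuals to +/- sqrt(n_cells / divisor).

    divisor=1 mirrors the sqrt(n_cells) bound described for sctransform::vst;
    divisor=30 mirrors the sqrt(n_cells/30) bound listed for Seurat::SCTransform.
    """
    bound = np.sqrt(n_cells / divisor)
    return np.clip(residuals, -bound, bound)


# Hypothetical example: 30,000 cells and three residual values for one gene.
n_cells = 30_000
res = np.array([-5.0, 50.0, 200.0])
print(clip_residuals(res, n_cells))              # bound sqrt(30000), about 173.2
print(clip_residuals(res, n_cells, divisor=30))  # bound sqrt(1000), about 31.6
```

With the stricter bound, any residual above about 31.6 is capped, so genes driven by a few cells with very large residuals lose much of their residual variance. That is one plausible reason the set of most variable genes can change between the two defaults.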