The `vst` function has a parameter `min_cells`, set to 5 by default. This means that genes detected in fewer than 5 cells are not considered during normalization and are not part of the output.
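As an illustration of that filter (the toy matrix below is an assumption, not data from this issue), a gene passes only if it has a nonzero count in at least `min_cells` cells:

```r
set.seed(1)
# Toy UMI count matrix: 20 genes x 10 cells, deliberately sparse
counts <- matrix(rpois(20 * 10, lambda = 0.4), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))

# Mimic of vst's min_cells filter (default 5):
# keep only genes detected (count > 0) in at least min_cells cells
min_cells <- 5
keep <- rowSums(counts > 0) >= min_cells
dropped <- rownames(counts)[!keep]  # these genes would be absent from the output
```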
Also, note that the output of `vst` is a list. The normalized expression matrix is in position 'y'.
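A minimal sketch of working with that list output (the toy matrix and its dimensions are assumptions, not from this thread):

```r
library(sctransform)
library(Matrix)

set.seed(1)
# Toy UMI matrix: 500 genes x 200 cells; half the genes are deliberately
# rare so that some fall below the min_cells threshold
counts <- matrix(rpois(500 * 200, lambda = rep(c(0.01, 1), each = 250)),
                 nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("cell", 1:200)))
counts <- Matrix(counts, sparse = TRUE)

vst_out  <- vst(counts, verbosity = 0)  # returns a list, not a matrix
norm_mat <- vst_out$y                   # normalized expression matrix

# Genes detected in fewer than min_cells (default 5) cells are dropped,
# so subset the original object to rownames(norm_mat) before storing
# the result alongside the raw counts.
kept <- rownames(norm_mat)
```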
Thanks for your reply! I didn't notice the `min_cells` argument of the `vst` function.
Actually, this is the iPSC data from Tung et al. (2017), and I compared the normalization results of scran with sctransform using the plotPCA and plotRLE functions in the scater R package.
It seems that scran performed better than sctransform for this data. I know there isn't a perfect method for every dataset, but do you have any suggestions for comparing different normalization methods on a given dataset, besides plotPCA and plotRLE? There is an R package called scone that addresses this issue, but it is a little complicated and I haven't tried it yet.
Anyway, do you have any suggestions for this issue?
While it is never easy to precisely (and accurately) benchmark single-cell workflows, it is possible to do so when there is ground truth for the underlying cell states. Scone, as you suggest, is also a powerful tool for comparisons with respect to technical metrics.
However, a comparison of how different batches separate simply by looking at PC1 and PC2 is not, in my opinion, an effective benchmark.
Aaron Lun makes a compelling case for the inherent limitations of log-transformation (and pseudocount addition) here (we were unaware of this until after submission) https://www.biorxiv.org/content/biorxiv/early/2018/08/31/404962.full.pdf
By estimating the regularized negative binomial model, sctransform obviates the need for these steps. While we cannot guarantee improved performance on every single dataset, we do observe improved performance of sctransform over log-based normalization schemes (see examples in the manuscript and vignette).
Hi Chris, thanks for the impressive normalization method for scRNA-seq.
Now I want to use the `vst` function with a SingleCellExperiment object; my code is as follows:
```r
assay(umi.qc, "sctransform_counts") <- sctransform::vst(
  assay(umi.qc, "counts"),
  latent_var = c("log_umi"),
  do_regularize = TRUE,
  n_genes = NULL,
  return_gene_attr = TRUE,
  return_cell_attr = TRUE,
  show_progress = FALSE
)
```
And an error came out: it seems that my original matrix `assay(umi.qc, "counts")` is 14066 x 657, but `vst` only uses 14063 genes.
Do you know what happened? Does that mean `vst` pre-checks the input data and omits some specific genes?
Looking forward to your reply!