satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Input Error #14

Closed FTD2018 closed 5 years ago

FTD2018 commented 5 years ago

Hi Chris, thanks for the impressive normalization method for scRNA-seq.

I want to use the vst function with a SingleCellExperiment object. My code is as follows:

    assay(umi.qc, "sctransform_counts") <- sctransform::vst(
      assay(umi.qc, "counts"),
      latent_var = c('log_umi'),
      do_regularize = TRUE,
      n_genes = NULL,
      return_gene_attr = TRUE,
      return_cell_attr = TRUE,
      show_progress = FALSE
    )

This produced an error: [image]

My original matrix, assay(umi.qc,"counts"), is 14066 x 657 [image], but vst appears to use only 14063 genes in its computation.

Do you know what happened? Does this mean that vst pre-checks the input data and omits certain genes?

Looking forward to your reply!

ChristophH commented 5 years ago

The vst function has a parameter min_cells set to 5 by default. This means that genes that are detected in fewer than 5 cells are not considered during normalization and are not part of the output.
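The filter described above can be approximated in base R to see which genes would be left out. This is a minimal sketch on a made-up toy matrix, assuming that "detected" means a nonzero count; it does not call sctransform, and the names counts, kept, and dropped are illustrative only.

```r
# Toy count matrix: 6 genes x 10 cells (made-up values, for illustration only).
counts <- matrix(rep(1:3, length.out = 60), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("cell", 1:10)))
counts["gene2", ] <- c(3, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # detected in 2 cells
counts["gene5", ] <- 0                                 # never detected

min_cells <- 5  # default value stated in the reply above

# Number of cells in which each gene has a nonzero count.
cells_detected <- rowSums(counts > 0)

# Assumed form of the filter: keep genes detected in at least min_cells cells.
kept <- rownames(counts)[cells_detected >= min_cells]
dropped <- setdiff(rownames(counts), kept)

print(dropped)
```

Running the same two lines (rowSums and the comparison) on the real count matrix should show which genes account for the difference between 14066 and 14063 rows.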

Also, note that the output of vst is a list. The normalized expression matrix is in position 'y'.
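Taken together, the two points above suggest two mismatches in the original call: the whole list was assigned to the assay, and the 'y' matrix has fewer rows than the original object. The sketch below uses a stand-in list that only mimics the shape of the vst output (it is not produced by sctransform) and a hypothetical helper, pad_rows, that expands 'y' back to one row per original gene. Subsetting the object to rownames(out$y) before assigning would be the alternative.

```r
# Stand-in for the list returned by vst; only the element 'y' is mimicked here.
all_genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:3)
out <- list(y = matrix(c( 0.5, -1.2,  0.3,
                          1.1,  0.0, -0.4,
                         -0.7,  0.9,  0.2),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("gene1", "gene3", "gene4"), cells)))

# Hypothetical helper: expand y to one row per original gene,
# leaving NA in the rows of genes that were filtered out.
pad_rows <- function(y, all_genes) {
  full <- matrix(NA_real_, nrow = length(all_genes), ncol = ncol(y),
                 dimnames = list(all_genes, colnames(y)))
  full[rownames(y), ] <- y
  full
}

normalized <- out$y                      # the matrix, not the whole list
padded <- pad_rows(normalized, all_genes)
print(dim(padded))
```

Whether NA rows are acceptable depends on the downstream functions; dropping the filtered genes from the object is often the simpler choice.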

FTD2018 commented 5 years ago

Thanks for your reply! I hadn't noticed the min_cells argument of the vst function.

This is actually the iPSC data (Tung et al., 2017). I compared the normalization results of scran and sctransform using the plotPCA and plotRLE functions in the scater R package. [image] [image]

It seems that scran performed better than sctransform on this data. I know that no method is perfect for every dataset, but do you have any suggestions for comparing normalization methods on a given dataset, other than plotPCA and plotRLE? There is an R package called scone that addresses this issue, but it is a little complicated and I haven't tried it yet.

Anyway, do you have any suggestions for this issue?

satijalab commented 5 years ago

While it is never easy to precisely (and accurately) benchmark single cell workflows, it is possible to do so when there is ground truth for the underlying cell states. Scone as you suggest is also a powerful tool for comparisons w.r.t. technical metrics.
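As one illustration of a ground-truth comparison, the adjusted Rand index scores how well a clustering recovers known cell-state labels. This is a minimal base R sketch, not a method recommended in this thread; the function name adjusted_rand_index and the toy labels are assumptions made for the example. In practice one would cluster the output of each normalization method and compare each clustering to the known labels.

```r
# Adjusted Rand index between two label vectors of equal length.
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  comb2 <- function(n) n * (n - 1) / 2        # number of pairs among n items
  sum_ij <- sum(comb2(tab))                   # pairs grouped together in both
  sum_a <- sum(comb2(rowSums(tab)))           # pairs grouped together in a
  sum_b <- sum(comb2(colSums(tab)))           # pairs grouped together in b
  expected <- sum_a * sum_b / comb2(length(a))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Made-up labels: known cell states and two hypothetical clusterings.
truth     <- c("A", "A", "A", "B", "B", "B")
cluster_1 <- c(1, 1, 1, 2, 2, 2)   # recovers the states exactly
cluster_2 <- c(1, 1, 2, 2, 2, 2)   # misplaces one cell

ari_1 <- adjusted_rand_index(truth, cluster_1)
ari_2 <- adjusted_rand_index(truth, cluster_2)
print(c(ari_1 = ari_1, ari_2 = ari_2))
```

A higher value means closer agreement with the known labels. The index is undefined when both label vectors put every cell in a single group.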

However, a comparison of how different batches separate simply by looking at PC1 and PC2 is not, in my opinion, an effective benchmark.

Aaron Lun makes a compelling case for the inherent limitations of log-transformation (and pseudocount addition) here (we were unaware of this until after submission): https://www.biorxiv.org/content/biorxiv/early/2018/08/31/404962.full.pdf
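A small numerical illustration of the kind of problem at issue: two cells with the same true expression but different sequencing depth have the same expected normalized count, yet different expected log-values after adding a pseudocount of 1. This is a simplified sketch under a Poisson assumption, not a reproduction of the analysis in the linked preprint; the size factors 0.5 and 2 and the function name expected_log_norm are made up for the example.

```r
# Expected value of log2(X / s + 1) when X ~ Poisson(mu * s),
# where s is a (hypothetical) size factor and mu the true expression level.
expected_log_norm <- function(mu, s, max_count = 200) {
  x <- 0:max_count
  sum(dpois(x, lambda = mu * s) * log2(x / s + 1))
}

mu <- 1  # same true expression in both cells

# The expected normalized count is mu in both cases, but the expected
# log-transformed value depends on the size factor.
e_shallow <- expected_log_norm(mu, s = 0.5)  # shallowly sequenced cell
e_deep    <- expected_log_norm(mu, s = 2)    # deeply sequenced cell

print(c(shallow = e_shallow, deep = e_deep))
```

The shallow cell gets the lower expected log-value, so a difference in depth alone can look like a difference in expression after log-transformation.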

By estimating the regularized-NB model, sctransform removes the need for these steps. While we cannot guarantee improved performance on every single dataset, we do observe improved performance of sctransform over log-based normalization schemes (see examples in the manuscript and vignette).