satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

arguments imply differing number of rows #91

Closed Biomiha closed 3 years ago

Biomiha commented 3 years ago

Hi there,

I'm quite intrigued by the sctransform::vst function. I've tried replicating the vignette on a sparse matrix.

When I run

set.seed(44)
vst_out <- sctransform::vst(pbmc_data, latent_var = c("log_umi"), return_gene_attr = TRUE, 
                          return_cell_attr = TRUE, verbosity = 1)

I get the following error:

Calculating cell attributes from input UMI matrix: log_umi
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 2700

I get the same error if I try a different matrix, albeit with a different row count. I think it's a simple bug.

Many thanks for the package. Miha

ChristophH commented 3 years ago

Hi Miha, what exactly is pbmc_data? Please make sure this is a sparse matrix (inheriting from dgCMatrix) where the columns are the cells.
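A quick way to answer that question is to inspect the object before passing it to vst. The sketch below uses a toy matrix built with the Matrix package as a stand-in for pbmc_data; the name toy and its gene and cell labels are made up for illustration.

```r
library(Matrix)

# Toy stand-in for a UMI count matrix: 3 genes (rows) x 4 cells (columns).
toy <- Matrix::sparseMatrix(
  i = c(1, 2, 3, 1), j = c(1, 2, 3, 4), x = c(2, 1, 4, 3),
  dims = c(3, 4),
  dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:4))
)

class(toy)                  # storage class of the object
inherits(toy, "dgCMatrix")  # should be TRUE for a column-compressed sparse matrix
dim(toy)                    # rows are genes, columns are cells
head(colnames(toy))         # cell names; NULL means none are set
```

Running the same four calls on the real object shows whether it is a sparse matrix at all and whether its columns carry cell names.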

Biomiha commented 3 years ago

Hi Christoph,

Sorry, in my initial post I added a link to your vignette, which was somehow lost due to formatting reasons. I've edited my initial post now with the link. pbmc_data was downloaded from the link in the vignette (this one here) and passed to vst as a 'dgCMatrix'.

Thanks, Miha

ChristophH commented 3 years ago

How did you convert the 10x output in the pbmc_data file to a sparse matrix? If I use the Seurat function Read10X to do this, everything works as expected.

counts <- Seurat::Read10X(data.dir = '~/Downloads/filtered_gene_bc_matrices/hg19/')

set.seed(44)
vst_out <- sctransform::vst(umi = counts, latent_var = c("log_umi"), return_gene_attr = TRUE, 
                            return_cell_attr = TRUE, verbosity = 1)

Biomiha commented 3 years ago

Oh, that's odd. I see what you mean. I actually used DropletUtils::read10xCounts. I initially tried SCTransform on the Seurat object like in the main vignette (class(pbmc) in this case is "Seurat"):

pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)

Which worked just fine. I then only wanted to test the variance stabilising function on the dgCMatrix and used DropletUtils to read it in. TBH I forgot about the Seurat::Read10X function. I didn't think that sparse matrices would be different depending on how you read them in but I apologise for raising the issue.

Many thanks, Miha

ChristophH commented 3 years ago

DropletUtils::read10xCounts returns a SingleCellExperiment object. When I extract the count matrix and set the column (cell) names, everything works.

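library('DropletUtils')
# read10xCounts returns a SingleCellExperiment, not a bare matrix
tmp <- read10xCounts(samples = '~/Downloads/filtered_gene_bc_matrices/hg19/')
# extract the count matrix from the first assay
counts <- assay(tmp, 1, withDimnames=TRUE)
# set the column (cell) names from the second colData column (the barcodes)
colnames(counts) <- colData(tmp)[, 2]
set.seed(44)
vst_out <- sctransform::vst(umi = counts, latent_var = c("log_umi"), return_gene_attr = TRUE, 
                            return_cell_attr = TRUE, verbosity = 1)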

Biomiha commented 3 years ago

Yes, I noticed colnames was the problem. Thank you so much!

Best wishes, Miha
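
For anyone hitting the same error: the thread's conclusion is that the count matrix had no column (cell) names. A minimal sketch of a pre-flight check follows. check_umi_matrix is a hypothetical helper written for this example, not part of sctransform, and the toy matrix and placeholder names are assumptions for illustration.

```r
library(Matrix)

# Hypothetical helper (not part of sctransform): list reasons why a matrix
# may not be a suitable input, based on the points raised in this thread.
check_umi_matrix <- function(umi) {
  problems <- character(0)
  if (!inherits(umi, "dgCMatrix")) {
    problems <- c(problems, "not a dgCMatrix")
  }
  if (is.null(colnames(umi))) {
    problems <- c(problems, "missing column (cell) names")
  }
  if (is.null(rownames(umi))) {
    problems <- c(problems, "missing row (gene) names")
  }
  problems
}

# Toy 3-gene x 4-cell sparse matrix created without any dimnames.
toy <- Matrix::sparseMatrix(
  i = c(1, 2, 3, 1), j = c(1, 2, 3, 4), x = c(2, 1, 4, 3),
  dims = c(3, 4)
)
before <- check_umi_matrix(toy)  # reports the missing names

# Assign placeholder names; with real data, use the actual barcodes and gene IDs.
colnames(toy) <- paste0("cell_", seq_len(ncol(toy)))
rownames(toy) <- paste0("gene_", seq_len(nrow(toy)))
after <- check_umi_matrix(toy)   # no problems left

print(before)
print(after)
```

With real 10x data the barcodes should come from the reader's output, as in the colData(tmp)[, 2] line above, rather than from generated placeholders.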