satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Error in handle_design_parameter: Number of rows in col_data does not match number of columns of data #111

Closed ghost closed 3 years ago

ghost commented 3 years ago

Using sctransform 0.3.2 and Seurat 4.0.4, I ran the following code:

seur = SCTransform(seur, vars.to.regress = c("percent.mt"), conserve.memory = T, method = "glmGamPoi", verbose = T)

and got the following error:

Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 24670 by 226635
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
| |0%
Error in handle_design_parameter(design, data, col_data, reference_level) :
  Number of rows in col_data does not match number of columns of data. Were there maybe 'NA's in the colData?

Here is some information about my Seurat object 'seur':

> dim(seur)
[1]  25610 226635
> sum(is.na(dimnames(seur)[[1]]))
[1] 0
> sum(is.na(dimnames(seur)[[2]]))
[1] 0

Both seur@assays$RNA@counts and seur@assays$RNA@data contain the same raw counts, and both are Matrix::dgCMatrix objects.

Any help or guidance would be really appreciated!

ChristophH commented 3 years ago

The function glmGamPoi:::handle_design_parameter is having a problem - not sure why. Do your cells (columns in seur@assays$RNA@counts) all add up to more than zero? Perhaps make sure that there are at least N genes detected in every cell (with N = 300 or so).

If you can share the raw counts, I can also have a look.
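
The per-cell check suggested above can be sketched in base R as follows. This is only an illustration on a toy matrix: `counts` stands in for seur@assays$RNA@counts, and `min_genes` is a hypothetical threshold name (the comment suggests N = 300 or so for real data). For a sparse dgCMatrix the same idea should carry over with Matrix::colSums.

```r
# Toy count matrix (genes x cells); a stand-in for the real raw counts.
set.seed(1)
counts <- matrix(rpois(20 * 6, lambda = 2), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:6)))
counts[, 3] <- 0  # simulate an empty cell

# Hypothetical threshold for this toy example; use ~300 on real data.
min_genes <- 5

umi_per_cell   <- colSums(counts)      # total UMI count per cell
genes_per_cell <- colSums(counts > 0)  # number of detected genes per cell

# Keep cells that have any counts at all and enough detected genes.
keep <- umi_per_cell > 0 & genes_per_cell >= min_genes
filtered <- counts[, keep, drop = FALSE]

# Names of the cells that would be dropped.
print(colnames(counts)[!keep])
```

Cells flagged here could then be removed from the Seurat object before calling SCTransform.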

ghost commented 3 years ago

Thanks for your quick response! After a long debugging session, it turns out that the error happened because colnames(seur) returned integer-like strings ("1", "2", ...) instead of the cell names.

> head(colnames(object))
[1] "1" "2" "3" "4" "5" "6"

I then correctly set the column names of the Seurat object, and glmGamPoi ran just fine!

> head(colnames(object))
[1] "CN1_AAACCTGAGCTATGCT-1" "CN1_AAACCTGAGCTGCAAG-1" "CN1_AAACCTGAGGCCGAAT-1" "CN1_AAACCTGCAAGAGGCT-1" "CN1_AAACCTGCAATACGCT-1"
[6] "CN1_AAACCTGCACACTGCG-1"

In the SCTransform function, you could consider giving a warning when the colnames of the Seurat object consist only of integers.
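
A warning of the kind proposed above might look roughly like the sketch below. The helper name `warn_if_integer_colnames` is hypothetical and is not part of sctransform, Seurat, or glmGamPoi. It only inspects column names and makes no claim about what actually triggers the glmGamPoi error.

```r
# Hypothetical helper (not an sctransform/Seurat function):
# warns when cell (column) names are missing or consist only of digits.
warn_if_integer_colnames <- function(x) {
  nms <- colnames(x)
  if (is.null(nms)) {
    warning("Object has no column (cell) names.")
    return(invisible(FALSE))
  }
  # "^[0-9]+$" matches names made up purely of digits, e.g. "1", "42".
  if (all(grepl("^[0-9]+$", nms))) {
    warning("All column (cell) names look like integers; ",
            "check that the cell barcodes were not lost.")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Usage on a toy matrix with integer-like column names.
m <- matrix(0, nrow = 2, ncol = 3, dimnames = list(NULL, c("1", "2", "3")))
withCallingHandlers(
  warn_if_integer_colnames(m),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

For a Seurat object, the same function could be called on the object directly, since colnames() returns the cell names.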

The glmGamPoi package already has a more descriptive error message for this problem in the works, but I don't think it's part of the stable release yet. See issue

ChristophH commented 3 years ago

I cannot reproduce the problem on my end:

library(Seurat)
library(magrittr)  # provides %>%
set.seed(42)
tmp_s <- CreateSeuratObject(counts = counts) %>% SCTransform(method = 'glmGamPoi')

and

colnames(counts) <- 1:ncol(counts)
set.seed(42)
tmp_s2 <- CreateSeuratObject(counts = counts) %>% SCTransform(method = 'glmGamPoi')

both finish and give identical results (scaled data matrix). Here counts is a count matrix that I happened to have loaded in my workspace.

So whatever was going on, I don't think it was a problem of having column (cell) names in the form of c('1', '2', '3', ...) in your input. Without a reproducible example we may never know...