Closed klprint closed 4 years ago
Hello, this is hard to debug without a reproducible example. If you could provide the count matrix and the meta data, I will look into this error. In the meantime you could normalize each batch separately and then merge the data, or consider running an integration analysis (e.g as outlined in this Seurat vignette.
Dear @ChristophH ,
I think it's reproducible with these two files and the following code:
library(sctransform)
counts = read.csv('sct-debug-counts.csv.gz', row.names=1)
meta = read.csv('sct-debug-obs.csv.gz', row.names=1)
mat = as.matrix(t(counts))
vst_out = vst(mat,
cell_attr=meta,
batch_var=NULL,
latent_var=c('log_umi', 'prep'),
residual_type='pearson',
return_cell_attr=T,
min_cells=5,
show_progress=T,
)
res_var = get_residual_var(vst_out, mat)
corrected = correct_counts(vst_out, mat)
res = vst_out$y
sct-debug-obs.csv.gz sct-debug-counts.csv.gz
Output is:
Calculating cell attributes for input UMI matrix
Variance stabilizing transformation of count matrix of size 1605 by 3744
Model formula is y ~ log_umi + prep
Get Negative Binomial regression parameters per gene
Using 1605 genes, 3744 cells
| | 0%
Warning message in theta.ml(y = y, mu = fit$fitted):
“iteration limit reached”
Warning message in theta.ml(y = y, mu = fit$fitted):
“iteration limit reached”
Warning message in theta.ml(y = y, mu = fit$fitted):
“iteration limit reached”
Warning message in theta.ml(y = y, mu = fit$fitted):
“iteration limit reached”
Warning message in theta.ml(y = y, mu = fit$fitted):
...
“iteration limit reached”
Warning message in theta.ml(y = y, mu = fit$fitted):
“iteration limit reached”
|======================================================================| 100%
Found 140 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 1605 genes
|======================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 55.74316 secs
Calculating variance for residuals of type pearson for 1605 genes
|======================================================================| 100%
Warning message in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
“argument is not numeric or logical: returning NA”
Warning message in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
“argument is not numeric or logical: returning NA”
Computing corrected UMI count matrix
| | 0%
Error in base::tcrossprod(x, y): non-conformable arguments
Traceback:
1. correct_counts(vst_out, mat)
2. tcrossprod(coefs, regressor_data)
3. tcrossprod(coefs, regressor_data)
4. base::tcrossprod(x, y)
@gokceneraslan Your prep
appears to be a batch variable? In which case it should be modeled as such:
vst_out = vst(mat,
cell_attr=meta,
latent_var=c('log_umi'),
batch_var=c('prep'),
residual_type='pearson',
return_cell_attr=T,
min_cells=5,
show_progress=T,
)
@saketkc It is not a batch variable but it's categorical. Do you mean we're not allowed to use categorical variables in the "latent_var" argument? If so, why is that? We can regress out any variable as long as it doesn't make the matrix singular, no? I thought batch_var just makes it an interaction term with latent_vars, so how come making it an interaction term solves the problem here?
@gokceneraslan Given your input data I can run the vst
command without errors.
plot_model_pars(vst_out) shows:
But the correct_counts
command fails
Computing corrected UMI count matrix
| | 0%
Error in base::tcrossprod(x, y) : non-conformable arguments
In addition: Warning messages:
1: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
2: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
Error in base::tcrossprod(x, y) : non-conformable arguments
4. base::tcrossprod(x, y)
3. tcrossprod(coefs, regressor_data)
2. tcrossprod(coefs, regressor_data) at denoise.R#176
1. correct_counts(vst_out, mat)
vst()
accepts categorical variables as latent_var
. Whether this makes sense, since the parameters are regularized (see plot above), is a different question and depends on what the variable encodes. There is also a latent_var_nonreg
parameter where regularization is not applied.
Note what correct_counts
does:
Correct data by setting all latent factors to their median values and reversing the regression model
This was designed with only continuous variables in mind. I will look into how I should handle cases with categorical variables.
For now, you could do this:
vst_out = vst(mat,
cell_attr=meta,
batch_var=NULL,
latent_var=c('log_umi'),
latent_var_nonreg =c('prep'),
residual_type='pearson',
return_cell_attr=T,
min_cells=5,
show_progress=T,
)
And correct_counts
should work.
Thanks so much @ChristophH ! Now I understand the reason behind the error. For now I think you can simply print an error if a categorical variable is passed to correct_counts, it's not a solution but it'd still be much better than the cryptic error message.
Hi! I have a dataset with three replicates and wanted to use scTransform. Unfortunately, as soon as I am adding the
batch_var
parameter, I get the error below, otherwise it is working, but still separates my batches on the UMAP.Any suggestions?
Output:
Traceback: