satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
206 stars 33 forks source link

No residual_variance with GlmGamPoi when using batch_var #74

Closed ATpoint closed 3 years ago

ATpoint commented 3 years ago

Hi Christoph,

running sctransform:.vst() with method glmGamPoi produces only NAs for residual_mean and residual_variance in case that cattr and batch_var were specified.

sctransform::vst(umi = assays(s)[["counts"]], 
                            latent_var = c('log_umi'), 
                            batch_var = c('Batch'),
                            cell_attr = cattr,
                            return_gene_attr = TRUE, 
                            return_cell_attr = TRUE, 
                            method = "glmGamPoi",
                            verbosity = TRUE)

It is a 54295 x 10116 experiment with two batches. Log and sessionInfo below, I do not have a second dataset here on my machine right now, but this seems to be specific when using the glmGamPoi method, all other methods produced the resudial_variance column as expected. Can you comment on this, something I do wrong on my end?

Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21017 by 10116
Model formula is y ~ (log_umi) : Batch + Batch + 0
Get Negative Binomial regression parameters per gene
Using 2000 genes, 10116 cells
There are 224 estimated thetas smaller than 1e-07 - will be set to 1e-07
Found 148 outliers - those will be ignored in fitting/regularization step

Some genes not detected in batch WT_rep1 -- assuming a low mean.
Some genes not detected in batch WT_rep2 -- assuming a low mean.
Second step: Get residuals using fitted parameters for 21017 genes
Calculating gene attributes
Wall clock passed: Time difference of 1.962697 mins
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_3.3.2               SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2
 [4] DelayedArray_0.14.1         matrixStats_0.57.0          Biobase_2.48.0             
 [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
[10] S4Vectors_0.26.1            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0          xfun_0.18                 purrr_0.3.4               reshape2_1.4.4           
 [5] listenv_0.8.0             splines_4.0.2             lattice_0.20-41           colorspace_1.4-1         
 [9] vctrs_0.3.4               generics_0.0.2            rlang_0.4.7               pillar_1.4.6             
[13] withr_2.3.0               glue_1.4.2                GenomeInfoDbData_1.2.3    lifecycle_0.2.0          
[17] plyr_1.8.6                stringr_1.4.0             zlibbioc_1.34.0           munsell_0.5.0            
[21] gtable_0.3.0              future_1.19.1             codetools_0.2-16          labeling_0.3             
[25] knitr_1.30                Rcpp_1.0.5                scales_1.1.1              backports_1.1.10         
[29] XVector_0.28.0            gridExtra_2.3             digest_0.6.25             stringi_1.5.3            
[33] dplyr_1.0.2               sctransform_0.3.1         glmGamPoi_1.1.13          grid_4.0.2               
[37] rprojroot_1.3-2           tools_4.0.2               bitops_1.0-6              magrittr_1.5             
[41] RCurl_1.98-1.2            tibble_3.0.3              crayon_1.3.4              future.apply_1.6.0       
[45] pkgconfig_2.0.3           ellipsis_0.3.1            MASS_7.3-53               Matrix_1.2-18            
[49] DelayedMatrixStats_1.10.1 rstudioapi_0.11           R6_2.4.1                  globals_0.13.0           
[53] compiler_4.0.2        
ChristophH commented 3 years ago

Hi Alexander Thank you posting this bug. There was indeed a problem with how the coefficients were named when using a batch indicator and the glmGamPoi method. I have fixed this in the develop branch, which you can install via remotes::install_github("ChristophH/sctransform@develop")

ATpoint commented 3 years ago

Can confirm that this works just fine, thank you for the prompt reply. Might I suggest that you add a note in the README until the next release since I know at least one other person stumbling over this.

ATpoint commented 3 years ago

@ChristophH After loading v0.3.2 from CRAN, exact same code as above I see:

Calculating cell attributes from input UMI matrix: log_umi
Error in is.nan(rel_attr) : 
  default method not implemented for type 'list'

Works fine when going back toremotes::install_github("ChristophH/sctransform@develop", ref = "af8ba07"). Any idea what this is?

ATpoint commented 3 years ago

Ah, sorry did not realize this was already pointed out in https://github.com/ChristophH/sctransform/issues/88