satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
203 stars 33 forks source link

When using batch_var, error 'undefined columns selected' in SCTModel_to_vst during FindTransferAnchors #105

Closed nimne closed 3 years ago

nimne commented 3 years ago

Hello,

I think I've encountered a bug that I can't quite figure out how to solve. I'm currently running Seurat 4.0.1 and SCTransform 0.3.2.9006 from the development branch. This error only occurs when setting batch_var.

sessionInfo()

``` R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.10 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] sctransform_0.3.2.9006 SeuratObject_4.0.0 Seurat_4.0.1 loaded via a namespace (and not attached): [1] nlme_3.1-152 matrixStats_0.58.0 spatstat.sparse_2.0-0 [4] RcppAnnoy_0.0.18 RColorBrewer_1.1-2 httr_1.4.2 [7] tools_4.0.2 utf8_1.2.1 R6_2.5.0 [10] irlba_2.3.3 rpart_4.1-15 KernSmooth_2.23-18 [13] uwot_0.1.10 mgcv_1.8-35 DBI_1.1.1 [16] lazyeval_0.2.2 colorspace_2.0-0 tidyselect_1.1.1 [19] gridExtra_2.3 compiler_4.0.2 plotly_4.9.3 [22] scales_1.1.1 lmtest_0.9-38 spatstat.data_2.1-0 [25] ggridges_0.5.3 pbapply_1.4-3 goftest_1.2-2 [28] stringr_1.4.0 digest_0.6.27 spatstat.utils_2.1-0 [31] pkgconfig_2.0.3 htmltools_0.5.1.1 parallelly_1.25.0 [34] fastmap_1.1.0 htmlwidgets_1.5.3 rlang_0.4.11 [37] shiny_1.6.0 generics_0.1.0 zoo_1.8-9 [40] jsonlite_1.7.2 ica_1.0-2 dplyr_1.0.5 [43] magrittr_2.0.1 patchwork_1.1.1 Matrix_1.3-2 [46] Rcpp_1.0.6 munsell_0.5.0 fansi_0.4.2 [49] abind_1.4-5 reticulate_1.20 lifecycle_1.0.0 [52] stringi_1.5.3 MASS_7.3-53.1 Rtsne_0.15 [55] plyr_1.8.6 grid_4.0.2 parallel_4.0.2 [58] listenv_0.8.0 promises_1.2.0.1 ggrepel_0.9.1 [61] crayon_1.4.1 deldir_0.2-10 miniUI_0.1.1.1 [64] lattice_0.20-44 cowplot_1.1.1 splines_4.0.2 [67] tensor_1.5 knitr_1.33 pillar_1.6.0 [70] igraph_1.2.6 spatstat.geom_2.1-0 future.apply_1.7.0 [73] reshape2_1.4.4 codetools_0.2-18 leiden_0.3.7 [76] glue_1.4.2 data.table_1.14.0 renv_0.13.2 [79] BiocManager_1.30.12 png_0.1-7 vctrs_0.3.8 [82] httpuv_1.6.0 polyclip_1.10-0 gtable_0.3.0 [85] RANN_2.6.1 purrr_0.3.4 spatstat.core_2.1-2 [88] tidyr_1.1.3 scattermore_0.7 future_1.21.0 [91] assertthat_0.2.1 ggplot2_3.3.3 xfun_0.22 [94] mime_0.10 xtable_1.8-4 later_1.2.0 [97] survival_3.2-11 viridisLite_0.4.0 tibble_3.1.1 [100] cluster_2.1.2 globals_0.14.0 fitdistrplus_1.1-3 [103] ellipsis_0.3.2 ROCR_1.0-11 ```

Here is a minimal example to reproduce the issue

library(Seurat)
library(sctransform)
data("pbmc_small")
pbmc_small <- SCTransform(object = pbmc_small, batch_var = "groups")
pbmc_small_list <- SplitObject(pbmc_small, split.by = "groups")
anchors <- FindTransferAnchors(reference = pbmc_small_list[[1]], query = pbmc_small_list[[2]])

this results in

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting 
a method for function 'as.matrix': undefined columns selected

The traceback indicates that there is a failure in the Seurat function SCTModel_to_vst here:

feature.params <- c("theta", "(Intercept)", "log_umi")
as.matrix(x = slot(object = SCTModel, name = "feature.attributes")[, feature.params])

Inspecting the SCTModel indicates that (Intercept) is not part of feature.attributes

> colnames(pbmc_small_list[[1]]@assays$SCT@SCTModel.list$model1@feature.attributes)
 [1] "detection_rate"         "gmean"                  "variance"               "residual_mean"          "residual_variance"     
 [6] "theta"                  "groupsg1"               "groupsg2"               "log_umi:groupsg1"       "log_umi:groupsg2"      
[11] "genes_log_gmean_step1"  "step1_theta"            "step1_groupsg1"         "step1_groupsg2"         "step1_log_umi:groupsg1"
[16] "step1_log_umi:groupsg2"
> "(Intercept)" %in% colnames(pbmc_small_list[[1]]@assays$SCT@SCTModel.list$model1@feature.attributes)
[1] FALSE

However, re-running without batch_var results in (Intercept) showing up and FindTransferAnchors runs as expected.

> pbmc_small_list[[1]] <- SCTransform(object = pbmc_small_list[[1]])
> colnames(pbmc_small_list[[1]]@assays$SCT@SCTModel.list$model1@feature.attributes)
 [1] "detection_rate"        "gmean"                 "variance"              "residual_mean"         "residual_variance"     "theta"                
 [7] "(Intercept)"           "log_umi"               "genes_log_gmean_step1" "step1_theta"           "step1_(Intercept)"     "step1_log_umi"        
> "(Intercept)" %in% colnames(pbmc_small_list[[1]]@assays$SCT@SCTModel.list$model1@feature.attributes)
[1] TRUE

This bug also occurs when setting method = 'glmGamPoi', although I have not tried other methods.

Thanks for all of your effort with this amazing package!

ChristophH commented 3 years ago

Hi,

Thank you for your detailed analysis of the problem. Indeed, FindTransferAnchors does not support sctransform models when batch_var is used. I suggest you first integrate the batches (see this vignette) and then call FindTransferAnchors on the integrated object as outlined in this vignette.

Alternatively, post this issue to the Seurat issue tracker

nimne commented 3 years ago

Thanks for the suggestion, that approach seems like a reasonable way to make this work!

I'll open an issue in the Seurat issue tracker - more testing indicates that this worked prior to v4, so at minimum a descriptive error message might help someone else.