satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.29k stars 914 forks source link

Accounting for technical batches with SCTransform in Seurat #3657

Closed estevezb closed 4 years ago

estevezb commented 4 years ago

Need advice on how to let SCTransform know I have samples from different experimental batches. I have 12 libraries. 8 are from the first experiment and 4 are from the second experiment where cells were collected and libraries were on a different days. There are differences in sequencing depth and reads between these two batches. I know ChristophH and others have recommended to use the batch_var parameter passed to sctransform::vst via Seurat::SCTransform, but how does one (beginner on this stuff) actually do this?

Here my code:

Estevez_12_libraries_CD34cells_75_95_mito<- merge(NT_41pos_1_99052, y =c(NT_41pos_1_99056,NT_41pos_1_99060,NT_41pos_2_99052,NT_41pos_2_99056,NT_41pos_2_99060,shRX1_41pos_1_99054, shRX1_41pos_1_99058, shRX1_41pos_1_99062,shRX1_41pos_2_99054, shRX1_41pos_2_99058, shRX1_41pos_2_99062), add.cell.ids = c("NT_1_52","NT_1_56","NT_1_60", "NT_2_52", "NT_2_56", "NT_2_60", "RX_1_54", "RX_1_58", "RX_1_62", "RX_2_54", "RX_2_58", "RX_2_62"))

NOTE there are two batches for merged objects: BATCH1= NT_41pos_1_99052, NT_41pos_1_99056, NT_41pos_2_99052,NT_41pos_2_99056, shRX1_41pos_1_99054, shRX1_41pos_1_99058, shRX1_41pos_2_99054, shRX1_41pos_2_99058; BATCH2= NT_41pos_1_99060, NT_41pos_2_99060,shRX1_41pos_2_99062, shRX1_41_1_99062 )

The following analysis merges all libraries and does not account for technical differences between BATCH1 and BATCH2 , as I don't know how to at the moment.

Save merged object:

saveRDS(Estevez_12_libraries_CD34cells_75_95_mito, file = "Estevez_12_libraries_CD34cells_5_9mito.rds")

Store list of cell cycle genes for cell cycle regression

s.genes <- cc.genes$s.genes g2m.genes <- cc.genes$g2m.genes

blood.list <- SplitObject(Estevez_12_libraries_CD34cells_75_95_mito, split.by = "orig.ident")

# Run normalization cell cycle scoring and SCTransform:

for (i in 1:length(blood.list)){blood.list[[i]] <- NormalizeData(blood.list[[i]], verbose = TRUE) blood.list[[i]] <- CellCycleScoring(blood.list[[i]], g2m.features=g2m.genes, s.features= s.genes) blood.list[[i]] <- Seurat::SCTransform (blood.list[[i]], vars.to.regress= c("percent.mt", "S.Score", "G2M.Score"), batch_var= c("orig.ident") ) }

See below. I receive the following result in R ending with "...contrast can be applied onto factors with 2 or more levels" Performing log-normalization 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| Calculating cell attributes from input UMI matrix: log_umi Variance stabilizing transformation of count matrix of size 10878 by 1434 Model formula is y ~ (log_umi) : orig.ident + orig.ident + 0 Get Negative Binomial regression parameters per gene Using 2000 genes, 1434 cells | | 0%Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

SESSION INFO

sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.5 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] EnhancedVolcano_1.6.0 ggrepel_0.8.2 ggplot2_3.3.2 Seurat_3.2.2 dplyr_1.0.1

loaded via a namespace (and not attached): [1] Rtsne_0.15 colorspace_1.4-1 deldir_0.1-29 ellipsis_0.3.1 ggridges_0.5.2 rprojroot_1.3-2
[7] fs_1.5.0 rstudioapi_0.11 spatstat.data_1.4-3 leiden_0.3.3 listenv_0.8.0 farver_2.0.3
[13] remotes_2.2.0 RSpectra_0.16-0 fansi_0.4.1 codetools_0.2-16 splines_4.0.2 polyclip_1.10-0
[19] pkgload_1.1.0 jsonlite_1.7.1 ica_1.0-2 cluster_2.1.0 png_0.1-7 uwot_0.1.8
[25] shiny_1.5.0 sctransform_0.3.1 BiocManager_1.30.10 compiler_4.0.2 httr_1.4.2 backports_1.1.8
[31] assertthat_0.2.1 Matrix_1.2-18 fastmap_1.0.1 lazyeval_0.2.2 cli_2.0.2 later_1.1.0.1
[37] htmltools_0.5.0 prettyunits_1.1.1 tools_4.0.2 rsvd_1.0.3 igraph_1.2.5 gtable_0.3.0
[43] glue_1.4.1 RANN_2.6.1 reshape2_1.4.4 rappdirs_0.3.1 Rcpp_1.0.5 spatstat_1.64-1
[49] vctrs_0.3.2 nlme_3.1-147 lmtest_0.9-38 stringr_1.4.0 globals_0.12.5 ps_1.3.4
[55] testthat_2.3.2 mime_0.9 miniUI_0.1.1.1 lifecycle_0.2.0 irlba_2.3.3 devtools_2.3.2
[61] goftest_1.2-2 future_1.18.0 MASS_7.3-51.6 zoo_1.8-8 scales_1.1.1 promises_1.1.1
[67] spatstat.utils_1.17-0 parallel_4.0.2 RColorBrewer_1.1-2 memoise_1.1.0 reticulate_1.16 pbapply_1.4-3
[73] gridExtra_2.3 rpart_4.1-15 stringi_1.4.6 desc_1.2.0 pkgbuild_1.1.0 rlang_0.4.7
[79] pkgconfig_2.0.3 matrixStats_0.56.0 lattice_0.20-41 ROCR_1.0-11 purrr_0.3.4 tensor_1.5
[85] labeling_0.3 patchwork_1.0.1 htmlwidgets_1.5.1 cowplot_1.1.0 tidyselect_1.1.0 processx_3.4.4
[91] RcppAnnoy_0.0.16 plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2 pillar_1.4.6
[97] withr_2.2.0 mgcv_1.8-31 fitdistrplus_1.1-1 survival_3.1-12 abind_1.4-5 tibble_3.0.3
[103] future.apply_1.6.0 crayon_1.3.4 KernSmooth_2.23-17 plotly_4.9.2.1 usethis_1.6.3 grid_4.0.2
[109] data.table_1.13.0 callr_3.5.1 digest_0.6.25 xtable_1.8-4 tidyr_1.1.1 httpuv_1.5.4
[115] munsell_0.5.0 viridisLite_0.3.0 sessioninfo_1.1.1

estevezb commented 4 years ago

Here is a link to from a previous discussion by @luluyadummy for context on this issue. https://github.com/ChristophH/sctransform/issues/41#issuecomment-540106320. As I mentioned. It is still not clear for me as a beginner how to implement this on my code shown above.

andrewwbutler commented 4 years ago

Hi,

It looks like you've split the object by batch (so each item in the list only has one value for orig.ident). That's why you're seeing the error

...contrast can be applied onto factors with 2 or more levels"

If you want to use SCTransform normalization with the batch_var parameter, you shouldn't be splitting the object first.

estevezb commented 4 years ago

Makes sense. Thanks so much Andrew.

On Fri, Oct 30, 2020 at 1:50 PM Andrew Butler notifications@github.com wrote:

Closed #3657 https://github.com/satijalab/seurat/issues/3657.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/satijalab/seurat/issues/3657#event-3942436136, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANSZA5OFVEEFCFSBSLDYOYLSNL4EZANCNFSM4TCQCYRQ .