immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/
Other
528 stars 99 forks source link

Convergence plot showing bizarre reading, data okay to use? #241

Open jcshuy opened 9 months ago

jcshuy commented 9 months ago

Hi, sorry if this was already addressed in a previous issue-- I was not able to source a solution to this. I ran Harmony on a relatively large (~21k cells) merged Seurat v4 object with the following command: seurObj <- RunHarmony(seurObj, 'orig.ident', plot_convergence = T, early_stop = F, max_iter = 40) and afterwards the convergence plot showed a result like this: image I also checked the UMAP and believe the two did not converge correctly. However, I am confused why the result turned out like this. I have used Harmony in the past and usually they will converge without issue if setting a larger max_iter value along with early_stop = F. Does anybody have a solution to remedy this?

Session info:


R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.42.0               reticulate_1.35.0           pheatmap_1.0.12            
 [4] reshape2_1.4.4              viridis_0.6.5               viridisLite_0.4.2          
 [7] topGO_2.54.0                SparseM_1.81                GO.db_3.18.0               
[10] graph_1.80.0                plyr_1.8.9                  biomaRt_2.58.2             
[13] magrittr_2.0.3              zinbwave_1.24.0             Matrix_1.6-5               
[16] edgeR_4.0.15                limma_3.58.1                cowplot_1.1.3              
[19] UpSetR_1.4.0                org.Hs.eg.db_3.18.0         org.Mm.eg.db_3.18.0        
[22] AnnotationDbi_1.64.1        clusterProfiler_4.10.0      randomcoloR_1.1.0.1        
[25] DropletUtils_1.22.0         patchwork_1.2.0             RColorBrewer_1.1-3         
[28] EnhancedVolcano_1.20.0      ggrepel_0.9.5               lubridate_1.9.3            
[31] forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                
[34] purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                
[37] tibble_3.2.1                tidyverse_2.0.0             SCpubr_2.0.2               
[40] harmony_1.2.0               Rcpp_1.0.12                 scRNAseq_2.16.0            
[43] scater_1.30.1               ggplot2_3.4.4               scran_1.30.2               
[46] scuttle_1.12.0              SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[49] Biobase_2.62.0              GenomicRanges_1.54.1        GenomeInfoDb_1.38.6        
[52] IRanges_2.36.0              S4Vectors_0.40.2            MatrixGenerics_1.14.0      
[55] matrixStats_1.2.0           BiocGenerics_0.48.1         SeuratObject_4.1.4         
[58] Seurat_4.4.0               

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             progress_1.2.3                goftest_1.2-3                
  [4] Biostrings_2.70.2             HDF5Array_1.30.0              vctrs_0.6.5                  
  [7] spatstat.random_3.2-2         digest_0.6.34                 png_0.1-8                    
 [10] deldir_2.0-2                  parallelly_1.36.0             MASS_7.3-60.0.1              
 [13] httpuv_1.6.14                 qvalue_2.34.0                 withr_3.0.0                  
 [16] ggrastr_1.0.2                 ggfun_0.1.4                   ellipsis_0.3.2               
 [19] survival_3.5-7                memoise_2.0.1                 ggbeeswarm_0.7.2             
 [22] gson_0.1.0                    tidytree_0.4.6                zoo_1.8-12                   
 [25] V8_4.4.1                      pbapply_1.7-2                 R.oo_1.26.0                  
 [28] prettyunits_1.2.0             KEGGREST_1.42.0               promises_1.2.1               
 [31] httr_1.4.7                    restfulr_0.0.15               globals_0.16.2               
 [34] fitdistrplus_1.1-11           rhdf5filters_1.14.1           rhdf5_2.46.1                 
 [37] rstudioapi_0.15.0             miniUI_0.1.1.1                generics_0.1.3               
 [40] DOSE_3.28.2                   curl_5.2.0                    zlibbioc_1.48.0              
 [43] ScaledMatrix_1.10.0           ggraph_2.1.0                  polyclip_1.10-6              
 [46] GenomeInfoDbData_1.2.11       ExperimentHub_2.10.0          SparseArray_1.2.4            
 [49] interactiveDisplayBase_1.40.0 xtable_1.8-4                  S4Arrays_1.2.0               
 [52] BiocFileCache_2.10.1          hms_1.1.3                     irlba_2.3.5.1                
 [55] colorspace_2.1-0              filelock_1.0.3                ROCR_1.0-11                  
 [58] spatstat.data_3.0-4           lmtest_0.9-40                 later_1.3.2                  
 [61] ggtree_3.10.0                 lattice_0.22-5                spatstat.geom_3.2-8          
 [64] future.apply_1.11.1           genefilter_1.84.0             scattermore_1.2              
 [67] XML_3.99-0.16.1               shadowtext_0.1.3              RcppAnnoy_0.0.22             
 [70] pillar_1.9.0                  nlme_3.1-164                  compiler_4.3.2               
 [73] beachmat_2.18.0               stringi_1.8.3                 tensor_1.5                   
 [76] GenomicAlignments_1.38.2      crayon_1.5.2                  abind_1.4-5                  
 [79] BiocIO_1.12.0                 gridGraphics_0.5-1            locfit_1.5-9.8               
 [82] sp_2.1-3                      graphlayouts_1.1.0            bit_4.0.5                    
 [85] fastmatch_1.1-4               codetools_0.2-19              BiocSingular_1.18.0          
 [88] plotly_4.10.4                 mime_0.12                     splines_4.3.2                
 [91] dbplyr_2.4.0                  sparseMatrixStats_1.14.0      HDO.db_0.99.1                
 [94] blob_1.2.4                    utf8_1.2.4                    BiocVersion_3.18.1           
 [97] AnnotationFilter_1.26.0       fs_1.6.3                      listenv_0.9.1                
[100] DelayedMatrixStats_1.24.0     ggplotify_0.1.2               statmod_1.5.0                
[103] tzdb_0.4.0                    tweenr_2.0.2                  pkgconfig_2.0.3              
[106] tools_4.3.2                   cachem_1.0.8                  RhpcBLASctl_0.23-42          
[109] RSQLite_2.3.5                 DBI_1.2.1                     fastmap_1.1.1                
[112] scales_1.3.0                  grid_4.3.2                    ica_1.0-3                    
[115] Rsamtools_2.18.0              AnnotationHub_3.10.0          BiocManager_1.30.22          
[118] RANN_2.6.1                    farver_2.1.1                  tidygraph_1.3.1              
[121] scatterpie_0.2.1              yaml_2.3.8                    rtracklayer_1.62.0           
[124] cli_3.6.2                     leiden_0.4.3.1                lifecycle_1.0.4              
[127] uwot_0.1.16                   bluster_1.12.0                BiocParallel_1.36.0          
[130] annotate_1.80.0               timechange_0.3.0              gtable_0.3.4                 
[133] rjson_0.2.21                  ggridges_0.5.6                progressr_0.14.0             
[136] parallel_4.3.2                ape_5.7-1                     softImpute_1.4-1             
[139] jsonlite_1.8.8                bitops_1.0-7                  bit64_4.0.5                  
[142] Rtsne_0.17                    yulab.utils_0.1.4             spatstat.utils_3.0-4         
[145] BiocNeighbors_1.20.2          metapod_1.10.1                GOSemSim_2.28.1              
[148] dqrng_0.3.2                   R.utils_2.12.3                lazyeval_0.2.2               
[151] shiny_1.8.0                   htmltools_0.5.7               enrichplot_1.22.0            
[154] sctransform_0.4.1             rappdirs_0.3.3                ensembldb_2.26.0             
[157] glue_1.7.0                    XVector_0.42.0                RCurl_1.98-1.14              
[160] treeio_1.26.0                 gridExtra_2.3                 igraph_2.0.1.1               
[163] R6_2.5.1                      labeling_0.4.3                GenomicFeatures_1.54.3       
[166] cluster_2.1.6                 Rhdf5lib_1.24.2               aplot_0.2.2                  
[169] DelayedArray_0.28.0           tidyselect_1.2.0              vipor_0.4.7                  
[172] ProtGenerics_1.34.0           ggforce_0.4.1                 xml2_1.3.6                   
[175] future_1.33.1                 rsvd_1.0.5                    munsell_0.5.0                
[178] KernSmooth_2.23-22            data.table_1.15.0             htmlwidgets_1.6.4            
[181] fgsea_1.28.0                  rlang_1.1.3                   spatstat.sparse_3.0-3        
[184] spatstat.explore_3.2-6        fansi_1.0.6                   beeswarm_0.4.0     ```
pati-ni commented 9 months ago

The convergence plot looks strange.

Can you share some extra information about the datasets (number of batches, differences expected between batches)? From the top of my head I would encourage you to set lambda=NULL to see whether this improves things.

Also, how many PCs are you using for this?

jcshuy commented 8 months ago

Sorry for the late response. To be honest I am still quite new to single cell analysis and I am not too familiar of how this dataset was processed; I believe it may be two batches (there are 2 control samples and two treated samples).

After QC I have run 75 PCs for this dataset seurObj <- NormalizeData(seurObj) %>% FindVariableFeatures(selection.method = "vst") %>% ScaleData(vars.to.regress = c('nCount_RNA','nFeature_RNA')) %>% RunPCA(npcs = 75, verbose = T) and the resulting PCA plot looks like this:

Screenshot 2024-02-21 at 9 18 05 PM

Running harmony with lambda set to NULL results in this following convergence plot:

Screenshot 2024-02-21 at 9 23 32 PM

It also appears that the harmony plot is unchanged from the original unintegrated plot.

Screenshot 2024-02-21 at 9 25 11 PM