immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/

harmony::RunHarmony v1.1.0 ignoring parameters #215

Closed jcshuy closed 10 months ago

jcshuy commented 10 months ago

Hi everyone, I'm new to single cell, so I apologize if any of this is due to my inexperience. I had previously used an earlier version of harmony without much trouble (unfortunately I don't remember which version it was), but since updating to 1.1.0 I've had some issues. When trying to integrate two small Seurat single-cell datasets (both are <10000 cells), I run the following commands:

merged <- merge(seurObj1, seurObj2) # two raw data seurat objects
merged <- merged %>% NormalizeData() %>% FindVariableFeatures(selection.method = "vst") %>% ScaleData() %>% RunPCA(npcs = 100)
merged <- RunHarmony(merged, c("orig.ident"), plot_convergence = T)

But this stops after 2 of the 10 harmony iterations. Looking at the convergence plot, I inferred that the run hadn't converged properly. (image: convergence plot) So I ran harmony_options(epsilon.cluster = -Inf, epsilon.harmony = -Inf), which I believe ran successfully, and changed the last line of the code above as follows:

> harmony_options(epsilon.cluster = -Inf, epsilon.harmony = -Inf)
$lambda_range
[1]  0.1 10.0

$tau
[1] 0

$block.size
[1] 0.05

$max.iter.cluster
[1] 20

$epsilon.cluster
[1] -Inf

$epsilon.harmony
[1] -Inf

attr(,"class")
[1] "harmony_options"
merged <- RunHarmony(merged, c("orig.ident"), plot_convergence = T, max_iter = 20)

Theoretically, this should keep the harmony process from ending early, so that it runs all 20 iterations, but even after adding those parameters RunHarmony still reports convergence after 2 iterations. Changing the epsilon.cluster/epsilon.harmony values to anything else yields the same result. The run prints the following and stops after 2 harmony iterations:

Transposing data matrix
Initializing state using k-means centroids initialization
Harmony 1/20
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 2/20
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony converged after 2 iterations
Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details on syntax validity

Is there a workaround for this problem? I know I saw another issue about ignored parameters, but I believe that one may already be addressed by this 1.1.0 update.

sessionInfo():

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CellChat_1.6.1              igraph_1.5.1                DESeq2_1.40.2               reticulate_1.34.0           pheatmap_1.0.12             reshape2_1.4.4             
 [7] viridis_0.6.4               viridisLite_0.4.2           topGO_2.52.0                SparseM_1.81                GO.db_3.17.0                graph_1.78.0               
[13] plyr_1.8.9                  biomaRt_2.56.1              magrittr_2.0.3              zinbwave_1.22.0             Matrix_1.6-1.1              edgeR_3.42.4               
[19] limma_3.56.2                cowplot_1.1.1               UpSetR_1.4.0                org.Hs.eg.db_3.17.0         org.Mm.eg.db_3.17.0         AnnotationDbi_1.62.2       
[25] clusterProfiler_4.8.2       randomcoloR_1.1.0.1         DropletUtils_1.20.0         patchwork_1.1.3             RColorBrewer_1.1-3          EnhancedVolcano_1.18.0     
[31] ggrepel_0.9.4               lubridate_1.9.3             forcats_1.0.0               stringr_1.5.0               dplyr_1.1.3                 purrr_1.0.2                
[37] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                tidyverse_2.0.0             SCpubr_2.0.2                harmony_1.1.0              
[43] Rcpp_1.0.11                 scRNAseq_2.14.0             scater_1.28.0               ggplot2_3.4.4               scran_1.28.2                scuttle_1.9.4              
[49] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2 Biobase_2.60.0              GenomicRanges_1.52.1        GenomeInfoDb_1.36.4         IRanges_2.34.1             
[55] S4Vectors_0.38.2            MatrixGenerics_1.12.3       matrixStats_1.0.0           BiocGenerics_0.46.0         SeuratObject_4.1.4          Seurat_4.4.0               

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             progress_1.2.2                goftest_1.2-3                 Biostrings_2.68.1             HDF5Array_1.28.1              vctrs_0.6.4                  
  [7] spatstat.random_3.2-1         shape_1.4.6                   digest_0.6.33                 png_0.1-8                     registry_0.5-1                deldir_1.0-9                 
 [13] parallelly_1.36.0             MASS_7.3-60                   foreach_1.5.2                 httpuv_1.6.12                 qvalue_2.32.0                 withr_2.5.1                  
 [19] ggfun_0.1.3                   ggpubr_0.6.0                  ellipsis_0.3.2                survival_3.5-7                memoise_2.0.1                 ggbeeswarm_0.7.2             
 [25] gson_0.1.0                    systemfonts_1.0.5             GlobalOptions_0.1.2           tidytree_0.4.5                zoo_1.8-12                    V8_4.4.0                     
 [31] pbapply_1.7-2                 R.oo_1.25.0                   prettyunits_1.2.0             KEGGREST_1.40.1               promises_1.2.1                httr_1.4.7                   
 [37] downloader_0.4                rstatix_0.7.2                 restfulr_0.0.15               globals_0.16.2                fitdistrplus_1.1-11           rhdf5filters_1.12.1          
 [43] rhdf5_2.44.0                  rstudioapi_0.15.0             miniUI_0.1.1.1                generics_0.1.3                DOSE_3.26.1                   ggalluvial_0.12.5            
 [49] curl_5.1.0                    zlibbioc_1.46.0               ScaledMatrix_1.8.1            ggraph_2.1.0                  polyclip_1.10-6               GenomeInfoDbData_1.2.10      
 [55] ExperimentHub_2.8.1           interactiveDisplayBase_1.38.0 doParallel_1.0.17             xtable_1.8-4                  S4Arrays_1.0.6                BiocFileCache_2.8.0          
 [61] hms_1.1.3                     irlba_2.3.5.1                 colorspace_2.1-0              filelock_1.0.2                ggnetwork_0.5.12              ROCR_1.0-11                  
 [67] spatstat.data_3.0-3           lmtest_0.9-40                 later_1.3.1                   ggtree_3.8.2                  lattice_0.22-5                NMF_0.26                     
 [73] spatstat.geom_3.2-7           future.apply_1.11.0           genefilter_1.82.1             scattermore_1.2               XML_3.99-0.14                 shadowtext_0.1.2             
 [79] RcppAnnoy_0.0.21              pillar_1.9.0                  nlme_3.1-163                  sna_2.7-1                     iterators_1.0.14              gridBase_0.4-7               
 [85] compiler_4.3.1                beachmat_2.16.0               RSpectra_0.16-1               stringi_1.7.12                tensor_1.5                    GenomicAlignments_1.36.0     
 [91] crayon_1.5.2                  abind_1.4-5                   BiocIO_1.10.0                 gridGraphics_0.5-1            locfit_1.5-9.8                sp_2.1-1                     
 [97] graphlayouts_1.0.1            bit_4.0.5                     fastmatch_1.1-4               codetools_0.2-19              BiocSingular_1.16.0           GetoptLong_1.0.5             
[103] plotly_4.10.3                 mime_0.12                     splines_4.3.1                 circlize_0.4.16               dbplyr_2.3.4                  sparseMatrixStats_1.12.2     
[109] HDO.db_0.99.1                 blob_1.2.4                    utf8_1.2.4                    clue_0.3-65                   BiocVersion_3.17.1            AnnotationFilter_1.24.0      
[115] fs_1.6.3                      listenv_0.9.0                 DelayedMatrixStats_1.22.6     ggsignif_0.6.4                ggplotify_0.1.2               statmod_1.5.0                
[121] svglite_2.1.2                 tzdb_0.4.0                    network_1.18.1                tweenr_2.0.2                  pkgconfig_2.0.3               tools_4.3.1                  
[127] cachem_1.0.8                  RhpcBLASctl_0.23-42           RSQLite_2.3.1                 DBI_1.1.3                     fastmap_1.1.1                 scales_1.2.1                 
[133] grid_4.3.1                    ica_1.0-3                     Rsamtools_2.16.0              broom_1.0.5                   AnnotationHub_3.8.0           coda_0.19-4                  
[139] FNN_1.1.3.2                   BiocManager_1.30.22           carData_3.0-5                 RANN_2.6.1                    farver_2.1.1                  tidygraph_1.2.3              
[145] scatterpie_0.2.1              yaml_2.3.7                    rtracklayer_1.60.1            cli_3.6.1                     leiden_0.4.3                  lifecycle_1.0.3              
[151] uwot_0.1.16                   backports_1.4.1               bluster_1.10.0                BiocParallel_1.34.2           annotate_1.78.0               timechange_0.2.0             
[157] gtable_0.3.4                  rjson_0.2.21                  ggridges_0.5.4                progressr_0.14.0              parallel_4.3.1                ape_5.7-1                    
[163] softImpute_1.4-1              jsonlite_1.8.7                bitops_1.0-7                  bit64_4.0.5                   Rtsne_0.16                    yulab.utils_0.1.0            
[169] spatstat.utils_3.0-4          BiocNeighbors_1.18.0          metapod_1.8.0                 GOSemSim_2.26.1               dqrng_0.3.1                   R.utils_2.12.2               
[175] lazyeval_0.2.2                shiny_1.7.5.1                 htmltools_0.5.6.1             enrichplot_1.20.0             sctransform_0.4.1             rappdirs_0.3.3               
[181] ensembldb_2.24.1              glue_1.6.2                    XVector_0.40.0                RCurl_1.98-1.12               treeio_1.24.3                 gridExtra_2.3                
[187] R6_2.5.1                      labeling_0.4.3                GenomicFeatures_1.52.2        rngtools_1.5.2                cluster_2.1.4                 Rhdf5lib_1.22.1              
[193] aplot_0.2.2                   statnet.common_4.9.0          DelayedArray_0.26.7           tidyselect_1.2.0              vipor_0.4.5                   ProtGenerics_1.32.0          
[199] ggforce_0.4.1                 xml2_1.3.5                    car_3.1-2                     future_1.33.0                 rsvd_1.0.5                    munsell_0.5.0                
[205] KernSmooth_2.23-22            data.table_1.14.8             ComplexHeatmap_2.16.0         htmlwidgets_1.6.2             fgsea_1.26.0                  rlang_1.1.1                  
[211] spatstat.sparse_3.0-3         spatstat.explore_3.2-5        fansi_1.0.5                   beeswarm_0.4.0               

jcshuy commented 10 months ago

I hadn't noticed the new early_stop parameter in RunHarmony. Setting it to F (early_stop = F) fixes the issue.
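
For readers landing here with the same symptom, the fix described above can be sketched as follows. This is a minimal sketch, not a verified reproduction: it assumes `merged` is the Seurat object produced by the pipeline in the original post (normalized, scaled, PCA already run) and that harmony 1.1.0 is installed. Note also that the printed output in the original post suggests that calling harmony_options() on its own only returns a list of settings; the package documentation describes passing such a list through the `.options` argument of RunHarmony, which is not shown here.

```r
library(Seurat)
library(harmony)

# Assumption: `merged` already exists and contains a "pca" reduction,
# as built by the NormalizeData/ScaleData/RunPCA pipeline in the original post.

# early_stop = FALSE asks RunHarmony to keep iterating up to max_iter
# instead of stopping as soon as its convergence check passes.
merged <- RunHarmony(
  merged,
  group.by.vars = "orig.ident",  # batch variable used in the original post
  max_iter = 20,
  early_stop = FALSE,
  plot_convergence = TRUE
)
```

With early stopping disabled, the log should show all 20 "Harmony i/20" rounds; whether the extra rounds actually improve the integration is best judged from the convergence plot.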