immunogenomics / harmony

Fast, sensitive and accurate integration of single-cell data with Harmony
https://portals.broadinstitute.org/harmony/
Other
513 stars 98 forks source link

RunHarmony on the weights branch errors with incompatible matrix dimensions #219

Open deevdevil88 opened 10 months ago

deevdevil88 commented 10 months ago

Hi, I am trying to use the weighted harmony as implemented in this paper DOI: 10.1016/j.medj.2022.05.002 , so i was using the weights branch. I installed the weights branch successfully and can run the `RunHarmony' function on a seurat object. However, after the first iteration i get the following error

the command im running, after insalling the weights branch is seu_object <- harmony::RunHarmony(object = seu_object,group.by.vars = c("cart_id","tissue_simple"), reduction = "weighted_pca",dims.use = 1:50,tau = 0,max.iter.harmony = 10, weight.by = "tissue_simple", reduction.save = "weighed_pca_harmony")

Screenshot 2023-11-08 at 19 25 38

on traceback of the error it seems to come from the Harmony function within. Screenshot 2023-11-08 at 19 31 01

the error persists with multiple parameter changes, includng using normal or wieghted PCA, so i dont think i am using the wrong covariates. Any help on this issue would be much appreciated. I tried to use the code that was on the github repo for the paper here : https://github.com/immunogenomics/FibroblastAtlas2022/blob/Med/Analyses_human/IntegratedClusters.ipynb , but i think the weighted harmony code on here isnt the easiset to implement.

best, Devika

PietroAndrei commented 9 months ago

Hello,

I am also having the same issue with Harmony when I try to integrate different Seurat objects through the Seurat v5 IntegrateLayers() function:


seurat[["RNA"]] = split(seurat[["RNA"]],f=seurat$slide)
seurat = seurat %>% NormalizeData() %>% 
FindVariableFeatures(nfeatures=1000) %>% 
ScaleData() %>% 
RunPCA(npcs=30)

seurat = IntegrateLayers(seurat,method=HarmonyIntegration,orig.reduction='pca',new.reduction='harmony')

Harmony 1/10
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error: element-wise pow(): incompatible matrix dimensions: 100x2 and 2x1

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

...
...
...
...

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] harmony_1.0        Rcpp_1.0.11        dplyr_1.1.3        Seurat_5.0.0      
[5] SeuratObject_5.0.0 sp_2.0-0          
pati-ni commented 9 months ago

Hi @PietroAndrei,

Two things.

PietroAndrei commented 9 months ago

Hi @pati-ni,

I solved the issue by upgrading Harmony from v1.0 to v1.2, thanks!

It may have been a compatibility problem with Rcpp, as I tried to use another dataset, and Harmony v.1.0 worked only when run within an environment with Rcpp < 1.0.10.

Thanks again for your help,

Pietro

jjs847 commented 7 months ago

Hi @pati-ni, could you re-open this issue as I am also having the same error on the weights branch when trying to run HarmonyMatrix with weights.

pastedImage

I have tried suggestions from other posts with a similar error (but not pertaining to the weights branch) such as downgrading Rcpp, converting the vars_use to factors or characters, and making sure there are no empty factor levels/NAs and am still getting the same error. Updating to the harmony version as you suggested doesn't work since I need the HarmonyMatrix function with the weights argument from the weights branch.

pati-ni commented 7 months ago

Hi @jjs847,

does your code run when you install the normal branch? I did not know about the weights branch

jjs847 commented 7 months ago

Hi @pati-ni,

Yes, it runs to completion with harmony 1.0

pati-ni commented 7 months ago

Hi @jjs847,

I have looked into the issue and I think I have addressed it. However, I can not reproduce the error with the existing weights branch due to the ambiguity of the pow call which won't let me compile.

Can you please install it from my branch to see if the issue is fixed?

devtools::install_github("pati-ni/harmony", ref="weights")

deevdevil88 commented 7 months ago

hi @pati-ni does this mean, the weights branch now works on your harmony repo and i can test it out again as well?

pati-ni commented 7 months ago

I would like you to help me test it out. I believe i fixed the error it was complaining about. This branch is not maintained at the moment and it was affected by a bug that was fixed in the master branch a year ago.

On Fri, Feb 9, 2024, 05:52 Devika Agarwal @.***> wrote:

hi @pati-ni https://github.com/pati-ni does this mean, the weights branch now works on your harmony repo and i can test it out again as well?

— Reply to this email directly, view it on GitHub https://github.com/immunogenomics/harmony/issues/219#issuecomment-1935705377, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSFW2GTRLRBR4FVRQOET6LYSX5WBAVCNFSM6AAAAAA7EMAVPOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZVG4YDKMZXG4 . You are receiving this because you were mentioned.Message ID: @.***>

jjs847 commented 7 months ago

Hi @pati-ni,

sorry for the delayed response -- it seems like its working now, and I am not getting the incompatible matrix error, however, its running extremely slow. So far it's been running for 5 hours and it's only on the second harmony iteration. Previously, when I ran HarmonyMatrix from harmony 1.0 (just to test if I got a matrix error), it took less than 2 minutes to complete. Is the reason its taking so much longer now because of the weights? My data is not that large, around 25k cells.

Thanks for working on this.

pati-ni commented 7 months ago

That is unreasonable slow. How many batches do you have?

So it is confusing the version was bumped to v1 accidentally on that branch. When you say that you run the branch in 2 mins, which branch do you refer to? Our current version is v1.2 and is significantly different than the weights branch (performance-wise).

Can you please provide some sessionInfo() outputs of the different harmony environments you are running the code on and some extra information for the dataset?

pati-ni commented 7 months ago

hi @pati-ni does this mean, the weights branch now works on your harmony repo and i can test it out again as well?

@deevdevil88 yes, please go ahead! Although, keep in mind that @jjs847 is facing slow runtimes, which is puzzling since the code is very similar. Also, it would be good if we had your feedback whether this runs at your system

jjs847 commented 7 months ago

Hi @pati-ni, here is the session info for the two versions:

Harmony 1.2.0, took less than 1 minute to run

Screen Shot 2024-02-11 at 2 38 00 PM

sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] harmony_1.2.0 Rcpp_1.0.12 ggplot2_3.4.4
[4] Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-3
[7] singlecellmethods_0.1.0 data.table_1.15.0 dplyr_1.1.4

loaded via a namespace (and not attached): [1] Rtsne_0.17 colorspace_2.1-0 deldir_2.0-2
[4] ellipsis_0.3.2 ggridges_0.5.6 RcppHNSW_0.5.0
[7] rstudioapi_0.15.0 spatstat.data_3.0-4 leiden_0.4.3.1
[10] listenv_0.9.1 ggrepel_0.9.5 RSpectra_0.16-1
[13] fansi_1.0.6 codetools_0.2-19 splines_4.2.2
[16] knitr_1.45 polyclip_1.10-6 spam_2.10-0
[19] jsonlite_1.8.8 ica_1.0-3 cluster_2.1.6
[22] png_0.1-8 uwot_0.1.16 shiny_1.8.0
[25] sctransform_0.4.1 spatstat.sparse_3.0-3 compiler_4.2.2
[28] httr_1.4.7 Matrix_1.6-3 fastmap_1.1.1
[31] lazyeval_0.2.2 cli_3.6.2 later_1.3.2
[34] htmltools_0.5.7 tools_4.2.2 igraph_1.6.0
[37] dotCall64_1.1-1 gtable_0.3.4 glue_1.7.0
[40] RANN_2.6.1 reshape2_1.4.4 scattermore_1.2
[43] vctrs_0.6.5 spatstat.explore_3.2-5 nlme_3.1-164
[46] progressr_0.14.0 lmtest_0.9-40 spatstat.random_3.2-2 [49] xfun_0.41 stringr_1.5.1 globals_0.16.2
[52] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.4
[55] irlba_2.3.5.1 goftest_1.2-3 future_1.33.1
[58] MASS_7.3-60.0.1 zoo_1.8-12 scales_1.3.0
[61] promises_1.2.1 spatstat.utils_3.0-4 RColorBrewer_1.1-3
[64] yaml_2.3.8 reticulate_1.34.0 pbapply_1.7-2
[67] gridExtra_2.3 stringi_1.8.3 fastDummies_1.7.3
[70] rlang_1.1.3 pkgconfig_2.0.3 matrixStats_1.1.0
[73] evaluate_0.23 lattice_0.22-5 ROCR_1.0-11
[76] purrr_1.0.2 tensor_1.5 patchwork_1.2.0
[79] htmlwidgets_1.6.4 cowplot_1.1.3 tidyselect_1.2.0
[82] parallelly_1.36.0 RcppAnnoy_0.0.22 plyr_1.8.9
[85] magrittr_2.0.3 R6_2.5.1 generics_0.1.3
[88] withr_3.0.0 pillar_1.9.0 fitdistrplus_1.1-11
[91] survival_3.5-7 abind_1.4-5 tibble_3.2.1
[94] future.apply_1.11.1 crayon_1.5.2 KernSmooth_2.23-22
[97] utf8_1.2.4 spatstat.geom_3.2-8 plotly_4.10.4
[100] rmarkdown_2.25 grid_4.2.2 digest_0.6.34
[103] xtable_1.8-4 tidyr_1.3.1 httpuv_1.6.14
[106] munsell_0.5.0 viridisLite_0.4.2

Harmony pat-ni/weights, canceled after running for 12 hours, got to 4th iteration but did not finish (screen shot not from that run but just to show arguments used in function call)

Screen Shot 2024-02-11 at 2 40 48 PM

sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8
[2] LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8
[8] LC_NAME=C
[9] LC_ADDRESS=C
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 [12] LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices [5] utils datasets methods base

other attached packages: [1] harmony_1.0
[2] Rcpp_1.0.12
[3] ggplot2_3.4.4
[4] Seurat_5.0.1
[5] SeuratObject_5.0.1
[6] sp_2.1-3
[7] singlecellmethods_0.1.0 [8] data.table_1.15.0
[9] dplyr_1.1.4

loaded via a namespace (and not attached): [1] Rtsne_0.17
[2] colorspace_2.1-0
[3] deldir_2.0-2
[4] ellipsis_0.3.2
[5] ggridges_0.5.6
[6] RcppHNSW_0.5.0
[7] fs_1.6.3
[8] spatstat.data_3.0-4
[9] rstudioapi_0.15.0
[10] leiden_0.4.3.1
[11] listenv_0.9.1
[12] remotes_2.4.2.1
[13] ggrepel_0.9.5
[14] RSpectra_0.16-1
[15] fansi_1.0.6
[16] codetools_0.2-19
[17] splines_4.2.2
[18] cachem_1.0.8
[19] knitr_1.45
[20] polyclip_1.10-6
[21] pkgload_1.3.4
[22] spam_2.10-0
[23] jsonlite_1.8.8
[24] ica_1.0-3
[25] cluster_2.1.6
[26] png_0.1-8
[27] uwot_0.1.16
[28] spatstat.sparse_3.0-3 [29] shiny_1.8.0
[30] sctransform_0.4.1
[31] compiler_4.2.2
[32] httr_1.4.7
[33] Matrix_1.6-3
[34] fastmap_1.1.1
[35] lazyeval_0.2.2
[36] cli_3.6.2
[37] later_1.3.2
[38] htmltools_0.5.7
[39] tools_4.2.2
[40] igraph_1.6.0
[41] dotCall64_1.1-1
[42] gtable_0.3.4
[43] glue_1.7.0
[44] reshape2_1.4.4
[45] RANN_2.6.1
[46] scattermore_1.2
[47] vctrs_0.6.5
[48] nlme_3.1-164
[49] spatstat.explore_3.2-5 [50] progressr_0.14.0
[51] lmtest_0.9-40
[52] spatstat.random_3.2-2 [53] xfun_0.41
[54] stringr_1.5.1
[55] globals_0.16.2
[56] mime_0.12
[57] miniUI_0.1.1.1
[58] lifecycle_1.0.4
[59] irlba_2.3.5.1
[60] devtools_2.4.5
[61] goftest_1.2-3
[62] future_1.33.1
[63] MASS_7.3-60.0.1
[64] zoo_1.8-12
[65] scales_1.3.0
[66] spatstat.utils_3.0-4
[67] promises_1.2.1
[68] RColorBrewer_1.1-3
[69] curl_5.2.0
[70] yaml_2.3.8
[71] gridExtra_2.3
[72] memoise_2.0.1
[73] reticulate_1.34.0
[74] pbapply_1.7-2
[75] stringi_1.8.3
[76] fastDummies_1.7.3
[77] pkgbuild_1.4.3
[78] rlang_1.1.3
[79] pkgconfig_2.0.3
[80] matrixStats_1.1.0
[81] evaluate_0.23
[82] lattice_0.22-5
[83] tensor_1.5
[84] ROCR_1.0-11
[85] purrr_1.0.2
[86] patchwork_1.2.0
[87] htmlwidgets_1.6.4
[88] cowplot_1.1.3
[89] tidyselect_1.2.0
[90] parallelly_1.36.0
[91] RcppAnnoy_0.0.22
[92] plyr_1.8.9
[93] magrittr_2.0.3
[94] R6_2.5.1
[95] generics_0.1.3
[96] profvis_0.3.8
[97] withr_3.0.0
[98] pillar_1.9.0
[99] fitdistrplus_1.1-11
[100] abind_1.4-5
[101] survival_3.5-7
[102] tibble_3.2.1
[103] future.apply_1.11.1
[104] crayon_1.5.2
[105] KernSmooth_2.23-22
[106] utf8_1.2.4
[107] spatstat.geom_3.2-8
[108] plotly_4.10.4
[109] rmarkdown_2.25
[110] urlchecker_1.0.1
[111] usethis_2.2.2
[112] grid_4.2.2
[113] digest_0.6.34
[114] xtable_1.8-4
[115] tidyr_1.3.1
[116] httpuv_1.6.14
[117] munsell_0.5.0
[118] viridisLite_0.4.2
[119] sessioninfo_1.2.2

Thanks for your help with this!

jjs847 commented 7 months ago

@pati-ni just wondering if there are any updates?

pati-ni commented 6 months ago

Hi @jjs847,

yes, sorry for the delay. I am trying to put together a dataset to test with the code.

Can you suggest some values for the weights? Do you set them within [0,1] range and the same value for the branch?

Also, I can see that you use the netlib blas/lapack. While I don't think this explains the slowness, it would be worth trying the OpenBlas version. If you install R from conda, then it comes with openblas. This makes a substantial difference performance-wise with this old version of harmony. However, before spawning R I would set export OMP_NUM_THREADS=1 to prevent OPENBLAS hogging all your CPU cores

Please let me know the necessary information so I can provide further guidance.

jjs847 commented 6 months ago

Hi @pati-ni,

no worries, thanks for your continued work on this!

I have a dataset with about 25k single-cell + single-nuclear data. My weights argument is a vector with an entry for each barcode of 0.53 for the single-cell data, and 8.56 for the single-nuclear data. Given your suggestion, I tried to re-run with weights between 0 and 1, and still only getting to the 4th iteration after 12 hours.

I will see if your second suggestion is possible on my HPC.