parameter estimation took >3h to run #5

Closed WeiCSong closed 4 years ago

WeiCSong commented 4 years ago

Hi CAUSE developer, I ran CAUSE on two GWAS with ~1.1m SNPs in Hapmap3 on my laptop:

params <- est_cause_params(X, X$snp)

The program ran for more than 3 hours and returned 17 items (in your tutorial, only 4 items were returned).

Estimating CAUSE parameters with 1168986 variants.
1 0.1358739
2 0.003087264
3 0.000115005
4 0.0003352196
5 6.665314e-05
6 0.0005813261
7 0.0001788529
8 2.382199e-06
9 6.658286e-06
10 6.634983e-06
11 0.000180384
12 2.413624e-06
13 0.0001905234
14 2.45161e-06
15 1.014851e-05
16 0.0001803958
17 2.413677e-06

I guessed something had gone wrong and terminated the program. Could you tell me whether I should have waited until it finished? Thanks for your help.

jean997 commented 4 years ago

Hi! Can you send your sessionInfo()? In particular, which version of mixsqp are you using?
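For anyone who only needs the package versions rather than the full sessionInfo() output, a minimal sketch in base R follows. The package names are the ones discussed in this thread; the `versions` variable is just an illustrative name, and packages that are not installed are reported as NA instead of raising an error.

```r
# Report installed versions of the packages relevant to this issue.
pkgs <- c("cause", "mixsqp", "ashr")

versions <- vapply(pkgs, function(p) {
  # requireNamespace() returns FALSE (without an error) when p is not installed
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

print(versions)
```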

WeiCSong commented 4 years ago

Thanks! I got mixsqp_0.3-43, and the complete session info is:

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cause_0.3.0.0254 dplyr_0.8.5 readr_1.3.1 MRPRESSO_1.0
[5] R.utils_2.9.2 R.oo_1.23.0 R.methodsS3_1.8.0 data.table_1.12.8
[9] mr.raps_0.2 TwoSampleMR_0.5.4

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 compiler_4.0.0 pillar_1.4.4
[4] iterators_1.0.12 gtable_0.3.0 lifecycle_0.2.0
[7] tibble_3.0.1 lattice_0.20-41 pkgconfig_2.0.3
[10] rlang_0.4.6 Matrix_1.2-18 foreach_1.5.0
[13] parallel_4.0.0 loo_2.2.0 gridExtra_2.3
[16] invgamma_1.1 vctrs_0.3.0 hms_0.5.3
[19] glmnet_4.0 grid_4.0.0 nortest_1.0-4
[22] tidyselect_1.1.0 glue_1.4.1 R6_2.4.1
[25] mixsqp_0.3-43 irlba_2.3.3 tidyr_1.1.0
[28] ggplot2_3.3.1 purrr_0.3.4 ashr_2.2-47
[31] magrittr_1.5 matrixStats_0.56.0 intervals_0.15.2
[34] scales_1.1.1 codetools_0.2-16 ellipsis_0.3.1
[37] assertthat_0.2.1 colorspace_1.4-1 shape_1.4.4
[40] numDeriv_2016.8-1.1 RcppParallel_5.0.1 munsell_0.5.0
[43] truncnorm_1.0-8 SQUAREM_2020.2 crayon_1.3.4

jean997 commented 4 years ago

OK, great. That is what I suspected. The current release of cause is only compatible with mixsqp-0.1-97 and ashr-2.2-32, but I've been working on some changes that I think should make it compatible with the newer versions. There are two options:

  1. Use the current release version (v1.0.0) and older versions of mixsqp and ashr. To do this, follow the instructions in the readme and use these commands:
devtools::install_version("mixsqp", version = "0.1-97", repos = "")
devtools::install_version("ashr", version = "2.2-32", repos = "")

This is the most tested version.

  2. Alternatively, you could try the recent update with the mixsqp and ashr package versions you already have. I am currently testing this version, but so far I am getting the same answers as with the older version. To do this, leave your mixsqp and ashr versions as they are and install the development version of cause using

    Right now it is on version

Either way, let me know if it works. est_cause_params is definitely the slowest step, but it usually only takes about ten to twenty minutes with a million SNPs, which is the number I recommend.
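One way to stay near the suggested one million SNPs is to pass a random subset of variants as the second argument instead of every SNP in the data. The sketch below uses a toy data frame and a small cap so that it runs anywhere; the names `max_snps` and `varlist` and the seed are assumptions made for illustration, and the est_cause_params() call is left commented out because it needs the cause package and real GWAS data.

```r
# Toy stand-in for the merged GWAS data frame; only the `snp` column matters here.
X <- data.frame(snp = paste0("rs", seq_len(5000)), stringsAsFactors = FALSE)

# Assumed cap for this toy example; for real data the thread suggests about 1e6.
max_snps <- 1000

set.seed(100)  # make the random subset reproducible
varlist <- sample(X$snp, size = min(max_snps, nrow(X)), replace = FALSE)

# With cause installed and a real X, the call from this thread would become:
# params <- est_cause_params(X, varlist)
length(varlist)
```

Taking `min(max_snps, nrow(X))` keeps the same code usable when the data set has fewer variants than the cap.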

WeiCSong commented 4 years ago

Hi, I tried CAUSE version 1.0.0 with the older versions of mixsqp and ashr. Now the parameter estimation converged, and the running time was about 30 min. I guess it's fixed. Thanks!

whecrane commented 4 years ago

Hi Jean, I have the same issue when using CAUSE to calculate the nuisance parameters. I followed your advice; here is my sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/anaconda3/lib/R/lib/

locale:
[1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ashr_2.2-32 mixsqp_0.1-97 cause_1.0.0 dplyr_0.8.5
[5] data.table_1.12.2

But I got more than 11 items, and it had been running for more than an hour, before I closed it. Please give me some advice on how to solve this. Thanks a lot.

jean997 commented 4 years ago

Hmm. 11 seems high but not necessarily an error. In my analysis for the paper, about 3% of the data sets required more than 10 iterations. A couple of things:

whecrane commented 4 years ago

Thanks a lot. It worked well after I waited for a long time; in the end there were 18 numbers. Thank you for your help.