jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

parameter estimation took >3h to run #5

Closed WeiCSong closed 4 years ago

WeiCSong commented 4 years ago

Hi CAUSE developer, I ran CAUSE on two GWAS with ~1.1M HapMap3 SNPs on my laptop:

    params <- est_cause_params(X, X$snp)

The program ran for more than 3 hours and printed 17 items (in your tutorial, only 4 items were printed):

    Estimating CAUSE parameters with 1168986 variants.
    1 0.1358739
    2 0.003087264
    3 0.000115005
    4 0.0003352196
    5 6.665314e-05
    6 0.0005813261
    7 0.0001788529
    8 2.382199e-06
    9 6.658286e-06
    10 6.634983e-06
    11 0.000180384
    12 2.413624e-06
    13 0.0001905234
    14 2.45161e-06
    15 1.014851e-05
    16 0.0001803958
    17 2.413677e-06

I guessed something had gone wrong and terminated the program. Could you tell me whether I should wait until it finishes? Thanks for your help.

jean997 commented 4 years ago

Hi! Can you send your sessionInfo(). In particular which version of mixsqp are you using?

WeiCSong commented 4 years ago

> Hi! Can you send your sessionInfo(). In particular which version of mixsqp are you using?

Thanks! I got mixsqp_0.3-43, and the complete session info is:

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cause_0.3.0.0254 dplyr_0.8.5 readr_1.3.1 MRPRESSO_1.0
[5] R.utils_2.9.2 R.oo_1.23.0 R.methodsS3_1.8.0 data.table_1.12.8
[9] mr.raps_0.2 TwoSampleMR_0.5.4

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 compiler_4.0.0 pillar_1.4.4
[4] iterators_1.0.12 gtable_0.3.0 lifecycle_0.2.0
[7] tibble_3.0.1 lattice_0.20-41 pkgconfig_2.0.3
[10] rlang_0.4.6 Matrix_1.2-18 foreach_1.5.0
[13] parallel_4.0.0 loo_2.2.0 gridExtra_2.3
[16] invgamma_1.1 vctrs_0.3.0 hms_0.5.3
[19] glmnet_4.0 grid_4.0.0 nortest_1.0-4
[22] tidyselect_1.1.0 glue_1.4.1 R6_2.4.1
[25] mixsqp_0.3-43 irlba_2.3.3 tidyr_1.1.0
[28] ggplot2_3.3.1 purrr_0.3.4 ashr_2.2-47
[31] magrittr_1.5 matrixStats_0.56.0 intervals_0.15.2
[34] scales_1.1.1 codetools_0.2-16 ellipsis_0.3.1
[37] assertthat_0.2.1 colorspace_1.4-1 shape_1.4.4
[40] numDeriv_2016.8-1.1 RcppParallel_5.0.1 munsell_0.5.0
[43] truncnorm_1.0-8 SQUAREM_2020.2 crayon_1.3.4

jean997 commented 4 years ago

Ok great. That is what I suspected. The current release of cause is only compatible with mixsqp-0.1-97 and ashr-2.2-32, but I've been working on some changes that I think should make it compatible with the newer versions. There are two options:

  1. Use the current release version (v1.0.0) and older versions of mixsqp and ashr. To do this, follow the instructions in the readme and use these commands:
    devtools::install_github("jean997/cause@v1.0.0")
    devtools::install_version("mixsqp", version = "0.1-97", repos = "http://cran.us.r-project.org")
    devtools::install_version("ashr", version = "2.2-32", repos = "http://cran.us.r-project.org")

    This is the most tested version.

  2. Alternatively, you could try the recent update with the mixsqp and ashr package versions you already have. I am currently testing this version, but so far I am getting the same answers as with the older version. To do this, leave your mixsqp and ashr versions as they are and install the development version of cause using
    devtools::install_github("jean997/cause")

    Right now it is on version 1.0.0.0266.
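
Before rerunning, it may help to confirm which of the two setups is actually installed. The following is a minimal sketch, assuming the pinned versions are the ones given in the install commands for option 1; `check_pins()` is a hypothetical helper written for this illustration and is not part of cause.

```r
# Versions taken from the install commands in option 1.
pinned <- c(mixsqp = "0.1-97", ashr = "2.2-32")

# check_pins() is a hypothetical helper, not part of the cause package.
check_pins <- function(pins) {
  vapply(names(pins), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (!nzchar(system.file(package = pkg))) return(NA)
    # Compare the installed version with the pinned one.
    packageVersion(pkg) == package_version(pins[[pkg]])
  }, logical(1))
}

status <- check_pins(pinned)
# TRUE = matches the pin, FALSE = a different version, NA = not installed
print(status)
```

If both entries are TRUE, the environment corresponds to option 1; if either is FALSE, it is closer to option 2 and the development version of cause would be the one to try.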

Either way, let me know if it works. est_cause_params is definitely the slowest step, but it usually only takes about ten to twenty minutes using a million SNPs, which is the amount I recommend.
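
As a rough illustration of keeping the variant count near the recommended size, the sketch below draws a random subset of SNP IDs and shows where the estimation call would be timed. The toy data frame, the seed, and the small `n_target` are assumptions made so the snippet runs without the cause package; the `est_cause_params(X, varlist)` call mirrors the one used earlier in this thread and is left commented out.

```r
set.seed(1)  # assumption: fixed seed so the subsample is reproducible

# Toy stand-in for merged GWAS summary statistics; only the snp column matters here.
X <- data.frame(snp = paste0("rs", seq_len(5000)), stringsAsFactors = FALSE)

# Draw at most n_target variants without replacement.
n_target <- 1000  # with real data this would be about 1000000
varlist <- sample(X$snp, size = min(n_target, nrow(X)), replace = FALSE)

# With cause installed and real data in X, the call and its wall-clock time
# could be captured like this:
# elapsed <- system.time(params <- cause::est_cause_params(X, varlist))
# print(elapsed[["elapsed"]] / 60)  # minutes
```

Timing the call this way gives a concrete number to compare against the ten to twenty minutes mentioned above.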

WeiCSong commented 4 years ago

Hi, I tried CAUSE version 1.0.0 with the previous versions of mixsqp and ashr. Now the parameter estimation gives a convergent result, and the running time was about 30 min. I guess it's fixed. Thanks!

whecrane commented 4 years ago

Hi Jean, I have the same issue when I use CAUSE to calculate the nuisance parameters. I followed your advice; here is my sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/anaconda3/lib/R/lib/libRblas.so

locale:
[1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
[3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
[5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ashr_2.2-32 mixsqp_0.1-97 cause_1.0.0 dplyr_0.8.5
[5] data.table_1.12.2

But I got more than 11 items before I stopped it, and by then it had taken more than an hour. Please give me some advice on how to solve this. Thanks a lot.

jean997 commented 4 years ago

Hmm. 11 seems high but not necessarily an error. In my analysis for the paper, about 3% of the data sets required more than 10 iterations. A couple of things:

whecrane commented 4 years ago

Thanks a lot. It worked after I waited for a long time; there were 18 items in total. Thank you for your help.