privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

MCMC chain not converging #433

Closed xuxwen closed 10 months ago

xuxwen commented 1 year ago

Hi Florian! I was surprised to find that the MCMC chains behaved strangely when I used LDpred2-auto to compute PRS. Our code is set up as follows:

`multi_auto <- try(snp_ldpred2_auto(corr, df_beta, h2_init = h2_est, vec_p_init = seq_log(1e-4, 0.2, length.out = 30), ncores = NCORES, use_MLE = FALSE, allow_jump_sign = FALSE, shrink_corr = 0.95, num_iter = 500), silent = TRUE)`

(`use_MLE = FALSE` is the option commented as "uncomment if you have convergence issues (need v1.11.9)".)

First, we used pQTL data containing 350,000 SNPs, with the number of iterations set to 1000. The MCMC heritability convergence plots are shown below.

[Image: heritability convergence plot for the thirtieth chain, selected at random]

[Image: heritability convergence plots for five randomly selected chains]

We were surprised to find that basically all chains look this strange.

Second, we used mQTL data containing 2,000,000 SNPs, again with the number of iterations set to 1000.

[Image: heritability convergence plot for the mQTL data]

Here too, randomly selected chains have difficulty converging. How can we solve this?
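For readers who want to draw this kind of trace plot, here is a minimal sketch in base R. It assumes each chain returned by `snp_ldpred2_auto()` carries a numeric vector `path_h2_est`. The `multi_auto` object below is a simulated stand-in with made-up values, not real output from the issue.

```r
# Simulated stand-in for the list returned by snp_ldpred2_auto():
# one element per chain, each holding a vector `path_h2_est`.
# All values here are made up for illustration.
set.seed(42)
n_chains <- 5
n_steps  <- 1000
multi_auto <- lapply(seq_len(n_chains), function(i) {
  list(path_h2_est = 0.2 + cumsum(rnorm(n_steps, sd = 0.002)))
})

# Collect the paths in a matrix: one row per step, one column per chain
paths <- sapply(multi_auto, function(auto) auto$path_h2_est)

# Overlay the heritability traces of all chains
matplot(paths, type = "l", lty = 1,
        xlab = "Iteration", ylab = "h2 estimate")

# Crude stability summary: spread of each chain over its second half
second_half <- paths[(n_steps / 2 + 1):n_steps, , drop = FALSE]
chain_sd <- apply(second_half, 2, sd)
print(chain_sd)
```

Chains whose traces keep drifting, or that settle at clearly different levels, are the ones to look at more closely. The standard-deviation summary is only a rough screen, not a formal convergence diagnostic.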

privefl commented 1 year ago

Are there very large effects in your sumstats? For example, look at `r2 <- with(sumstats, beta^2 / (beta^2 + n_eff * beta_se^2))`.

And what is the sample size associated with the sumstats?
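As an illustration of the check suggested above, the following sketch applies the same `r2` expression to a toy data frame. The column names `beta`, `beta_se` and `n_eff` follow the expression in the comment. The numbers are hypothetical and chosen only so that one variant stands out.

```r
# Toy summary statistics (hypothetical values, not from the issue)
sumstats <- data.frame(
  beta    = c(0.01, -0.02, 0.50, 0.03),
  beta_se = c(0.01,  0.01, 0.02, 0.01),
  n_eff   = 35000
)

# Approximate share of trait variance explained by each variant
r2 <- with(sumstats, beta^2 / (beta^2 + n_eff * beta_se^2))
sumstats$r2 <- r2

# Largest values first: variants with very large effects appear at the top
print(sumstats[order(-sumstats$r2), ])
print(summary(r2))
```

In this toy example the third variant has an `r2` several orders of magnitude above the first. That kind of outlier is what the question about "very large effects" is looking for.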

xuxwen commented 1 year ago

The pQTL data come from an Icelandic database and cover 4,907 proteins. The number of metabolites is between four hundred and five hundred.

xuxwen commented 1 year ago

We wonder whether the number of SNPs is so large that the Markov chains struggle to converge with the number of iterations set to 1000. The reason for this suspicion is that our methylation and expression QTL data have far fewer SNPs: on average several hundred per methylation trait and only just over 10,000 per expression trait. We are now testing whether increasing the number of iterations alleviates the problem.

xuxwen commented 1 year ago

The protein QTL data are summary statistics for the whole dataset, with eight million SNPs in total. We excluded the SNPs with P > 0.05, so the final number of SNPs for analysis is about 500,000.

privefl commented 1 year ago

What is the sample size (number of individuals)?

Are there large effects?

You should not filter on p-values, since this will lead to overestimating effect sizes (due to winner's curse). Or at least you need to use snp_thr_correct().
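To see why p-value filtering inflates effect sizes, here is a small base-R simulation of winner's curse. It only demonstrates the bias and does not reproduce what `snp_thr_correct()` does. All parameters (number of SNPs, standard error, spread of true effects) are assumptions chosen for illustration.

```r
set.seed(1)
m  <- 100000   # number of simulated SNPs (assumed)
se <- 0.01     # common standard error of the estimates (assumed)

# Small true effects, observed with estimation noise
beta_true <- rnorm(m, sd = 0.005)
beta_hat  <- beta_true + rnorm(m, sd = se)

# Two-sided p-values, then the same filter as in the thread (P < 0.05)
pval <- 2 * pnorm(-abs(beta_hat / se))
keep <- pval < 0.05

# Among the retained SNPs, estimated effects overstate the true ones
mean_abs_hat  <- mean(abs(beta_hat[keep]))
mean_abs_true <- mean(abs(beta_true[keep]))
print(c(estimated = mean_abs_hat, true = mean_abs_true))
```

Only estimates that happened to land far from zero pass the filter, so the retained estimates are larger in magnitude, on average, than the true effects behind them.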

xuxwen commented 1 year ago

The sample size for the protein data is about 35,000 individuals. We are currently testing whether the number of iterations was too small. Next, I will check whether there are large effects. Thank you for your answer; I will reply promptly with any progress.

privefl commented 10 months ago

Any update on this?