privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Question about LDpred2 #81

Closed YinLiLin closed 4 years ago

YinLiLin commented 4 years ago

Hi Florian,

I am running LDpred2 with bigsnpr. It is a really fantastic package, and I enjoy using it greatly. I am not very familiar with the mathematical theory behind LDpred, but by following your thorough guidance here I can successfully run the different models, including LDpred_inf, LDpred_grid_nosp, LDpred_grid_sp, LDpred_grid_auto. I have a few questions and hope to get your expert opinion:

  1. Is LDpred_inf the same as SBLUP (implemented in GCTA) in theory?
  2. I have looked through your paper on bioRxiv. It seems there is no significant difference between LDpred_grid and LDpred_auto, yet their default numbers of MCMC iterations are quite different. Does LDpred_grid converge faster than LDpred_auto? How do their computation times compare if we set the same number of MCMC iterations? And which one would you recommend, in terms of prediction accuracy and efficiency, for traits with various genetic architectures?

Best regards, Lilin

bvilhjal commented 4 years ago

Hi Lilin,

Regarding 1., yes, as I understand it, it should be the same, modulo implementation.

Regarding 2., the difference is that the auto version infers the heritability and the fraction of causal variants, and therefore requires more iterations to converge. The grid version simply tests a grid of fixed values for the same hyperparameters (i.e. heritability and fraction of causal variants).
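To make the grid idea concrete, here is a minimal Python sketch (not bigsnpr code) that enumerates a grid of hyperparameter pairs. The starting heritability estimate, the multipliers, and the candidate fractions are illustrative assumptions and are not values taken from this thread or from the package defaults.

```python
from itertools import product

# Illustrative values only (assumptions, not package defaults).
h2_estimate = 0.3  # hypothetical prior estimate of heritability
h2_grid = [round(h2_estimate * factor, 4) for factor in (0.7, 1.0, 1.4)]
p_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]  # candidate fractions of causal variants

# Grid approach: each (h2, p) pair is one run with fixed hyperparameters.
grid = [{"h2": h2, "p": p} for h2, p in product(h2_grid, p_grid)]

for params in grid[:3]:
    print(params)
print(len(grid), "fixed-hyperparameter runs in total")
```

Each grid entry is evaluated with its hyperparameters held fixed, whereas the auto version starts from initial values and updates them during sampling, which is the reason given above for it needing more iterations.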

Best, Bjarni

privefl commented 4 years ago

Following up: the problem with auto is that it can fail to converge, in which case it provides worse predictive performance (see e.g. the results for PRCA and T1D).

If you want to use LDpred2-auto, I would recommend visually inspecting the paths of the estimated p & h2 parameters (as shown in the tutorial), and also checking the scale of the predictions against the scale of the predictions from LDpred2-inf (as shown in the tutorial).
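As a rough numerical companion to the visual checks, the following Python sketch (not bigsnpr code) computes two simple diagnostics. Both function names, the half-versus-half drift measure, and the toy numbers are assumptions made for illustration; they are not the procedure from the tutorial or the paper.

```python
from statistics import mean, pstdev

def path_drift(path, burn_in):
    """Relative difference between the means of the first and second
    halves of the post-burn-in path (0 means no drift)."""
    kept = path[burn_in:]
    half = len(kept) // 2
    first, second = mean(kept[:half]), mean(kept[half:])
    return abs(second - first) / abs(second)

def scale_ratio(pred_auto, pred_inf):
    """Spread of the auto predictions relative to the inf predictions."""
    return pstdev(pred_auto) / pstdev(pred_inf)

# Toy inputs (made up): a path that settles after burn-in, and one that keeps moving.
settled_h2_path = [0.9, 0.5, 0.2, 0.22, 0.22, 0.2]
moving_h2_path = [0.1, 0.2, 0.3, 0.4]

print(path_drift(settled_h2_path, burn_in=2))
print(path_drift(moving_h2_path, burn_in=0))
print(scale_ratio([-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]))
```

What counts as too much drift or an implausible scale ratio is a judgment call that this thread does not specify, so these numbers complement the plots rather than replace them.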

You can also run multiple chains (i.e. run LDpred2-auto with different p_init values) and see which one converges best. Hopefully, we'll come up with an automatic way to choose the best chain in the next version of the paper.
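The multiple-chain idea can be sketched in Python (not bigsnpr code) as follows. The helper names `log_spaced` and `pick_chain`, the range of initial values, and the per-chain scores are all hypothetical. The score stands for any convergence measure where lower is better, and this is not the automatic solution mentioned above.

```python
import math

def log_spaced(start, stop, n):
    """Return n values from start to stop, evenly spaced on the log scale."""
    lo, hi = math.log(start), math.log(stop)
    return [math.exp(lo + (hi - lo) * i / (n - 1)) for i in range(n)]

def pick_chain(scores):
    """Index of the chain with the smallest (best) convergence score."""
    return min(range(len(scores)), key=scores.__getitem__)

# One chain per initial value of p (illustrative range, an assumption).
p_init_values = log_spaced(1e-4, 1.0, 5)

# Hypothetical per-chain scores, e.g. obtained by inspecting each chain's paths.
chain_scores = [0.40, 0.05, 0.20, 0.31, 0.90]

best = pick_chain(chain_scores)
print(f"best chain: index {best}, started from p_init = {p_init_values[best]:.4g}")
```

Spacing the initial values on the log scale covers both very sparse and highly polygenic starting points with only a few chains.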

YinLiLin commented 4 years ago

@bvilhjal @privefl Thanks a lot for your detailed responses. I look forward to the new version being made public soon.

privefl commented 4 years ago

A new version of the preprint is available.

YinLiLin commented 4 years ago

Many thanks, that is very nice to hear. I am trying it out and will report back here if I have further questions.

privefl commented 4 years ago

Thank you for using LDpred2. Please note that we now recommend running LDpred2 genome-wide instead of per chromosome. The paper (preprint) and tutorial have been updated.