privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Comparison of bigsnpr versions 1.11.4 and 1.10.8 #390

Closed nathangillespie closed 1 year ago

nathangillespie commented 1 year ago

Dear Authors,

We are using LDpred2 in bigsnpr to estimate weights for polygenic risk scores (PRS). We have found much lower than expected correlations between weights generated with different versions of bigsnpr across three servers (see below). The correlation between beta weights from bigsnpr 1.11.4 and 1.10.8 is 0.55. What's more, a PRS for educational attainment (EA) based on the EA4 sumstats and bigsnpr 1.11.4 is no longer predictive of EA in an independent sample, whereas the EA4 PRS based on the older bigsnpr 1.10.8 weights remains significant.

Is a correlation of 0.55 between output from adjacent versions reasonable?

Regards,
Nathan

package     VCU      Dutch    TW
R           4.1.1    4.0.3    3.6.0
bigsnpr     1.11.4   1.11.4   1.10.8
bigreadr    0.2.4    0.2.4    0.2.4

Correlations between beta weights (bigsnpr package versions):

VCU 1.11.4 and VCU 1.10.8: r = 0.55
VCU 1.10.8 and TW 1.10.8: r = 1.00
Dutch 1.10 and Dutch 1.11: r = 0.55
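
For readers who want to repeat this kind of comparison, here is a minimal sketch of how two sets of weights might be correlated. It assumes each run was saved as a table with a SNP identifier and a weight; the column names `rsid` and `beta` and all values are hypothetical toy data, not output from bigsnpr. Matching on the identifier before calling `cor()` matters, because the two runs may not contain the same variants in the same order.

```r
# Toy weight tables standing in for two runs (hypothetical values)
w_a <- data.frame(rsid = c("rs1", "rs2", "rs3", "rs4", "rs5"),
                  beta = c(0.10, -0.20, 0.30, 0.00, 0.05))
w_b <- data.frame(rsid = c("rs5", "rs3", "rs1", "rs9"),
                  beta = c(0.06, 0.28, 0.12, 0.50))

# Keep only variants present in both runs, aligned by identifier
merged <- merge(w_a, w_b, by = "rsid", suffixes = c("_a", "_b"))

# Pearson correlation between the aligned weights
r <- cor(merged$beta_a, merged$beta_b)
print(nrow(merged))
print(r)
```

If the two tables also differ in effect allele coding, the signs would need to be aligned first; that step is not shown here.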

privefl commented 1 year ago

I am a bit surprised that you get a 100% correlation by running LDpred2 twice. There is some random sampling, so unless you use seeds, two runs of LDpred2 should give slightly different results.
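
To illustrate the point about seeds, here is a small base R sketch. `toy_weights()` is a hypothetical stand-in for any estimator that draws random numbers; it is not LDpred2. Resetting the seed before each call reproduces the result exactly, while an unseeded rerun gives a high but not perfect correlation.

```r
# Hypothetical stochastic estimator: fixed signal plus random noise
toy_weights <- function(n = 1000) {
  signal <- seq(-1, 1, length.out = n)
  signal + rnorm(n, sd = 0.1)
}

set.seed(1)
w1 <- toy_weights()

set.seed(1)          # same seed, so the same random draws
w2 <- toy_weights()

w3 <- toy_weights()  # no reseeding, so different random draws

same_with_seed <- identical(w1, w2)
r_without_seed <- cor(w1, w3)
print(same_with_seed)
print(r_without_seed)
```

In a real analysis the analogous step would be calling `set.seed()` right before the LDpred2 call.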

But indeed r=0.55 is quite small, and that might reveal some issue. I will have time to investigate this further next week. In the meantime, could you please report the range that you get with the two versions?
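
A minimal sketch of how such a range could be computed, assuming "range" refers to the spread of the `corr_est` vector in each LDpred2-auto chain, as in the chain filtering described in the LDpred2 tutorial. `multi_auto` below is a mock list built by hand so the snippet runs on its own; a real one would come from `snp_ldpred2_auto()`. The 0.95 thresholds are assumptions taken from my recollection of that tutorial.

```r
m <- 200
base_corr <- seq(-0.02, 0.02, length.out = m)

# Mock chains: four similar ones and one that collapsed towards zero
scales <- c(1, 1.02, 0.98, 1.01, 0.1)
multi_auto <- lapply(scales, function(s) {
  list(corr_est = s * base_corr, beta_est = s * base_corr / 10)
})

# Spread of corr_est within each chain
chain_range <- sapply(multi_auto, function(auto) diff(range(auto$corr_est)))

# Keep chains whose range is close to the largest ones
threshold <- 0.95 * unname(quantile(chain_range, 0.95, na.rm = TRUE))
keep <- which(chain_range > threshold)

# Average the weights over the chains that were kept
beta_auto <- rowMeans(sapply(multi_auto[keep], function(auto) auto$beta_est))
print(chain_range)
print(keep)
```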

privefl commented 1 year ago

I've tried the EA4 sumstats (N=765K), and it seems the new model is better for it. At least, the range and the predicted r2 from the Gibbs sampler are both larger.

Maybe you could send me your code and I could try it.

nathangillespie commented 1 year ago

Here are the ranges: [image: range]

nathangillespie commented 1 year ago

Here is a link to the R script we have been using:

https://drive.google.com/file/d/1a1xglvaKH7d_x3A2LZft4RDL4odivvjo/view?usp=share_link

nathangillespie commented 1 year ago

You're correct. The correlations between VCU 1.10.8 and TW 1.10.8 bigsnpr package were not perfect but actually 0.99.

privefl commented 1 year ago

The ranges match what I get too. With the new version, the estimated r2 is actually very large for me (19%), so I'm quite surprised that you say it is not predictive at all.

privefl commented 1 year ago

You're also missing the QC step on the sumstats.
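
The comment does not say which QC is meant. One check I would assume is intended is the comparison of standard deviations described in the LDpred2 tutorial: the genotype SD implied by the sumstats is compared with the SD derived from reference allele frequencies, and variants that disagree strongly are dropped. The sketch below runs on simulated data; the column names, the continuous-trait formula with `sd_y = 1`, and the thresholds are all assumptions on my part.

```r
set.seed(42)
m <- 1000
n_eff <- 10000                      # hypothetical GWAS sample size
sd_y <- 1                           # assumed SD of the (continuous) phenotype

af <- runif(m, 0.05, 0.5)           # hypothetical reference allele frequencies
sd_ldref <- sqrt(2 * af * (1 - af)) # genotype SD implied by the reference

# Simulate sumstats that agree with the reference by construction
beta <- rnorm(m, sd = 0.01)
beta_se <- sqrt((sd_y^2 / sd_ldref^2 - beta^2) / n_eff)

# Corrupt the first 20 variants, e.g. as if their sample size were misreported
beta_se[1:20] <- beta_se[1:20] * 3

# Genotype SD implied by the sumstats (continuous-trait version)
sd_ss <- sd_y / sqrt(n_eff * beta_se^2 + beta^2)

# Flag variants whose two SD estimates disagree (assumed thresholds)
is_bad <- sd_ss < (0.5 * sd_ldref) | sd_ss > (sd_ldref + 0.1) |
  sd_ss < 0.1 | sd_ldref < 0.05

print(sum(is_bad))
```

Only the variants with `is_bad == FALSE` would then be passed on to LDpred2.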

privefl commented 1 year ago

If you install the newest GitHub version, you can now run LDpred2-auto as before when using use_MLE = FALSE.
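
A hedged sketch of what that could look like. `run_ldpred2_auto_legacy()` is a hypothetical wrapper name, and `corr`, `df_beta` and `h2_init` are assumed to have been prepared as in the usual LDpred2 workflow. The initial values for `p` and the seed are illustrative choices, not recommendations from this thread. The GitHub version can typically be installed with `remotes::install_github("privefl/bigsnpr")`.

```r
# Hypothetical wrapper: run LDpred2-auto with the MLE update switched off
run_ldpred2_auto_legacy <- function(corr, df_beta, h2_init, ncores = 1) {
  if (!requireNamespace("bigsnpr", quietly = TRUE)) {
    stop("Package 'bigsnpr' is required.")
  }
  # Fail early if the installed version predates the use_MLE argument
  if (!"use_MLE" %in% names(formals(bigsnpr::snp_ldpred2_auto))) {
    stop("This bigsnpr version has no 'use_MLE'; install the GitHub version.")
  }
  # Log-spaced starting values for the proportion of causal variants
  vec_p_init <- exp(seq(log(1e-4), log(0.2), length.out = 30))

  set.seed(1)  # fixed seed so that reruns are comparable
  bigsnpr::snp_ldpred2_auto(
    corr, df_beta,
    h2_init = h2_init,
    vec_p_init = vec_p_init,
    ncores = ncores,
    use_MLE = FALSE
  )
}
```

It would then be called as `multi_auto <- run_ldpred2_auto_legacy(corr, df_beta, h2_init)`.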

privefl commented 1 year ago

Any update on this?