privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Restrict to HapMap3(+) variants / LD reference #378

Closed: Smarzels closed this issue 1 year ago

Smarzels commented 1 year ago

I'm currently trying to compute a polygenic score for BMI in a sample of about 1,700 individuals using the new LDpred2-auto (bigsnpr 1.11.4). So far I have not been successful: I'm getting a much lower R2 than with LDpred1.

I have two short questions. First, are there instances where you would not restrict to HapMap3(+) variants? When we computed the PGS for BMI with LDpred1, we included 1,967,822 SNPs, but when I restrict to HapMap3 or HapMap3+ with LDpred2, we can include only 905,721 and 920,317 SNPs, respectively. Would it be advisable to use all SNPs (i.e., not restrict to HapMap3(+) variants)?

Second, what is the difference between using 1000 Genomes Project data versus HapMap genetic maps to interpolate to genetic positions? And after converting to genetic positions, in which instances would you use your own data as LD reference and when would you use an alternative LD reference?

Thank you!

privefl commented 1 year ago

So, you're comparing the new LDpred2-auto to LDpred1-inf? Which sumstats are you using? Are they from the 1,700 individuals?

You can try with the 2M variants, but you'll have to compute the LD yourself, and maybe try to make LD blocks as well.

For the conversion to genetic positions, I think it should not change much.

About the LD, you should have a look at this new section of the tutorial: https://privefl.github.io/bigsnpr/articles/LDpred2.html#which-set-of-variants-to-use.

Smarzels commented 1 year ago

Yes, I'm comparing performance of LDpred2-auto to LDpred1-inf. I'm using BMI sumstats from the GIANT consortium (~700,000 individuals).

Thank you for your suggestion, I will try with all the SNPs and compute the LD myself. Is there an example script available that uses your own data for the LD calculation with LD blocks?

privefl commented 1 year ago

There is an example in the tutorial itself.

Smarzels commented 1 year ago

Thanks! Including all SNPs and computing LD myself did not improve the PGS R2. I'm now trying LDpred2-auto, restricting to HapMap3+ variants and using the LD ref for HM3+. I'm using the code from your tutorial as an example (https://github.com/privefl/paper-ldpred2/blob/master/code/example-with-provided-ldref.R).

However, when I run lines 52-70 I get the following error: Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 'NA' indices are not (yet?) supported for sparse Matrices

Any idea what's going wrong?

privefl commented 1 year ago

Try this code instead: https://github.com/privefl/paper-infer/blob/main/code/example-with-provided-LD.R

Smarzels commented 1 year ago

Same problem, unfortunately. Code is the same.

Edit: Ah, I think I found my mistake. Hopefully it's fixed now.

privefl commented 1 year ago

What was the problem?

If everything is fixed, please close the issue.

Smarzels commented 1 year ago

I made a mistake in defining df_beta.
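
For readers who hit the same error: it is raised when a sparse Matrix is subset with an index vector that contains NA, which is what match() returns for variants that are absent from the reference. The exact mistake in df_beta is not stated in this thread, so the following is only a minimal sketch of how such NA indices can arise and how to check for them before indexing. The variant IDs and the toy 4 x 4 matrix are made up for illustration and are not taken from the linked scripts.

```r
library(Matrix)

# Toy sparse matrix standing in for the LD matrix of one chromosome
# (hypothetical data, for illustration only)
corr <- sparseMatrix(i = 1:4, j = 1:4, x = 1)
ref_ids  <- c("rs1", "rs2", "rs3", "rs4")   # variants in the LD reference
beta_ids <- c("rs2", "rs9", "rs4")          # variants in the sumstats table

# match() returns NA for any variant that is not in the reference ("rs9" here)
ind <- match(beta_ids, ref_ids)

# Check for NA indices before using them to subset the sparse matrix
if (anyNA(ind)) {
  message(sum(is.na(ind)), " variant(s) not found in the LD reference")
}

# Keep only the variants that were matched, then subset
ind_ok <- ind[!is.na(ind)]
corr_sub <- corr[ind_ok, ind_ok]
```

If anyNA(ind) is TRUE in a real analysis, it usually means the table of effect sizes and the LD reference do not describe the same set of variants, so it is worth inspecting how the table was built rather than only dropping the NA entries.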