privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

result difference between snp_ldsc and LDSC #295

Closed dongzhblake closed 2 years ago

dongzhblake commented 2 years ago

Hi there!

I am running an LD score regression (LDSR) analysis using the package you developed, as well as LDSC. However, I noticed that the results (i.e., intercept and h2 estimate) from bigsnpr and LDSC are not similar, even with the same set of summary statistics and LD scores. This happens with both real data and simulated data (UKBB data).

To be more specific, my observation is that, when I set the LDSC parameter --two-step to almost infinity, LDSC produces an h2 estimate close to that of bigsnpr, but a much lower intercept estimate. When I leave --two-step at its default (30), LDSC again produces an h2 estimate close to that of bigsnpr, but a much lower intercept estimate. The SE estimates of the two methods are close when the estimates approach the true value.

This really confuses me because I took a look at the bigsnpr code and everything looks fine to me. Could you provide any suggestions? It would be very much appreciated!

privefl commented 2 years ago

Please send me the sumstats and LD scores you provide to ldsc (in ldsc format, e.g. one zip with everything you use) so that I can try to reproduce this.

dongzhblake commented 2 years ago

I think the reason is that LDSC, by default, only uses the number of SNPs with MAF > 5% as M. Therefore, its heritability estimate is smaller than it would be if all SNPs were counted.
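
To make the role of M concrete, here is a minimal, noiseless sketch of the LD score regression model, E[chi2_j] = intercept + (N * h2 / M) * ld_score_j, fitted with plain unweighted least squares. It is not the code of LDSC or bigsnpr; all numbers (N, M_ALL, M_COMMON, the LD scores) are made up, and the helper `ols` is hypothetical. It only illustrates that the same fitted slope converts to a smaller h2 when a smaller M is plugged in, while the intercept of this toy fit does not involve M at all.

```python
# Illustration only: a noiseless toy version of the LD score regression model
#   E[chi2_j] = intercept + (N * h2 / M) * ld_score_j
# All values below are invented for the example (assumptions, not real data).

def ols(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

N = 100_000          # hypothetical GWAS sample size
M_ALL = 1_000_000    # hypothetical number of all variants behind the LD scores
M_COMMON = 600_000   # hypothetical number of those variants with MAF > 5%
TRUE_H2 = 0.3
TRUE_INTERCEPT = 1.05

# Synthetic LD scores and the chi-squared statistics the model predicts for them.
ld_scores = [10.0 + 5.0 * j for j in range(50)]
chi2 = [TRUE_INTERCEPT + N * TRUE_H2 / M_ALL * l for l in ld_scores]

intercept, slope = ols(ld_scores, chi2)

# The slope is identical in both cases; only the M used for rescaling differs.
h2_all = slope * M_ALL / N
h2_common = slope * M_COMMON / N

print(f"intercept       = {intercept:.4f}")
print(f"h2 (M_ALL)      = {h2_all:.4f}")
print(f"h2 (M_COMMON)   = {h2_common:.4f}")
```

In this sketch the two h2 values differ exactly by the factor M_COMMON / M_ALL. The real estimators use regression weights and a block jackknife, so actual results need not behave this cleanly.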

privefl commented 2 years ago

Yes, probably. But I thought the problem was with the intercept?