privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Clumping option #469

Closed msong97 closed 6 months ago

msong97 commented 6 months ago

I have read your paper "Making the Most of Clumping and Thresholding for Polygenic Scores". It is a very interesting and useful paper!

I have an inquiry about the clumping option.

Here is the relevant paragraph from your paper:

Base size of clumping window within {50, 100, 200, 500}. The window size w_c is then computed as the base size divided by r_c^2. For example, for r_c^2 = 0.2, we test values of w_c within {250, 500, 1000, 2500} (in kb). This is motivated by the fact that linkage disequilibrium is inversely proportional to genetic distance between variants.
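As a quick check of the arithmetic in the quoted paragraph, here is a minimal Python sketch. The function name `window_sizes_kb` and its arguments are hypothetical, chosen for illustration only; they are not part of bigsnpr.

```python
def window_sizes_kb(base_sizes, r2_threshold):
    """Return clumping window sizes (in kb): base size divided by the r2 threshold."""
    if not 0 < r2_threshold <= 1:
        raise ValueError("r2_threshold must be in (0, 1]")
    return [base / r2_threshold for base in base_sizes]


# Base sizes taken from the quoted paragraph; 0.2 is the example r2 threshold.
BASE_SIZES = [50, 100, 200, 500]
sizes = window_sizes_kb(BASE_SIZES, 0.2)
print(sizes)
```

With a threshold of 0.2 this gives 250, 500, 1000 and 2500 kb, matching the values listed in the paragraph. A stricter threshold (smaller r2) yields a larger window.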

I wonder whether the actual window sizes are 250, 500, 1000, and 2500 kb. If the window size is 2500 kb, will it clump any SNP that is within 1250 kb on either side of the index SNP?

I am considering clumping any SNP that is within 1000 kb on either side of the index SNP (i.e., a 2000 kb window), and I wonder whether that is a reasonable tuning value.

Thanks!

privefl commented 6 months ago

This size represents the distance from the index SNP (cf. e.g. https://github.com/privefl/bigsnpr/blob/master/src/clumping-utils.h#L22-L23), so that the total window size is actually twice that.
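To make the distinction concrete, here is a small Python sketch. It is not the package's C++ code: it assumes positions in base pairs, a size given in kb, and an inclusive boundary, and the function names are hypothetical.

```python
def in_clumping_window(index_pos_bp, other_pos_bp, size_kb):
    """True if the other SNP lies within size_kb of the index SNP, on either side.

    Assumption: the boundary is treated as inclusive here.
    """
    return abs(other_pos_bp - index_pos_bp) <= size_kb * 1000


def total_span_kb(size_kb):
    """Total width covered around the index SNP: size_kb on each side."""
    return 2 * size_kb


index_pos = 5_000_000  # hypothetical index SNP position, in bp
print(in_clumping_window(index_pos, 6_000_000, 1000))  # exactly 1000 kb away
print(in_clumping_window(index_pos, 3_900_000, 1000))  # 1100 kb away
print(total_span_kb(1000))
```

Under these assumptions, a size of 1000 kb considers SNPs up to 1000 kb upstream and 1000 kb downstream of the index SNP, so the region spans 2000 kb in total.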

Any suggestion on how to improve the documentation is welcome.

msong97 commented 6 months ago

Thank you for the information! So w_c in your paper is the size (i.e., the distance from the index SNP) in your code, isn't it?

privefl commented 6 months ago

Yes

msong97 commented 6 months ago

Thank you for all the information!