samskim / networkconnectivity

Kim et al. Am J Hum Genet. 2019

About the number of SNPs #1

Closed: xueweic closed this issue 5 years ago

xueweic commented 5 years ago

Hi Samuel S. Kim,

I have a question about the baseline-LD annotation data. I followed the instructions and downloaded the data from https://data.broadinstitute.org/alkesgroup/LDSCORE/. The paper reports a total of M = 5,961,159 SNPs, but when I loaded the baselineLD v1.1 dataset in R, I saw M = 9,997,231 SNPs. Which SNPs did you remove from the dataset? Thank you for your help.

Best, Wei

samskim commented 5 years ago

There are 9,997,231 SNPs in the reference panel (1000G EUR Phase 3); however, the enrichment estimates refer to common-SNP enrichment (MAF > 5%; see Finucane et al. NG 2015), and there are 5,961,159 common SNPs. The remaining SNPs are low-frequency. Note that for LD we do consider all SNPs, including low-frequency ones.
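To make the role of the common-SNP count concrete: for a binary annotation C, enrichment as defined in Finucane et al. NG 2015 (restated here from memory, so treat it as a sketch rather than the exact estimator used in Kim et al. 2019) is the proportion of common-SNP heritability explained by C divided by the proportion of common SNPs in C. The denominator therefore counts only MAF > 5% SNPs, which is where M = 5,961,159 enters, even though LD scores are computed from all 9,997,231 reference SNPs.

```latex
\mathrm{Enrichment}(C) \;=\; \frac{h^2_g(C) / h^2_g}{M_C / M},
\qquad M = 5{,}961{,}159 \ \text{(common SNPs, MAF} > 5\%\text{)}
```

Here $h^2_g(C)$ is the heritability explained by common SNPs in $C$, $h^2_g$ is the total common-SNP heritability, and $M_C$ is the number of common SNPs in $C$.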

Baseline-LD is publicly available at the URL you linked, and you can download it and run your annotation conditioned on the latest version of baseline-LD (v2.2). When I completed the earlier work, baseline-LD v2.2 was not available.

If you want to get the 5,961,159 SNPs from the 9,997,231 SNPs, remove the SNPs with MAF < 5%; what remains is the list. For the annotations, you don't need to remove anything manually, as this is taken care of by S-LDSC (unless you change other parameters; see the S-LDSC wiki page). Lastly, note that there is also baseline-LF (Gazal et al. NG 2018) if you want to investigate low-frequency enrichment separately, which was outside the scope of Kim et al. 2019 AJHG. Let me know if you have more questions.
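The MAF filter described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the paper. It assumes per-chromosome allele-frequency tables in the whitespace-delimited layout of PLINK `--freq` output (a header row with `SNP` and `MAF` columns); the file names in the usage comment and the rsIDs in the example are hypothetical. It keeps SNPs with MAF strictly above 5%, matching "MAF > 5%" in the reply; how SNPs sitting exactly at 5% should be treated is an assumption here.

```python
import io
from typing import Iterable, List, TextIO

# "Common" SNPs: MAF strictly greater than 5% (assumed boundary handling).
COMMON_MAF_THRESHOLD = 0.05


def common_snps(frq_file: TextIO, threshold: float = COMMON_MAF_THRESHOLD) -> List[str]:
    """Return the IDs of SNPs whose MAF is strictly above `threshold`.

    Assumes a whitespace-delimited table whose header row contains
    `SNP` and `MAF` columns (the layout of PLINK `--freq` output).
    """
    header = frq_file.readline().split()
    snp_col = header.index("SNP")
    maf_col = header.index("MAF")
    kept = []
    for line in frq_file:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[maf_col]) > threshold:
            kept.append(fields[snp_col])
    return kept


def count_common_snps(paths: Iterable[str]) -> int:
    """Sum the common-SNP counts over several per-chromosome files."""
    total = 0
    for path in paths:
        with open(path) as fh:
            total += len(common_snps(fh))
    return total


if __name__ == "__main__":
    # Made-up three-SNP table; only the first SNP has MAF > 5%.
    example = io.StringIO(
        " CHR        SNP  A1  A2     MAF  NCHROBS\n"
        "   1  rs0000001   A   G  0.2100     1006\n"
        "   1  rs0000002   C   T  0.0120     1006\n"
        "   1  rs0000003   G   A  0.0500     1006\n"
    )
    print(common_snps(example))  # ['rs0000001']
    # Hypothetical file names for the 22 autosomes:
    # count_common_snps(f"1000G.EUR.QC.{c}.frq" for c in range(1, 23))
```

Filtering on the frequency table, rather than on the annotation files, reflects the point in the reply that the annotations themselves need no manual pruning.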

xueweic commented 5 years ago

Thank you very much.