Closed xueweic closed 5 years ago
There are 9,997,231 SNPs in the reference panel (1000G EUR Phase 3); however, the enrichment estimates are for common SNPs only (MAF > 5%; see Finucane et al. NG 2015), of which there are 5,961,159. The remaining SNPs are low-frequency SNPs. Note that for LD, we do consider all SNPs, including the low-frequency ones.
Baseline-LD is publicly available at the URL you have, and you can download it and run your annotation conditioned on the latest version of the baseline-LD model (v2.2). When I completed the previous work, baseline-LD v2.2 was not yet available.
If you want to get the 5,961,159 SNPs from the 9,997,231 SNPs, remove the SNPs with MAF < 5%; what remains is the list. As for the annotations, you don't need to remove anything manually, as this is taken care of by S-LDSC (unless you change other parameters; see the S-LDSC wiki page). Lastly, note that there is also baseline-LF (Gazal NG 2018) if you want to investigate low-frequency enrichments separately, which was outside the scope of Kim et al. 2019 AJHG. Let me know if you have more questions.
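In case it helps, here is a minimal sketch of the MAF filter described above. It is not part of S-LDSC. It assumes the allele frequencies are in PLINK `.frq` format (whitespace-delimited, with `SNP` and `MAF` columns); the function name `common_snps`, the SNP IDs, and the frequencies are made up for illustration. Whether a SNP with MAF exactly 5% is kept is a choice made here (it is dropped), so adjust the comparison if your definition differs.

```python
import io


def common_snps(frq_lines, maf_threshold=0.05):
    """Return the IDs of SNPs whose MAF is strictly above maf_threshold.

    frq_lines: any iterable of text lines in PLINK .frq layout
    (header row first, then one SNP per row).
    """
    kept = []
    snp_col = maf_col = None
    for line in frq_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if snp_col is None:
            # First non-blank line is the header; locate the columns by name.
            snp_col = fields.index("SNP")
            maf_col = fields.index("MAF")
            continue
        try:
            maf = float(fields[maf_col])
        except ValueError:
            continue  # non-numeric MAF such as "NA"
        if maf > maf_threshold:
            kept.append(fields[snp_col])
    return kept


# Hypothetical toy input, not real 1000G data.
example = """ CHR SNP A1 A2 MAF NCHROBS
 1 rs0001 A G 0.2300 1006
 1 rs0002 C T 0.0120 1006
 1 rs0003 G A 0.0500 1006
 1 rs0004 T C NA 1006
 1 rs0005 T C 0.4100 1006
"""

print(common_snps(io.StringIO(example)))  # ['rs0001', 'rs0005']
```

For the real files you would pass an open file handle per chromosome instead of the `io.StringIO` object and concatenate the results.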
Thank you very much.
Hi Samuel S. Kim,
I have a question about the baseline-LD annotation data. I followed the instructions and downloaded the data from the web source: https://data.broadinstitute.org/alkesgroup/LDSCORE/. I saw that the paper reports the total number of SNPs as M = 5,961,159. When I loaded the data in R, I saw that there are M = 9,997,231 SNPs in the baselineLD v1.1 dataset. So I would like to know which SNPs were removed from the dataset. Thank you for your help.
Best, Wei