immunogenomics / IMPACT

Code for creating cell-type-specific regulatory element annotation files
GNU General Public License v3.0

Tau* calculation - M SNPs #9

Closed bcrone closed 2 years ago

bcrone commented 2 years ago

In calculation of tau*, why is the M SNP count hard-coded as 5,961,159 SNPs? I'm lost where this value is coming from.

TiffanyAmariuta commented 2 years ago

This value of M corresponds to the number of common European variants (MAF > 5%) in the genome. Since LD score regression (LDSC) is performed using only common variants, we are only concerned with the values of continuous annotations across these common variants. Hope this helps!
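For readers following along, here is a minimal sketch of where M enters the calculation. It assumes the standardized effect size definition from Gazal 2017 Nat Genet, tau* = M × sd(annotation) × tau / h2g, with the annotation standard deviation taken over the same set of common SNPs. The function and argument names are hypothetical and this is not the repository's own script.

```python
import math

# Number of common European reference SNPs quoted in this thread.
M_COMMON_SNPS = 5_961_159


def tau_star(tau, annot_sd, h2g, m=M_COMMON_SNPS):
    """Standardized effect size: tau* = M * sd(annotation) * tau / h2g.

    tau      -- per-SNP heritability coefficient estimated by S-LDSC
    annot_sd -- standard deviation of the annotation across the M common SNPs
    h2g      -- estimated SNP heritability of the trait
    m        -- number of common SNPs (an assumption: the value from this thread)
    """
    if h2g <= 0:
        raise ValueError("h2g must be positive")
    return m * annot_sd * tau / h2g


# Illustrative inputs only; these are made-up numbers, not results.
example = tau_star(tau=1e-8, annot_sd=0.1, h2g=0.5)
print(f"tau* = {example:.4f}")
```

Under this definition, h2g and tau vary by trait, while M and the annotation standard deviation depend only on the reference panel and the annotation.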


-- Tiffany Amariuta-Bartell, PhD Postdoctoral Researcher Department of Epidemiology Harvard T.H. Chan School of Public Health

bcrone commented 2 years ago

Thanks for the quick reply. I'm still a bit confused about why this value is constant across all GWAS traits. In the LDSC script 1_SLDSC_MakeLDScores_orPatitionH2_EUR.sh, LD scores are restricted to the set defined in SNP_rsID_list_EUR.txt, which is a list of 1,217,311 SNPs. Wouldn't any downstream SNP heritability estimation be restricted to this set, which would then also serve as an upper bound for M?

TiffanyAmariuta commented 2 years ago

The value is constant across GWAS traits because allele frequency does not depend on GWAS trait.

The 1.2M SNPs you mention are HapMap SNPs. If you have questions about which SNPs are used at which stage of LDSC, please consult the Finucane 2015 Nat Genet and Gazal 2017 Nat Genet papers.
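One way to check the common-SNP count against your own reference panel is sketched below. It assumes LD scores were computed per chromosome with `ldsc.py --l2`, which writes a `.l2.M_5_50` file alongside each LD score file. Each file holds one whitespace-separated value per annotation column, summed over reference SNPs with MAF > 5%, so for an all-ones base annotation the value is a SNP count. The directory, prefix, and function name are hypothetical.

```python
from pathlib import Path


def total_common_snps(ldscore_dir, prefix, chromosomes=range(1, 23)):
    """Sum per-chromosome .l2.M_5_50 values, giving one total per annotation column."""
    totals = None
    for chrom in chromosomes:
        # Assumed naming scheme: <prefix>.<chrom>.l2.M_5_50
        path = Path(ldscore_dir) / f"{prefix}.{chrom}.l2.M_5_50"
        values = [float(x) for x in path.read_text().split()]
        if totals is None:
            totals = values
        elif len(values) != len(totals):
            raise ValueError(f"{path} has {len(values)} columns, expected {len(totals)}")
        else:
            totals = [t + v for t, v in zip(totals, values)]
    return totals
```

For example, `total_common_snps("ldscores", "baseline")` would return one total per annotation column. These totals are taken over the reference panel SNPs, which is a larger set than the roughly 1.2M HapMap SNPs for which LD scores are printed.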

bcrone commented 2 years ago

Thanks for the clarification!
