xtonyjiang / GNOVA

A principled framework to estimate annotation-stratified genetic covariance using GWAS summary statistics.
http://www.cell.com/ajhg/abstract/S0002-9297(17)30453-6
GNU General Public License v3.0

GNOVA

GNOVA (GeNetic cOVariance Analyzer) is a principled framework to estimate annotation-stratified genetic covariance using GWAS summary statistics.

Requirements

  1. Python 2.7
  2. numpy
  3. scipy
  4. pandas
  5. scikit-learn (imported as sklearn)
  6. bitarray
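
Before running GNOVA, it can help to confirm that the listed packages are importable in the interpreter you plan to use. The following is a minimal sketch, not part of GNOVA itself; the helper name `missing_modules` is hypothetical, and the module names are simply the import names from the list above.

```python
def missing_modules(names):
    """Return the names in `names` that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names taken from the requirements list above.
required = ["numpy", "scipy", "pandas", "sklearn", "bitarray"]
print(missing_modules(required))  # an empty list means every module imported
```

Any name that is printed still needs to be installed for that interpreter.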

Tutorial

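Suppose you would like to calculate the genetic covariance between Crohn's Disease and Ulcerative Colitis. We'll need a few types of files, all of which appear in the command below: GWAS summary statistics for each trait (data/CD.sumstats.gz and data/UC.sumstats.gz), genotype files given by the --bfile prefix, and annotation files given by --annot.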

More details about these supplied files can be found in the GNOVA manuscript.

We may run the following command:

python gnova.py data/CD.sumstats.gz data/UC.sumstats.gz \
--N1 27726 \
--N2 28738 \
--bfile data/bfiles/eur_chr@_SNPmaf5 \
--annot data/annot/func.@.txt \
--out results.txt
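
The `--bfile` and `--annot` paths in the command contain an `@` symbol. The sketch below assumes that `@` is a per-chromosome placeholder covering autosomes 1 to 22, a convention used by ldsc; this section does not state it explicitly, so treat it as an assumption. The helper name `expand_chromosome_paths` is hypothetical and only illustrates which files such a template would refer to.

```python
def expand_chromosome_paths(template, chromosomes=range(1, 23)):
    """Substitute '@' in `template` with each chromosome number.

    Assumption: '@' stands for the chromosome number, autosomes 1-22.
    """
    return [template.replace("@", str(c)) for c in chromosomes]


bfile_prefixes = expand_chromosome_paths("data/bfiles/eur_chr@_SNPmaf5")
annot_files = expand_chromosome_paths("data/annot/func.@.txt")

print(bfile_prefixes[0])  # data/bfiles/eur_chr1_SNPmaf5
print(annot_files[-1])    # data/annot/func.22.txt
```

Listing the expanded paths this way is a quick check that every per-chromosome file exists before starting a run.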

Explanation of Command-Line Arguments

Additional Command-Line Arguments

Here is an explanation of the other command-line arguments that weren't shown in the example:

Explanation of Output

The output will be a whitespace-delimited text file, with the rows corresponding to different annotations and the columns as such:

NOTE: When functional annotations are present, the true heritability in each annotation category may be small. Although methods for estimating annotation-stratified heritability exist, they may yield unstable, and in many cases negative, heritability estimates, especially when several annotation categories are related to the repressed or non-functional genome. GNOVA ignores negative heritability estimates and leaves the corresponding correlation estimates as 'NaN'. We therefore recommend that users focus on genetic covariance rather than genetic correlation when performing annotation-stratified analysis.
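
To see why a negative heritability estimate leaves the correlation undefined, recall the standard definition of genetic correlation: the genetic covariance divided by the square root of the product of the two heritabilities. The sketch below illustrates that definition only; it is not GNOVA's actual code, and the function and argument names (`genetic_correlation`, `rho`, `h2_1`, `h2_2`) are hypothetical.

```python
import math


def genetic_correlation(rho, h2_1, h2_2):
    """Return rho / sqrt(h2_1 * h2_2), or NaN if either heritability
    estimate is not positive (the square root would be undefined or zero)."""
    if h2_1 <= 0 or h2_2 <= 0:
        return float("nan")
    return rho / math.sqrt(h2_1 * h2_2)


# Both heritability estimates positive: the correlation is defined (about 0.5).
print(genetic_correlation(0.06, 0.09, 0.16))

# A negative heritability estimate in one annotation: the correlation is NaN,
# while the covariance estimate itself (0.06) remains usable.
print(genetic_correlation(0.06, -0.01, 0.16))
```

The covariance in the numerator does not depend on either heritability estimate, which is why it stays interpretable in annotation categories where the correlation does not.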

Credits

Those using the GNOVA software should cite:

Lu, et al. A powerful approach to estimating annotation-stratified genetic covariance using GWAS summary statistics. The American Journal of Human Genetics, Volume 101, Issue 6, 939-964, 2017.

The LD score calculation is adapted from ldsc. See Bulik-Sullivan, et al. LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies. Nature Genetics, 2015.