gwas-pw is a tool for jointly analysing two genome-wide association studies (GWAS). The basic setup is that you have performed two GWAS and want to identify loci that influence both traits. Instead of using two P-value thresholds to identify variants that influence both traits, the algorithm learns reasonable thresholds from the data.
gwas-pw depends on:
The most up-to-date release is version 0.21 (see "Releases" above). After downloading gwas-pw-0.21.tar.gz from the link above, run:
tar -xvf gwas-pw-0.21.tar.gz
cd gwas-pw-0.21
./configure
make
This will create an executable file called gwas-pw in the src directory. The most common compilation error is that the configure script cannot find Boost or GSL, in which case you may have to tell the script explicitly where to find them. For example, on OS X with MacPorts, libraries are installed to the non-standard path /opt/local/lib. To configure in this case, replace the configure step above with:
./configure LDFLAGS=-L/opt/local/lib
Example data is available in the example_data/ directory. To ensure that gwas-pw is working, run:
gwas-pw -i example_data/aam_height_example.gz -bed example_data/all_fourier_ls.bed -phenos AAM HEIGHT
The input file must have the following columns. They may appear in any order, since they are identified by the header. Rows must be sorted by chromosomal position:
Note that [pheno1] and [pheno2] are the phenotype names you supply at the command line.
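Because the rows must be sorted by chromosomal position, it can help to sort the file before running gwas-pw. The following is a minimal sketch, not part of gwas-pw itself. It assumes a gzipped, whitespace-delimited file with a header line, that the chromosome and position columns are named CHR and POS (hypothetical defaults; pass your own header names as the second and third arguments), and that GNU sort with version sorting (-V) is available. Whether gwas-pw expects a particular ordering of the chromosomes themselves is not stated here, so treat the chromosome order as an assumption.

```shell
# Print the 1-based index of column NAME in the header of a gzipped table.
header_index() {
    gzip -dc "$1" | awk -v name="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == name) { print i; exit }
            }
            exit
        }'
}

# Sort a gzipped, whitespace-delimited table by chromosome, then position,
# keeping the header line first. Writes uncompressed text to stdout.
# CHR and POS are assumed column names; override them via $2 and $3.
sort_by_position() {
    local infile="$1" chr_col="${2:-CHR}" pos_col="${3:-POS}" c p
    c=$(header_index "$infile" "$chr_col")
    p=$(header_index "$infile" "$pos_col")
    if [ -z "$c" ] || [ -z "$p" ]; then
        echo "column not found: $chr_col or $pos_col" >&2
        return 1
    fi
    # Header first, then the data rows sorted by (chromosome, position).
    gzip -dc "$infile" | head -n 1
    gzip -dc "$infile" | tail -n +2 |
        LC_ALL=C sort -k"${c}b,${c}V" -k"${p},${p}n"
}
```

For example, `sort_by_position unsorted.gz | gzip > sorted.gz` would write a sorted copy that can then be passed to -i.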
There are three output files:
-[output].segbfs.gz contains a line for each segment of the genome. The columns are:
-[output].bfs.gz contains a line for each SNP in the genome. The columns are:
-[output].MLE contains the estimated regional prior probabilities of each model (same as in [output].segbfs.gz)
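After a run, a quick check that all three files were written can catch a failed or interrupted job. The sketch below is a hypothetical helper, not part of gwas-pw. It takes the stem passed to -o and only tests that each file exists, is non-empty, and (for the .gz files) passes gzip's integrity test. It says nothing about whether the contents are correct, and if the run used -noprint the Bayes factor files may legitimately be absent.

```shell
# Check that the three gwas-pw output files exist for a given output stem.
# Returns 0 if all are present and the gzipped ones are intact, 1 otherwise.
check_gwas_pw_outputs() {
    local stem="$1" status=0 f
    for f in "$stem.segbfs.gz" "$stem.bfs.gz" "$stem.MLE"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        elif [ "${f##*.}" = "gz" ] && ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, after running with `-o test`, calling `check_gwas_pw_outputs test` reports any of the three files that are missing.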
-i [file name] name of the input file, in the format described above
-phenos [string] [string] names of the phenotypes, such that the Z scores are in columns labeled Z[pheno1] and Z[pheno2]
-o [string] stem for names of output files
-bed [file name] gwas-pw splits the genome into approximately independent blocks. To input these blocks from a .bed file, use this option. We recommend using the bed files available from https://bitbucket.org/nygcresearch/ldetect-data
-noprint don't print the Bayes factors
-k [integer] as an alternative to splitting the genome into blocks based on the bed file, input the number of SNPs per block. If neither -k nor -bed is specified, this defaults to blocks of 5,000 SNPs
-cor [float] if the two GWAS were performed using overlapping cohorts, use this flag to specify the expected correlation in summary statistics under the null (defaults to zero)
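The value to pass to -cor has to come from somewhere. One rough heuristic, which is an assumption of this sketch and not necessarily how gwas-pw's authors estimate it, is the Pearson correlation between the two Z-score columns, optionally restricted to SNPs where both |Z| values fall below a cutoff so that strongly associated variants are excluded. Truncating on |Z| shrinks the estimate somewhat, so treat the result as a starting point only. The helper name, the column names, and the cutoff are all hypothetical.

```shell
# Pearson correlation between two Z-score columns of a gzipped,
# whitespace-delimited table with a header line.
# Usage: z_correlation FILE COL1 COL2 [MAX_ABS_Z]
# Rows where either |Z| exceeds MAX_ABS_Z are skipped (default: keep all rows).
z_correlation() {
    local infile="$1" col1="$2" col2="$3" cap="${4:-1e30}"
    gzip -dc "$infile" | awk -v a="$col1" -v b="$col2" -v cap="$cap" '
        NR == 1 {
            # Locate the two columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == a) ia = i
                if ($i == b) ib = i
            }
            if (!ia || !ib) {
                print "column not found" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        {
            x = $ia + 0; y = $ib + 0
            if (x > cap || x < -cap || y > cap || y < -cap) next
            n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (bad) exit 1
            if (n < 2) { print "too few rows" > "/dev/stderr"; exit 1 }
            vx = sxx - sx * sx / n
            vy = syy - sy * sy / n
            if (vx <= 0 || vy <= 0) { print "zero variance" > "/dev/stderr"; exit 1 }
            printf "%.4f\n", (sxy - sx * sy / n) / sqrt(vx * vy)
        }'
}
```

For example, `z_correlation input.gz Z_A Z_B 2` prints the correlation among SNPs with both |Z| below 2, where Z_A and Z_B stand in for whatever your header calls the two Z-score columns. The printed value can then be supplied after -cor alongside the usual -i, -bed, and -phenos options.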