Pleiotropy-informed conditional and conjunctional false discovery rate (condFDR and conjFDR) allows one to boost locus discovery in a low-powered GWAS by leveraging pleiotropic enrichment with a larger GWAS of a related phenotype, and to identify genetic loci jointly associated with two phenotypes.
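For intuition, the empirical estimator behind condFDR can be sketched in a few lines. This is a simplified illustration, not the pleioFDR implementation. It treats SNPs as independent and conservatively sets the null proportion to 1. It also omits the LD-aware random pruning (see randprune_n in the configuration) and the other steps pleioFDR performs. For each SNP i the sketch estimates condFDR(p1 | p2) ≈ p1_i * N(p2 <= p2_i) / N(p1 <= p1_i and p2 <= p2_i). It then takes conjFDR as the maximum of the two condFDR values (trait 1 given trait 2, and trait 2 given trait 1).

```python
import numpy as np


def cond_fdr(p_primary, p_conditional):
    """Simplified empirical condFDR(p_primary | p_conditional), one value per SNP.

    Assumes independent SNPs (no LD pruning) and a null proportion of 1.
    """
    p1 = np.asarray(p_primary, dtype=float)
    p2 = np.asarray(p_conditional, dtype=float)
    # cond[i, j] is True when SNP j passes SNP i's conditional threshold (p2_j <= p2_i)
    cond = p2[None, :] <= p2[:, None]
    # joint[i, j] additionally requires SNP j to pass the primary threshold (p1_j <= p1_i)
    joint = cond & (p1[None, :] <= p1[:, None])
    n_cond = cond.sum(axis=1)
    n_joint = joint.sum(axis=1)  # always >= 1, because SNP i counts itself
    # FDR ~ p1 / F(p1 | p2 <= threshold), with F estimated as n_joint / n_cond
    return np.minimum(p1 * n_cond / n_joint, 1.0)


def conj_fdr(p1, p2):
    """Conjunctional FDR: the larger of the two condFDR values for each SNP."""
    return np.maximum(cond_fdr(p1, p2), cond_fdr(p2, p1))
```

On the O(n^2) pairwise comparison: it keeps the counting logic explicit, but it is only practical for toy inputs, not for millions of variants.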
If you use the pleioFDR software in a research publication, please cite the following paper(s):
The pleioFDR software may not be used in medical applications.
To install and run pleioFDR on a small example, constrained to chromosome 21:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/pleioFDR_demo_data.tar.gz
tar -xzvf pleioFDR_demo_data.tar.gz
matlab -nodisplay -nosplash < runme.m
To install and run pleioFDR using the full example:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_1kgPhase3eur_LDr2p1.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/CTG_COG_2018.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/SSGAC_EDU_2016.mat
cp config_default.txt config.txt
matlab -nodisplay -nosplash < runme.m
For the description of the data, see here.
For the results, inspect the results folder.
Prerequisites:
The following step-by-step instructions assume you are using Linux; however, the same can be done on Windows or Mac with minimal modifications.
Download the pleioFDR software by going to https://github.com/precimed/pleiofdr in your favorite web browser, clicking the "Clone or download" button, and choosing "Download zip" to get the latest code.
Alternatively, you may get the code by cloning git repository from command line:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
Download the reference data from here. The reference is based on 1000 Genomes phase 3 data (May 2, 2013 release). Variant calls (vcf files) for the 22 autosomes were downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. We kept only samples of European ancestry (IBS, TSI, GBR, CEU, FIN populations) with a missing call rate below 10%. Among variants, we kept only biallelic variants with non-duplicated IDs, minor allele frequency above 1%, missing call rate below 10%, and Hardy-Weinberg equilibrium exact test p-values greater than 1e-20. The filtering was performed with PLINK 1.9. The resulting template contains 503 samples and 9,545,380 variants. Further details are available in about.txt.
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/about.txt
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_1kgPhase3eur_LDr2p1.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/CTG_COG_2018.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/SSGAC_EDU_2016.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_bfile.tar.gz
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/9545380.ref
Those at NORMENT with access to NIRD can also download these data from SUMSTAT/misc/9545380_ref and SUMSTAT/TMP/mat_9545380.
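As an illustration of the reference-panel filters described above, the variant-level thresholds can be applied to a per-variant summary table with pandas. This is a hypothetical sketch, not the PLINK 1.9 procedure that was actually used. The column names (SNP, N_ALLELES, MAF, MISSING_RATE, HWE_P) are assumptions. The HWE exact-test p-values are taken as precomputed rather than calculated here.

```python
import pandas as pd


def filter_variants(variants: pd.DataFrame) -> pd.DataFrame:
    """Keep variants passing the reference QC thresholds (hypothetical column names)."""
    # keep=False flags every copy of a duplicated ID, so all copies are dropped
    non_duplicated = ~variants["SNP"].duplicated(keep=False)
    keep = (
        non_duplicated
        & (variants["N_ALLELES"] == 2)          # biallelic variants only
        & (variants["MAF"] > 0.01)              # minor allele frequency above 1%
        & (variants["MISSING_RATE"] < 0.10)     # missing call rate below 10%
        & (variants["HWE_P"] > 1e-20)           # HWE exact-test p-value above 1e-20
    )
    return variants.loc[keep].reset_index(drop=True)
```

The sample-level filter (missing call rate below 10%) would be an analogous boolean mask on a per-sample table.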
Here we explain how to convert raw summary statistics to the pleioFDR format. Feel free to skip this step if you would like to try pleioFDR on CTG_COG_2018.mat and SSGAC_EDU_2016.mat, or if you downloaded input data from the internal NORMENT SUMSTATS inventory.
Prerequisites:
Downloads:
git clone https://github.com/precimed/python_convert
wget http://ssgac.org/documents/EduYears_Main.txt.gz -P traitfolder
wget http://ssgac.org/documents/SWB_Full.txt.gz -P traitfolder
Conversion steps:
python src/converter/sumstats.py csv --auto --sumstats traitfolder/EduYears_Main.txt.gz --n-val 328917 --out traitfolder/ssgac.edu.csv --force
python src/converter/sumstats.py csv --auto --sumstats traitfolder/SWB_Full.txt.gz --n-val 298420 --out traitfolder/ssgac.swb.csv --force
python src/converter/sumstats.py mat --sumstats traitfolder/ssgac.edu.csv --ref 9545380.ref --out traitfolder/ssgac.edu.mat
python src/converter/sumstats.py mat --sumstats traitfolder/ssgac.swb.csv --ref 9545380.ref --out traitfolder/ssgac.swb.mat
In the first and second commands, the --n-val argument specifies the sample size. The numbers are taken from the original papers [Okbay et al. (2016)].
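When converting more than two traits, the csv and mat steps above can be generated from a single table. The following is a minimal sketch that only reuses the sumstats.py arguments shown above. The helper names (conversion_commands, run_conversions, TRAITS) are hypothetical. It assumes it is run from the same directory as the commands above, so that src/converter/sumstats.py and 9545380.ref resolve. By default it only prints the commands; pass dry_run=False to execute them.

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple

SUMSTATS_PY = "src/converter/sumstats.py"

# (raw summary statistics, sample size, csv output, mat output) for the two SSGAC traits above
TRAITS: List[Tuple[str, int, str, str]] = [
    ("traitfolder/EduYears_Main.txt.gz", 328917, "traitfolder/ssgac.edu.csv", "traitfolder/ssgac.edu.mat"),
    ("traitfolder/SWB_Full.txt.gz", 298420, "traitfolder/ssgac.swb.csv", "traitfolder/ssgac.swb.mat"),
]


def conversion_commands(raw: str, n_val: int, csv_out: str, mat_out: str,
                        ref: str = "9545380.ref") -> List[List[str]]:
    """Return the csv and mat conversion commands for one trait."""
    # Step 1: raw summary statistics -> standardized CSV; --n-val is the GWAS sample size
    to_csv = ["python", SUMSTATS_PY, "csv", "--auto", "--sumstats", raw,
              "--n-val", str(n_val), "--out", csv_out, "--force"]
    # Step 2: CSV -> .mat file aligned to the reference
    to_mat = ["python", SUMSTATS_PY, "mat", "--sumstats", csv_out,
              "--ref", ref, "--out", mat_out]
    return [to_csv, to_mat]


def run_conversions(traits: Sequence[Tuple[str, int, str, str]],
                    dry_run: bool = True) -> List[List[str]]:
    """Print (and, unless dry_run, execute) all conversion commands in order."""
    commands = [cmd for trait in traits for cmd in conversion_commands(*trait)]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Passing argument lists to subprocess.run, rather than shell strings, avoids quoting problems when file paths contain spaces.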
For more details on the converter options, run:
python src/converter/sumstats.py --help
python src/converter/sumstats.py csv --help
python src/converter/sumstats.py mat --help
Create a configuration file by copying the config_default.txt file, located in the root of the pleioFDR repository:
cp config_default.txt config.txt
Edit config.txt so that reffile points to the ref9545380_1kgPhase3eur_LDr2p1.mat file, and traitfolder points to the folder containing CTG_COG_2018.mat and SSGAC_EDU_2016.mat. Set randprune_n=500 instead of the default randprune_n=20. You may also want to change the traitfile1 and traitfiles options.
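The config.txt edits above can also be scripted. This is a minimal sketch that assumes config.txt holds one key=value setting per line, as the randprune_n=20 example suggests. The function name set_config_options is hypothetical. Lines that do not start with a known key (for example comments) are left untouched, and keys missing from the file are appended.

```python
import re


def set_config_options(text: str, updates: dict) -> str:
    """Return config text with every key=value line for keys in `updates` replaced."""
    replaced = set()
    out_lines = []
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9_]+)\s*=", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            out_lines.append(f"{key}={updates[key]}")
            replaced.add(key)
        else:
            out_lines.append(line)  # comments and unrelated settings are kept verbatim
    # append settings that were not present in the file
    out_lines.extend(f"{k}={v}" for k, v in updates.items() if k not in replaced)
    return "\n".join(out_lines) + "\n"

# Example usage (paths are placeholders):
#   from pathlib import Path
#   cfg = Path("config.txt")
#   cfg.write_text(set_config_options(cfg.read_text(), {
#       "reffile": "/path/to/ref9545380_1kgPhase3eur_LDr2p1.mat",
#       "traitfolder": "/path/to/traitfolder",
#       "randprune_n": 500,
#   }))
```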
Start MATLAB.
Change the current folder to the root of the pleiofdr repository (i.e. the folder containing pleiotropy_analysis.m).
Execute the runme command, which should trigger the pleioFDR analysis.
To run pleioFDR from console:
matlab -nodisplay -nosplash < runme.m
Results are placed in an output folder defined in the config.txt file. By default it is named results.
Results contain:
results.mat file containing condFDR or conjFDR values for all SNPs.
Loci tables generated in the step above use custom, non-standard logic to clump results based on LD structure.
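To illustrate what LD-based clumping means in general, here is a generic greedy sketch. It is neither the pleioFDR loci logic nor the FUMA logic implemented by sumstats.py clump. It picks the SNP with the lowest FDR as a lead and assigns all significant SNPs in LD with it (r2 >= r2_min) to that locus. It then repeats with the next unassigned SNP. The function name, the pairwise r2 dictionary, and both thresholds are hypothetical, illustrative choices. Real implementations also use genomic distance windows and merge nearby loci, which this sketch ignores.

```python
from typing import Dict, List, Tuple


def greedy_clump(fdr: Dict[str, float],
                 ld_r2: Dict[Tuple[str, str], float],
                 fdr_threshold: float = 0.05,
                 r2_min: float = 0.6) -> List[Tuple[str, List[str]]]:
    """Group significant SNPs into loci around lead SNPs with the lowest FDR."""

    def r2(a: str, b: str) -> float:
        # pairs are stored once, in either order; missing pairs mean no LD
        return ld_r2.get((a, b), ld_r2.get((b, a), 0.0))

    significant = sorted((s for s, v in fdr.items() if v < fdr_threshold), key=lambda s: fdr[s])
    assigned = set()
    loci = []
    for lead in significant:
        if lead in assigned:
            continue  # already part of a stronger locus
        members = [s for s in significant
                   if s not in assigned and (s == lead or r2(lead, s) >= r2_min)]
        assigned.update(members)
        loci.append((lead, members))
    return loci
```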
You may want to re-generate loci using the sumstats.py clump script, which implements the same logic as FUMA. To do so, convert results.mat into a text file (for example using scipy.io.loadmat and pandas.DataFrame.to_csv), and then run sumstats.py clump. At this step you may use ref9545380_bfile.tar.gz as the reference to perform clumping.
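A minimal sketch of the results.mat export mentioned above, using scipy.io.loadmat and pandas.DataFrame.to_csv. The function name results_to_csv is hypothetical. The variable name fdrmat is also a hypothetical default: list the keys returned by scipy.io.loadmat to find the array that actually holds the FDR values in your results.mat. The SNP identifiers are assumed to be supplied in the same order as the rows of that array (i.e. the reference order); verify this before clumping. Note that scipy.io.loadmat cannot read MATLAB v7.3 (HDF5-based) files.

```python
from typing import Sequence

import numpy as np
import pandas as pd
import scipy.io


def results_to_csv(results_mat: str, snp_ids: Sequence[str], out_csv: str,
                   var_name: str = "fdrmat") -> pd.DataFrame:
    """Write per-SNP FDR values from a MAT-file to a tab-separated text file.

    `var_name` is a hypothetical default; check the keys of scipy.io.loadmat's result.
    """
    # squeeze_me=True turns MATLAB column/row vectors into 1-D arrays
    mat = scipy.io.loadmat(results_mat, squeeze_me=True)
    values = np.atleast_1d(np.asarray(mat[var_name], dtype=float))
    if values.ndim == 1:
        values = values[:, None]  # a single FDR column
    if values.shape[0] != len(snp_ids):
        raise ValueError(f"{var_name} has {values.shape[0]} rows but {len(snp_ids)} SNP ids were given")
    columns = [f"FDR_{i + 1}" for i in range(values.shape[1])]
    df = pd.DataFrame(values, columns=columns)
    df.insert(0, "SNP", list(snp_ids))
    df.to_csv(out_csv, sep="\t", index=False)
    return df
```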
NB! Octave support is experimental and not officially supported.
Install additional packages:
octave --no-gui <(echo "pkg install -forge io statistics")
octave --no-gui <(echo "pkg install -forge nan")
Install the gammainc function:
wget http://savannah.gnu.org/bugs/download.php?file_id=37342 -O gammainc.m
wget http://savannah.gnu.org/bugs/download.php?file_id=37341 -O __gammainc_lentz.cc
mkoctfile __gammainc_lentz.cc
Run:
octave --no-gui runme.m
save('ref9545380_1kgPhase3eur_LDr2p1.mat', '-v7')