This command-line software enables summary statistics imputation (SSimp) for GWAS summary statistics.
The only input needed from the user are the GWAS summary statistics and a reference panel (e.g. 1000 genomes, needed for LD computation).
If this is the first time that you download SSIMP run the following line:
ssimp --download.build.db
This will download a database with all positions on different builds.
Alternatively, run the following, if you are planning to use 1000 genomes as the reference panel.
ssimp --download.1KG
This will also download the build database along with 1000 genomes reference panel.
If you had previous versions of SSIMP installed, you NEED to delete and redownload the database:
rm ~/reference_panels/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin
ssimp --download.build.db
The download option is equivalant to:
# wget https://zenodo.org/records/13222092/files/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin?download=1 -O ~/reference_panels/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin
You may get assertion errors when running this, even after following the Installation Instructions. Please also check the Bug Reports section below. Here are a couple of things you can check:
1) the size of the build database should be 1'789'839'360
bytes: ~/reference_panels/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin
. If not, see Bug Reports below on how to download an updated version
2) The simplest way to call this is with: ssimp gwas/small.random.txt 1KG/EUR output.txt
. Otherwise, you can select the any (super) population, e.g. AFR, with: ssimp gwas/small.random.txt ~/reference_panels/1000genomes/ALL.chr{CHRM}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz output.txt --sample.names ~/reference_panels/1000genomes/integrated_call_samples_v3.20130502.ALL.panel/sample/super_pop=AFR
. If you wish to impute only a particular chromosome, you should use --impute.range 17
.
We recommend to download the full folder. This will give you access to all toy examples.
(1) Download
wget https://github.com/zkutalik/ssimp_software/archive/master.zip
unzip master.zip
(2) Access the folder
cd ssimp_software-master
(3) Rename the binary
Depending on what OS you are on :
cp compiled/ssimp-linux-0.5.6 ssimp
cp compiled/ssimp-osx-0.5.6 ssimp
Linux - static version
MacOS - dynamic version. You need to install GSL 1.16 (GNU Scientific Library) from here: http://ftp.gnu.org/gnu/gsl/.
Check your gcc version with gcc --version
. It needs to be 5.0.0 or higher.
Mac user? You need to install GSL: brew install gsl
.
(1) Download the zip file and unpack the zip file (or clone the github folder)
wget https://github.com/zkutalik/ssimp_software/archive/master.zip
unzip master.zip
(2) Access the folder
cd ssimp_software-master
(3) Run the makefile (source compilation)
make
This will create a file bin/ssimp
.
cp bin/ssimp ssimp
(4) Run your first summary statistics imputation on a test file (uses a small toy reference panel)
ssimp --gwas gwas/small.random.txt --ref ref/small.vcf.sample.vcf.gz --out output.txt
(5) To run your own summary statistics imputation, you need to have a reference panel ready (in the folder reference_panels
in your home directory). Check out Download 1000 genomes reference panel below to download.
While the github folder itself is rather small (136 MB), the folder reference_panels
takes up about 17 GB storage on the hard disk when 1KG is installed.
When running, the software uses about 5 GB of memory. Make sure you have enough RAM when running the software in parallel!
It takes roughly ~200 CPU hours to impute to full genome (~20M variants) with 500 individuals from 1000 Genomes reference panel. Runtime depends on the number of variants imputed and the size of the reference panel.
ssimp --gwas gwas/small.random.txt --ref ref/small.vcf.sample.vcf.gz --out output.txt
will impute the Z-statistics, using a selected reference panel (see section above) and generate a file output.txt
. The txt file assigned to --gwas
contains at least the following columns: SNP-id, reference allele, risk allele, Z-statistic, and at least one row. The imputed summary statistics are stored in output.txt
.
See more examples here.
Run ssimp
with no arguments to see the usage message.
Check out examples.
We also provide a detailed manual that has overlapping information with the usage message, but is a bit more detailed in more aspects.
Run
ssimp --download.1KG
Important! Reference panels provided in folder ref
are toy reference panels for testing and examples and not made to use for proper usage.
More info on handling reference panel data can be found in usage message.
stu @all.tests
or check this file.
git pull master && git tag v0.5.6 master && git push --tags
Aaron McDaid (implementation, method development)
Sina Rüeger (coordination, method development)
Zoltan Kutalik (supervision, method development)
Ninon Mounier (Bugfix #104)
We have used code (with permission, and under the GPL) from libStatGen and stu.
If you use this software in your scientific research, please consider citing
Rüeger, S., McDaid, A., Kutalik, Z. (2018). Evaluation and application of summary statistic imputation to discover new height-associated loci. PLOS Genetics. https://doi.org/10.1371/journal.pgen.1007371
Rüeger, S., McDaid, A., Kutalik, Z. (2018). Improved imputation of summary statistics for realistic settings. bioRxiv. https://doi.org/10.1101/203927
Copyright 2018. (Aaron McDaid, Sina Rüeger, Zoltán Kutalik, https://github.com/zkutalik/ssimp_software)
Report bugs here: https://github.com/zkutalik/ssimp_software/issues
If you are having trouble with SSIMP versions >= 0.4, make sure that you delete your database in folder ~/reference_panels/database*
and redownload it using the following command
wget https://drive.switch.ch/index.php/s/uOyjAtdvYjxxwZd/download -O ~/reference_panels/database.of.builds.1kg.uk10k.hrc.2018.01.18.bin