lukfor / pgs-calc

Applying polygenic scores (PGS) on imputed genotypes
MIT License

PGS Calculator



Features

Installation

Usage

Apply a polygenic score (PGS) to imputed genotypes:

pgs-calc apply --ref PGS000018 --out PGS000018.scores.txt chr*.dose.noID.vcf.gz --report-html PGS000018.html

The weights for score PGS000018 are downloaded automatically from PGSCatalog, and the computed scores are written to PGS000018.scores.txt. In addition, an interactive HTML report is written to PGS000018.html.

Required parameters

Optional parameters

Input files

Genotypes

Scores

pgs-calc supports PGSCatalog out of the box: open the website, find your score of interest and download the provided txt.gz files.

Because pgs-calc works with chromosomal positions and not with marker IDs, the following requirements must be met:

  1. The build of your genotypes and the build of the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift over to the build of the genotypes.
  2. The score file needs chr_name and chr_position columns. If only rsIDs are present, set the --dbsnp parameter to the matching index file so that rsIDs are converted to chromosomal positions on the fly. Depending on the build of your genotypes (hg19 or hg38), you can download the dbsnp index from here.
  3. The column other_allele is mandatory so that multi-allelic variants are handled in a unified way.
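To make requirement 3 concrete, here is a minimal Python sketch of how a score variant could be matched against an imputed variant at the same chromosomal position. It is not pgs-calc's actual code. It assumes biallelic VCF records that carry the imputed dosage of the ALT allele in the range 0 to 2. The function name effect_dosage is hypothetical.

```python
from typing import Optional


def effect_dosage(ref: str, alt: str, alt_dosage: float,
                  effect_allele: str, other_allele: str) -> Optional[float]:
    """Return the dosage of the effect allele, or None if the allele pair does not match."""
    if effect_allele == alt and other_allele == ref:
        # The effect allele is the ALT allele: use the imputed ALT dosage directly.
        return alt_dosage
    if effect_allele == ref and other_allele == alt:
        # The effect allele is the REF allele: a diploid sample carries 2 - ALT dosage copies.
        return 2.0 - alt_dosage
    # Same position but a different allele pair, e.g. another allele of a multi-allelic site.
    return None
```

The sketch checks both alleles rather than only the effect allele. This is what keeps the different allele pairs of a multi-allelic site apart.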

If you want to create your own weight files, you need a tab-delimited text file with the following columns:

chr_name	chr_position	effect_allele	other_allele	effect_weight
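Before passing a custom weight file to --ref, you might check it for the columns listed above. This is a minimal sketch with hypothetical helpers parse_weights and read_weights. Two details are assumptions about what your files look like: lines starting with # are skipped, mirroring the metadata header of PGSCatalog scoring files, and .gz files are read transparently.

```python
import csv
import gzip
from typing import Dict, Iterable, List

# Columns listed above for a custom weight file.
REQUIRED_COLUMNS = ["chr_name", "chr_position", "effect_allele", "other_allele", "effect_weight"]


def parse_weights(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-delimited weight lines and fail early if a required column is missing."""
    # Skip metadata lines starting with '#', as found at the top of PGSCatalog scoring files.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    missing = [name for name in REQUIRED_COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    return list(reader)


def read_weights(path: str) -> List[Dict[str, str]]:
    """Read a plain or gzip-compressed weight file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return parse_weights(handle)
```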

Examples

Single chromosome

Apply PGS to a single file (e.g. one chromosome):

pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz --out scores.txt

All scores are written to file scores.txt.

Multiple chromosomes

Apply PGS to multiple files (e.g. multiple chromosomes):

pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz test.chr2.vcf.gz test.chr3.vcf.gz test.chr4.vcf.gz --out scores.txt

Apply PGS to multiple files by using file patterns:

pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --out scores.txt
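Splitting the genotypes across several files does not change the result. This sketch assumes the standard unnormalized additive score: for each sample, the sum over all score variants found in the genotypes of the effect-allele dosage multiplied by effect_weight. Under that model, per-file partial sums can simply be added up. The function names are hypothetical. The sketch does not show how pgs-calc actually streams its input.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per (sample, variant): (sample ID, effect-allele dosage, effect_weight).
Record = Tuple[str, float, float]


def score_file(records: Iterable[Record]) -> Dict[str, float]:
    """Partial score per sample for one genotype file: sum of dosage * effect_weight."""
    partial: Dict[str, float] = defaultdict(float)
    for sample, dosage, weight in records:
        partial[sample] += dosage * weight
    return dict(partial)


def combine(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Total score per sample: the partial sums of all files (e.g. chromosomes) added together."""
    total: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for sample, value in partial.items():
            total[sample] += value
    return dict(total)
```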

Multiple scores

Apply multiple score files:

pgs-calc apply --ref PGS000018.txt.gz,PGS000027.txt.gz test.chr*.vcf.gz --out scores.txt

You can also create a file scores_filenames.txt that lists the paths to all of your score files:

scores
PGS000018.txt.gz
PGS000027.txt.gz

Then pass this file to --ref:

pgs-calc apply --ref scores_filenames.txt test.chr*.vcf.gz --out scores.txt

Attention: All paths inside the file are relative to the location of the file itself.
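If you manage many score files, a small script can generate this list. The sketch below uses hypothetical helper names. It writes each entry relative to the directory of the list file, following the rule above, and mirrors the layout of the scores_filenames.txt example, including its first line scores.

```python
import os
from typing import List


def score_list_lines(list_path: str, score_paths: List[str]) -> List[str]:
    """Build the lines of a score-list file; entries are relative to the list file's directory."""
    base_dir = os.path.dirname(os.path.abspath(list_path))
    # The first line follows the scores_filenames.txt example.
    lines = ["scores"]
    for path in score_paths:
        lines.append(os.path.relpath(os.path.abspath(path), start=base_dir))
    return lines


def write_score_list(list_path: str, score_paths: List[str]) -> None:
    """Write the score-list file to disk, one entry per line."""
    with open(list_path, "w") as handle:
        handle.write("\n".join(score_list_lines(list_path, score_paths)) + "\n")
```

For example, a list stored in lists/ that points to scores/PGS000018.txt.gz gets the entry ../scores/PGS000018.txt.gz.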

Filter by Imputation Quality

Use only variants with an imputation quality (R2) >= 0.9:

pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --minR2 0.9 --out scores.txt
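The filter can be pictured as the following check on each VCF record. This sketch assumes the imputation quality is stored as an R2 key in the INFO column, as in VCF files written by Minimac. Variants without an R2 value are dropped in this sketch, which may differ from what pgs-calc does.

```python
from typing import Optional


def info_r2(info: str) -> Optional[float]:
    """Extract the R2 value from a VCF INFO field such as 'AF=0.1;MAF=0.1;R2=0.95;IMPUTED'."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Match the key exactly, so that e.g. ER2 is not mistaken for R2.
        if key == "R2" and sep:
            return float(value)
    return None


def passes_min_r2(info: str, min_r2: float) -> bool:
    """Keep a variant only if its R2 is at least min_r2 (assumption: missing R2 means drop)."""
    r2 = info_r2(info)
    return r2 is not None and r2 >= min_r2
```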

PGSCatalog support

If a PGS id is provided, pgs-calc downloads the file from PGSCatalog automatically:

pgs-calc apply --ref PGS000018 test.chr1.vcf.gz --out scores.txt

All scores are written to file scores.txt.

You can also use the download command to download a specific PGS id:

pgs-calc download PGS000018 --out PGS000018.txt.gz

The weights are saved in file PGS000018.txt.gz.
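For scripted downloads outside of pgs-calc, the sketch below builds a download URL from a PGS ID. The URL layout is an assumption based on the public PGS Catalog FTP site (ScoringFiles/ID.txt.gz below https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/). It is not necessarily the endpoint that pgs-calc download uses. The ID check assumes the current format of PGS followed by six digits.

```python
import re
import shutil
import urllib.request

# Assumed layout of the PGS Catalog FTP site (not taken from pgs-calc).
BASE_URL = "https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores"


def pgs_catalog_url(pgs_id: str) -> str:
    """Return the assumed URL of the scoring file for a PGS ID such as PGS000018."""
    if not re.fullmatch(r"PGS\d{6}", pgs_id):
        raise ValueError(f"not a PGS ID: {pgs_id!r}")
    return f"{BASE_URL}/{pgs_id}/ScoringFiles/{pgs_id}.txt.gz"


def download_score(pgs_id: str, out_path: str) -> None:
    """Stream the scoring file to out_path without loading it into memory."""
    with urllib.request.urlopen(pgs_catalog_url(pgs_id)) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```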

Scores with rsIDs

If the --dbsnp parameter is set, pgs-calc automatically converts all rsIDs to their chromosomal positions on the fly. Depending on the build of your genotypes (hg19 or hg38), you can download the dbsnp index from here.

pgs-calc apply --ref PGS002297 test.chr1.vcf.gz --out scores.txt --dbsnp dbsnp154_hg19.txt.gz

All scores are written to file scores.txt.
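Conceptually, the conversion is a lookup from rsID to chromosome and position. This sketch assumes a hypothetical index with three tab-separated columns (rsID, chromosome, position). The real dbsnp index files have their own format. An index covering all of dbSNP is also much too large to hold comfortably in a Python dictionary, so the sketch only illustrates the idea. The rsID column name follows PGSCatalog scoring files. Unknown rsIDs are simply skipped here.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_rsid_index(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Build an rsID -> (chromosome, position) map from hypothetical 'rsID<TAB>chr<TAB>pos' lines."""
    index: Dict[str, Tuple[str, int]] = {}
    for rsid, chrom, pos in csv.reader(lines, delimiter="\t"):
        index[rsid] = (chrom, int(pos))
    return index


def resolve_positions(rows: List[Dict[str, str]],
                      index: Dict[str, Tuple[str, int]]) -> List[Dict[str, str]]:
    """Add chr_name and chr_position to score rows that only carry an rsID."""
    resolved = []
    for row in rows:
        hit = index.get(row["rsID"])
        if hit is None:
            # rsID not found in the index: skipped in this sketch.
            continue
        chrom, pos = hit
        resolved.append({**row, "chr_name": chrom, "chr_position": str(pos)})
    return resolved
```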

Different Builds

The build of your genotypes and the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift over the score file to the build of the genotypes. You need a dbsnp-index file and a chain file.

pgs-calc resolve --in PGS002297 --out PGS002297.hg38.txt.gz --dbsnp dbsnp154_hg38.txt.gz --chain hg19_to_hg38.over.chain.gz

The new positions are written to file PGS002297.hg38.txt.gz, and this file can then be used by pgs-calc apply.
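A chain file describes which blocks of the source build align to which blocks of the target build. This simplified sketch maps a single position through UCSC-format chain records and is not pgs-calc's implementation. It returns the first chain that covers the position and ignores chain scores. It also only maps the position: on a '-' strand hit, the alleles would additionally need to be reverse-complemented, which is not shown. Chain coordinates are 0-based, so the 1-based chr_position is shifted before the lookup.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Chain:
    t_name: str    # chromosome in the source build (e.g. hg19)
    t_start: int   # 0-based start in the source build
    q_name: str    # chromosome in the target build (e.g. hg38)
    q_size: int    # length of the target chromosome
    q_strand: str  # '+' or '-'
    q_start: int   # 0-based start in the target build
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)  # (size, dt, dq)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    """Parse chain records: a 'chain' header, then 'size dt dq' lines and a final 'size' line."""
    chains: List[Chain] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            chains.append(Chain(fields[2], int(fields[5]), fields[7],
                                int(fields[8]), fields[9], int(fields[10])))
        else:
            size = int(fields[0])
            # The last block of a chain has no gap after it.
            dt, dq = (int(fields[1]), int(fields[2])) if len(fields) == 3 else (0, 0)
            chains[-1].blocks.append((size, dt, dq))
    return chains


def lift(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 1-based position to the target build; None if it is outside all aligned blocks."""
    x = pos - 1  # chain coordinates are 0-based
    for chain in chains:
        if chain.t_name != chrom:
            continue
        t, q = chain.t_start, chain.q_start
        for size, dt, dq in chain.blocks:
            if t <= x < t + size:
                qx = q + (x - t)
                if chain.q_strand == "-":
                    # On the '-' strand, query coordinates count from the end of the chromosome.
                    qx = chain.q_size - 1 - qx
                return chain.q_name, qx + 1
            t += size + dt
            q += size + dq
    return None
```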

Resources

Contact

Lukas Forer, Institute of Genetic Epidemiology, Medical University of Innsbruck