Add Polygenic Risk Score Functionality #367

opened 5 years ago

commented 5 years ago


We're going to work with Sekar Kathiresan, who has pioneered one of the most important new techniques in genetic risk analysis. It involves applying a simple function to millions of variant calls to produce a single risk score for a disease area.
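As a rough illustration of what "a simple function" over variant calls can look like, here is a minimal sketch assuming the usual polygenic score form: a weighted sum of effect-allele dosages, with one weight per variant. The function name, variant IDs, and weights below are hypothetical and chosen only for the example; real weights would come from the published score tables.

```python
from typing import Mapping


def polygenic_score(weights: Mapping[str, float],
                    dosages: Mapping[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    weights: variant ID -> per-allele weight (hypothetical values here).
    dosages: variant ID -> effect-allele dosage in [0, 2]; imputed data
             may give fractional dosages.
    Variants missing from `dosages` are skipped, which is an assumption
    of this sketch rather than a documented policy.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Toy data: three made-up variants instead of millions.
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.4}

score = polygenic_score(weights, dosages)
print(round(score, 4))
```

The raw score only becomes meaningful once it is compared against a reference distribution (for example, as a percentile), which this sketch does not attempt.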

For example, his lab reported using the method to evaluate coronary artery disease risk based on 6.6M variants from imputed data sets.

The same technique has been successfully applied to four other major diseases: atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer:


Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations


Supplementary material:

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Describes the LDpred algorithm


Supplemental - PDF

Materials and methods show where data is:

Projecting the performance of risk score from GWAS studies

Model building algorithm described


Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

The original polygenic risk score paper


A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease

The coronary artery disease GWAS that provides the summary statistics


Extra data:

A worldwide survey of haplotype variation and linkage disequilibrium in the human genome

Widely cited Jonathan Pritchard paper on linkage disequilibrium across populations



Polygenic Risk Scores, a Biased Prediction

Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations

PDF - paper plus supplemental materials


Our new focus on India means that we will develop a somewhat simpler product for the MVP. Instead of offering the two-sided capabilities, we'll offer a core set of reports. The initial thinking is a set of polygenic risk scores plus Ancestry that come with the initial purchase and do not have an author. This gets us off the hook of building out two-sided market functionality (including all the complexities of data transfer, authoring tools, payment management, communication tools, validation, ratings, etc.) in favor of a much simpler product.

Implementation Thoughts

This technique is not a great fit for our current architecture. However, since we don't have to offer it as a report-author capability, we can greatly simplify our work by running these risk score calculations at impute time. We will include the polygenic risk tables in the bioinformatics repo, run the calculations in the Python container, and store the computed scores for each user in a new table. We can then build a simple bespoke report for these scores using plain old React and the GraphQL API.
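To make the impute-time flow concrete, here is a minimal sketch of loading a risk table, scoring one user, and storing the result. Everything named here is an assumption for illustration: the CSV layout, the `polygenic_risk_scores` table and its columns, the trait key, and the use of an in-memory SQLite database as a stand-in for whatever database we actually use.

```python
import csv
import io
import sqlite3

# Hypothetical risk table as it might be checked into the bioinformatics repo.
RISK_TABLE_CSV = """variant_id,effect_allele,weight
rs0001,A,0.12
rs0002,G,-0.05
rs0003,T,0.30
"""


def load_weights(csv_text):
    """Parse a risk table into {variant_id: weight}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["variant_id"]: float(row["weight"]) for row in reader}


def score_user(weights, dosages):
    """Return (score, number of variants used).

    Dosages are assumed to be already oriented to the effect allele;
    variants absent from the user's imputed data are skipped.
    """
    used = [v for v in weights if v in dosages]
    return sum(weights[v] * dosages[v] for v in used), len(used)


def store_score(conn, user_id, trait, score, n_variants):
    """Upsert one score row per (user, trait) into a hypothetical table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS polygenic_risk_scores (
               user_id TEXT NOT NULL,
               trait TEXT NOT NULL,
               score REAL NOT NULL,
               n_variants INTEGER NOT NULL,
               PRIMARY KEY (user_id, trait))"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO polygenic_risk_scores VALUES (?, ?, ?, ?)",
        (user_id, trait, score, n_variants),
    )
    conn.commit()


# In-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
weights = load_weights(RISK_TABLE_CSV)
user_dosages = {"rs0001": 2.0, "rs0003": 1.0}  # rs0002 was not imputed
score, n_used = score_user(weights, user_dosages)
store_score(conn, "user-123", "coronary_artery_disease", score, n_used)

row = conn.execute(
    "SELECT score, n_variants FROM polygenic_risk_scores WHERE user_id = ?",
    ("user-123",),
).fetchone()
print(row)
```

Keeping one row per user and trait means the report only needs a simple lookup through the GraphQL API, and storing the count of variants used gives a cheap way to flag scores computed from poor variant coverage.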