precisely / web


Add Polygenic Risk Score Functionality #367

Open aneilbaboo opened 5 years ago


Overview

We're going to work with Sekar Kathiresan, who has pioneered one of the most important new techniques in genetic risk analysis. It applies a simple function to millions of variant calls to compute a single risk score for a disease area.
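In the standard formulation, this "simple function" is a weighted sum. Each variant's effect-allele dosage (0, 1 or 2 copies) is multiplied by a per-variant weight derived from GWAS summary statistics, and the products are summed. The sketch below is a minimal illustration under that assumption. The variant IDs and weights are made up, and `polygenic_score` is a hypothetical helper, not code from our repos.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return sum(weight * dosage) over variants present in both mappings.

    dosages: variant ID -> count of the effect allele (0..2; fractional if imputed).
    weights: variant ID -> per-variant effect size (e.g. a log odds ratio from a GWAS).
    Variants absent from the sample's calls are simply skipped in this sketch;
    a production pipeline needs an explicit missing-data policy.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Illustrative, made-up weights and genotypes (hypothetical variant IDs).
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
print(round(polygenic_score(dosages, weights), 4))  # 0.19
```

Because the score is linear in the dosages, every variant contributes independently. This is what makes the method cheap to run even over millions of variants.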

For example, his lab reported using the method to evaluate cardiac (coronary artery disease) risk from 6.6M variants in imputed data sets: http://www.kathiresanlab.org/our-publications/genome-wide-polygenic-scores-for-common-diseases-identify-individuals-with-risk-equivalent-to-monogenic-mutations/
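Imputed data sets usually report genotype probabilities rather than hard calls, so the score is typically computed on expected dosages. For a biallelic variant with probabilities P(0), P(1) and P(2) of carrying zero, one or two copies of the effect allele, the expected dosage is P(1) + 2·P(2). The sketch below assumes the probabilities have already been parsed into tuples. That record layout and the function names are hypothetical and do not match the output format of any particular imputation tool. A single streaming pass with constant memory is enough even at 6.6M variants.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical record layout: (variant_id, P(0 copies), P(1 copy), P(2 copies))
# of the effect allele.
GenotypeProbs = Tuple[str, float, float, float]


def expected_dosage(p0: float, p1: float, p2: float) -> float:
    """Expected count of the effect allele: 0*p0 + 1*p1 + 2*p2."""
    return p1 + 2.0 * p2


def streaming_score(
    records: Iterable[GenotypeProbs], weights: Mapping[str, float]
) -> Tuple[float, int]:
    """Accumulate the weighted sum in one pass; return (score, variants_used)."""
    score, used = 0.0, 0
    for variant_id, p0, p1, p2 in records:
        w = weights.get(variant_id)
        if w is None:
            continue  # variant is not part of this score's weight table
        score += w * expected_dosage(p0, p1, p2)
        used += 1
    return score, used


# Made-up example records; "rs9999" has no weight and is ignored.
records = [
    ("rs0001", 0.1, 0.3, 0.6),
    ("rs0002", 0.0, 1.0, 0.0),
    ("rs9999", 1.0, 0.0, 0.0),
]
weights = {"rs0001": 0.12, "rs0002": -0.05}
score, used = streaming_score(records, weights)
print(used, round(score, 4))  # 2 0.13
```

Reporting `variants_used` alongside the score is a cheap sanity check. If a user's imputed data covers far fewer variants than the weight table, the score may not be comparable across users.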

The same technique has been applied successfully to four other major diseases: atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.

Literature

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

https://www.nature.com/articles/s41588-018-0183-z

PDF

Supplementary material: https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0183-z/MediaObjects/41588_2018_183_MOESM1_ESM.pdf

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Describes the LDpred algorithm.

https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1

PDF

Supplemental - PDF

The Materials and Methods section describes where the data are available: https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1#secsectitle0160

Projecting the performance of risk score from GWAS studies

Describes the model-building algorithm: https://www.nature.com/articles/ng.2579

PDF

Common polygenic variation contributes to risk of schizophrenia and bipolar disorder

The original polygenic risk score paper: https://www.researchgate.net/publication/232772602_Common_polygenic_variation_contributes_to_risk_of_schizophrenia_and_bipolar_disorder

PDF

A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease

The coronary artery disease GWAS meta-analysis that provides the GWAS summary statistics: https://www.researchgate.net/publication/281643470_A_comprehensive_1000_Genomes-based_genome-wide_association_meta-analysis_of_coronary_artery_disease

PDF

Additional data downloads: http://www.cardiogramplusc4d.org/data-downloads/

A worldwide survey of haplotype variation and linkage disequilibrium in the human genome

A widely cited paper from Jonathan Pritchard's lab on linkage disequilibrium across populations.

https://web.stanford.edu/group/pritchardlab/publications/ConradEtAl06a.pdf

PDF

Criticism

Polygenic Risk Scores, a Biased Prediction

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0610-x

Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations

https://www.cell.com/ajhg/pdfExtended/S0002-9297(17)30107-6

PDF - paper plus supplemental materials

India

Our new focus on India means we will build a simpler product for the MVP. Instead of offering two-sided capabilities, we'll offer a core set of reports. The initial thinking is a set of polygenic risk scores plus Ancestry, included with the initial purchase and without a report author. This lets us avoid building out two-sided market functionality (including all the complexities of data transfer, authoring tools, payment management, communication tools, validation, ratings, etc.) in favor of a much simpler product.

Implementation Thoughts

This technique is not a great fit for our current architecture. However, because we don't have to offer it as a report-author capability, we can greatly simplify the work by running the risk score calculations at impute time. We will include the polygenic risk tables in the bioinformatics repo, run the calculations in the Python container, and store each user's computed scores in a new table. We can then build a simple bespoke report for these scores using plain old React and a GraphQL API.
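The impute-time flow can be sketched roughly as below. It loads a weight table, aligns each user's dosage to the effect allele, and upserts the score into a scores table. Everything specific here is a hypothetical placeholder: the tab-separated weight-file columns (`variant_id`, `effect_allele`, `weight`), the `polygenic_scores` table schema, the function names, and the use of an in-memory `sqlite3` database standing in for our real datastore. Allele mismatches (e.g. strand flips) are simply skipped in this sketch.

```python
import csv
import io
import sqlite3
from typing import Dict, Tuple

Weights = Dict[str, Tuple[str, float]]     # variant_id -> (effect_allele, weight)
Calls = Dict[str, Tuple[str, str, float]]  # variant_id -> (ref, alt, alt_dosage)


def load_weights(tsv_text: str) -> Weights:
    """Parse a tab-separated weight table with a header row (hypothetical format)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {r["variant_id"]: (r["effect_allele"], float(r["weight"])) for r in reader}


def score_user(calls: Calls, weights: Weights) -> float:
    """Weighted sum of effect-allele dosages, aligning each call to the effect allele."""
    total = 0.0
    for variant_id, (effect_allele, weight) in weights.items():
        call = calls.get(variant_id)
        if call is None:
            continue  # variant not called/imputed for this user
        ref, alt, alt_dosage = call
        if effect_allele == alt:
            dosage = alt_dosage
        elif effect_allele == ref:
            dosage = 2.0 - alt_dosage  # effect allele is the reference allele
        else:
            continue  # allele mismatch (e.g. strand issue): skipped in this sketch
        total += weight * dosage
    return total


def store_score(conn: sqlite3.Connection, user_id: str, score_name: str, value: float) -> None:
    """Upsert one score; the primary key makes re-running imputation idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS polygenic_scores ("
        "user_id TEXT, score_name TEXT, value REAL, PRIMARY KEY (user_id, score_name))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO polygenic_scores VALUES (?, ?, ?)",
        (user_id, score_name, value),
    )
    conn.commit()


# Made-up example data.
weights_tsv = "variant_id\teffect_allele\tweight\nrs0001\tA\t0.12\nrs0002\tG\t-0.05\n"
calls = {"rs0001": ("G", "A", 2.0), "rs0002": ("G", "T", 0.0)}
conn = sqlite3.connect(":memory:")
store_score(conn, "user-123", "CAD", score_user(calls, load_weights(weights_tsv)))
value = conn.execute("SELECT value FROM polygenic_scores").fetchone()[0]
print(round(value, 4))  # 0.14
```

Keying the table on (user, score name) lets the React report fetch every score for a user with one query. It also means a re-imputation overwrites stale values instead of duplicating them.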