Overview
We're going to work with Sekar Kathiresan, who pioneered one of the most important new techniques in genetic risk analysis. It applies a simple function to millions of variant calls to produce a single risk score for a disease area.
For example, his lab reported using the method to evaluate coronary artery disease risk based on 6.6M variants from imputed data sets: http://www.kathiresanlab.org/our-publications/genome-wide-polygenic-scores-for-common-diseases-identify-individuals-with-risk-equivalent-to-monogenic-mutations/
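As a rough illustration of the "simple function" described above, a polygenic score is commonly computed as a weighted sum: each variant's effect-allele dosage (0 to 2, possibly fractional for imputed data) is multiplied by a per-variant weight and the products are added up. The sketch below assumes that form; the function name, variant IDs, and weights are all hypothetical, and skipping uncalled variants is an assumption rather than anything the papers prescribe.

```python
from typing import Dict


def polygenic_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    weights: variant ID -> per-variant weight (hypothetical values here).
    dosages: variant ID -> effect-allele dosage in [0, 2] for one sample.
    """
    total = 0.0
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Assumption: variants missing for this sample are skipped.
            continue
        total += weight * dosage
    return total


# Made-up example data: three variants, one sample.
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.4}

score = polygenic_score(weights, dosages)
print(f"score = {score:.4f}")
```

A real table would hold millions of rows rather than three, but the per-sample calculation has the same shape.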
The same technique has been successfully applied to four other major diseases: atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer.
Literature
Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations
https://www.nature.com/articles/s41588-018-0183-z
Supplementary material: https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-018-0183-z/MediaObjects/41588_2018_183_MOESM1_ESM.pdf
Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores
Describes the LDpred algorithm
https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1
The Materials and Methods section shows where the data is available: https://www.cell.com/ajhg/fulltext/S0002-9297(15)00365-1#secsectitle0160
Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies
Describes the model-building algorithm: https://www.nature.com/articles/ng.2579
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
The original polygenic risk score paper: https://www.researchgate.net/publication/232772602_Common_polygenic_variation_contributes_to_risk_of_schizophrenia_and_bipolar_disorder
A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease
The coronary artery disease GWAS that provides the summary statistics: https://www.researchgate.net/publication/281643470_A_comprehensive_1000_Genomes-based_genome-wide_association_meta-analysis_of_coronary_artery_disease
Extra data: http://www.cardiogramplusc4d.org/data-downloads/
A worldwide survey of haplotype variation and linkage disequilibrium in the human genome
Widely cited paper from Jonathan Pritchard's lab on linkage disequilibrium across populations
https://web.stanford.edu/group/pritchardlab/publications/ConradEtAl06a.pdf
Criticism
Polygenic risk scores: a biased prediction?
https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0610-x
Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations
https://www.cell.com/ajhg/pdfExtended/S0002-9297(17)30107-6
India
Our new focus on India means that we will develop a somewhat simpler product for the MVP. Instead of offering the two-sided capabilities, we'll offer a core set of reports. The initial thinking is a set of polygenic risk scores plus Ancestry that come with the initial purchase and do not have an author. This gets us off the hook of building out two-sided market functionality (including all the complexities of data transfer, authoring tools, payment management, communication tools, validation, ratings, etc.) in favor of a much simpler product.
Implementation Thoughts
This technique is not a great fit for our current architecture. However, because we don't have to offer it as a report-author capability, we can greatly simplify our work by running the risk score calculations at impute time. We will include the polygenic risk tables in the bioinformatics repo, run the calculations in the Python container, and store the computed scores for each user in a new table. We can then build a simple bespoke report for these scores using plain React and the GraphQL API.
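To make the impute-time flow a little more concrete, here is a minimal sketch of the "compute, then store per user" step. Everything specific in it is an assumption for illustration: the table name polygenic_scores, its columns, the trait keys, the user ID, and the use of an in-memory SQLite database as a stand-in for whatever store we actually use. The risk tables are tiny made-up dictionaries, not real weights.

```python
import sqlite3
from typing import Dict


def compute_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Weighted sum of dosages; variants missing from the sample are skipped."""
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def store_scores(conn: sqlite3.Connection, user_id: str, scores: Dict[str, float]) -> None:
    """Upsert one row per (user, trait) into a hypothetical polygenic_scores table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS polygenic_scores (
            user_id TEXT NOT NULL,
            trait   TEXT NOT NULL,
            score   REAL NOT NULL,
            PRIMARY KEY (user_id, trait)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO polygenic_scores (user_id, trait, score) VALUES (?, ?, ?)",
        [(user_id, trait, score) for trait, score in scores.items()],
    )
    conn.commit()


# Hypothetical risk tables, as they might be loaded from the bioinformatics repo.
risk_tables = {
    "coronary_artery_disease": {"rs0001": 0.12, "rs0003": 0.30},
    "type_2_diabetes": {"rs0002": -0.05},
}
# Hypothetical imputed dosages for one user.
user_dosages = {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.4}

conn = sqlite3.connect(":memory:")  # stand-in for the real database
user_scores = {trait: compute_score(w, user_dosages) for trait, w in risk_tables.items()}
store_scores(conn, "user-123", user_scores)

rows = conn.execute(
    "SELECT trait, score FROM polygenic_scores WHERE user_id = ? ORDER BY trait",
    ("user-123",),
).fetchall()
for trait, score in rows:
    print(trait, round(score, 4))
```

Keying the table on (user, trait) means a re-run of imputation simply overwrites the earlier scores, and the report only needs a single lookup by user.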