Closed bschilder closed 5 months ago
I mapped all variants in ClinVar to functional regions using VariantAnnotation:
https://github.com/neurogenomics/KGExplorer/blob/master/R/get_clinvar.R
https://github.com/neurogenomics/KGExplorer/blob/master/R/map_variants.R
In total, there are currently 4,780,754 rows in the ClinVar data, representing 2,421,970 unique variants across many different diseases/phenotypes.
I then plotted all ClinVar variant annotations by frequency, faceted by pathogenicity status. Interestingly, the vast majority are splicing mutations!
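For reference, the tally behind a plot like this can be sketched in a few lines. This is only an illustration, not the code in the linked R files: the field names (`location`, `pathogenicity`) and the toy rows are assumptions made up for the example, and the counts are not real ClinVar results.

```python
from collections import Counter, defaultdict

# Toy rows. Field names and values are hypothetical, not real ClinVar records.
variants = [
    {"variant_id": "v1", "location": "spliceSite", "pathogenicity": "Pathogenic"},
    {"variant_id": "v2", "location": "spliceSite", "pathogenicity": "Pathogenic"},
    {"variant_id": "v3", "location": "coding", "pathogenicity": "Pathogenic"},
    {"variant_id": "v4", "location": "intron", "pathogenicity": "Benign"},
    {"variant_id": "v5", "location": "coding", "pathogenicity": "Benign"},
    {"variant_id": "v6", "location": "intron", "pathogenicity": "Benign"},
]


def count_by_facet(rows, facet_key, value_key):
    """Count occurrences of rows[value_key] separately within each rows[facet_key]."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[facet_key]][row[value_key]] += 1
    return counts


counts = count_by_facet(variants, "pathogenicity", "location")

# One line per facet, annotations sorted from most to least frequent.
for facet in sorted(counts):
    print(facet, counts[facet].most_common())
```

Each facet's `Counter` corresponds to one panel of the plot, with annotation categories on one axis and frequency on the other.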
This data also includes disease/phenotype-variant links. While the specific variant will differ from patient to patient, this will at least give us a sense of what kinds of variant mechanisms are at play for a given gene within a given disease. This will be useful when determining what kind of gene therapy would likely be most suitable for a given candidate target.
See here for a nice example of how ASOs can be used to treat rare conditions: https://www.nature.com/articles/s41586-023-06277-0
Actually, it seems that subsetting ClinVar by data source has a big impact.
If we're targeting phenotypes, splice sites seem to be the most relevant overall. This is somewhat confusing because (as you'll see) disease-level filtering identifies coding mutations as the dominant mechanism.
However, when you subset to only ClinVar variants associated with OMIM diseases, coding mutations become the dominant mechanism.
Coding mutations remain dominant when considering both OMIM and Orphanet together.
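To make the comparison concrete, here is a minimal sketch of subsetting by data source and asking which annotation dominates. The `source` and `location` field names are assumptions, and the toy rows are contrived so that the subsets disagree; they are not real ClinVar counts.

```python
from collections import Counter

# Toy rows. Values are made up purely to exercise the functions below.
rows = [
    {"variant_id": "v1", "source": "HPO", "location": "spliceSite"},
    {"variant_id": "v2", "source": "HPO", "location": "spliceSite"},
    {"variant_id": "v3", "source": "HPO", "location": "coding"},
    {"variant_id": "v4", "source": "OMIM", "location": "coding"},
    {"variant_id": "v5", "source": "OMIM", "location": "coding"},
    {"variant_id": "v6", "source": "OMIM", "location": "spliceSite"},
    {"variant_id": "v7", "source": "Orphanet", "location": "coding"},
    {"variant_id": "v8", "source": "Orphanet", "location": "intron"},
]


def mechanism_counts(rows, sources):
    """Count annotations among rows whose source is in `sources`."""
    wanted = set(sources)
    return Counter(r["location"] for r in rows if r["source"] in wanted)


def dominant_mechanism(rows, sources):
    """Return the most frequent annotation in the subset, or None if it is empty."""
    counts = mechanism_counts(rows, sources)
    return counts.most_common(1)[0][0] if counts else None


for subset in (["HPO"], ["OMIM"], ["OMIM", "Orphanet"]):
    print(subset, dominant_mechanism(rows, subset))
```

The same function applied to different source subsets can return different dominant mechanisms, which is the effect described above.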
Exactly which phenotypes/diseases you consider can change the predominant variant-level mechanism. But for our purposes, what matters is not the aggregate level but the phenotype- and target-specific level.
In other words, once we identify our gene therapy target, we can subset this data to a particular candidate gene for a specific phenotype and identify the most common types of variant-level mechanisms within that gene.
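A rough sketch of that gene-by-phenotype subsetting is below. The gene and phenotype labels, the field names, and the rows are all hypothetical placeholders. Because the data can hold more rows than unique variants, the sketch counts each variant/annotation pair only once within the subset; whether that is the right deduplication rule for the real data is an assumption.

```python
from collections import Counter

# Toy records. GENE_A, PHENO_1, etc. are placeholders, not real identifiers.
records = [
    {"variant_id": "v1", "gene": "GENE_A", "phenotype": "PHENO_1", "location": "coding"},
    {"variant_id": "v1", "gene": "GENE_A", "phenotype": "PHENO_1", "location": "coding"},  # repeated row
    {"variant_id": "v2", "gene": "GENE_A", "phenotype": "PHENO_1", "location": "spliceSite"},
    {"variant_id": "v3", "gene": "GENE_A", "phenotype": "PHENO_1", "location": "spliceSite"},
    {"variant_id": "v4", "gene": "GENE_A", "phenotype": "PHENO_2", "location": "promoter"},
    {"variant_id": "v5", "gene": "GENE_B", "phenotype": "PHENO_1", "location": "coding"},
]


def mechanism_profile(records, gene, phenotype):
    """Rank annotations for one gene/phenotype pair.

    Returns a list of (annotation, unique_variant_count, fraction) tuples,
    most frequent first. Returns an empty list if nothing matches.
    """
    seen = set()
    counts = Counter()
    for rec in records:
        if rec["gene"] != gene or rec["phenotype"] != phenotype:
            continue
        key = (rec["variant_id"], rec["location"])
        if key in seen:  # skip repeated rows for the same variant/annotation
            continue
        seen.add(key)
        counts[rec["location"]] += 1
    total = sum(counts.values())
    return [(loc, n, n / total) for loc, n in counts.most_common()]


profile = mechanism_profile(records, "GENE_A", "PHENO_1")
for location, n, fraction in profile:
    print(f"{location}: {n} variants ({fraction:.0%})")
```

The top entry of the profile is the candidate mechanism to weigh when choosing a therapy modality for that gene and phenotype.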
We can use the Monarch knowledge graph to extract information on how each gene is linked to each rare disease phenotype, e.g. missense mutations, splicing mutations, disrupted promoters, methylation, etc.