bschilder commented 5 months ago

We can use the Monarch knowledge graph to extract information on how each gene is linked to each rare disease phenotype, e.g. missense mutations, splicing mutations, disrupted promoters, methylation, etc.

bschilder commented 5 months ago

I mapped all variants in ClinVar to functional regions using VariantAnnotation: https://github.com/neurogenomics/KGExplorer/blob/master/R/get_clinvar.R https://github.com/neurogenomics/KGExplorer/blob/master/R/map_variants.R

In total, there are currently 4,780,754 rows in the ClinVar data, representing 2,421,970 unique variants across many different diseases/phenotypes.
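The gap between the row count and the unique-variant count is what you would expect if the same variant appears once per linked disease/phenotype. A minimal sketch of that bookkeeping, using toy records and hypothetical field names (`variant_id`, `phenotype`) rather than the actual columns produced by `get_clinvar.R`:

```python
# Toy records for illustration only; "variant_id" and "phenotype" are
# hypothetical field names, not the real ClinVar/KGExplorer columns.
rows = [
    {"variant_id": "v1", "phenotype": "PHENO_1"},
    {"variant_id": "v1", "phenotype": "PHENO_2"},  # same variant, second phenotype
    {"variant_id": "v2", "phenotype": "PHENO_1"},
]


def count_rows_and_variants(records):
    """Return (number of rows, number of distinct variant IDs)."""
    unique_variants = {r["variant_id"] for r in records}
    return len(records), len(unique_variants)


n_rows, n_variants = count_rows_and_variants(rows)
print(f"{n_rows} rows, {n_variants} unique variants")
```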

I then plotted all ClinVar variant annotations by frequency, faceted by pathogenicity status. Interestingly, the vast majority are splicing mutations!

clinvar_plot
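The tabulation behind a plot like this is a count of annotation types within each pathogenicity class. A rough sketch of that step, assuming hypothetical field names (`pathogenicity`, `annotation`) and made-up records; the actual plot was produced in R, so this only illustrates the idea:

```python
from collections import Counter, defaultdict


def annotation_counts_by_pathogenicity(records):
    """Count annotation types separately within each pathogenicity class."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["pathogenicity"]][r["annotation"]] += 1
    return counts


# Made-up records; labels and field names are illustrative assumptions.
toy = [
    {"pathogenicity": "Pathogenic", "annotation": "spliceSite"},
    {"pathogenicity": "Pathogenic", "annotation": "spliceSite"},
    {"pathogenicity": "Pathogenic", "annotation": "coding"},
    {"pathogenicity": "Benign", "annotation": "intron"},
]

facets = annotation_counts_by_pathogenicity(toy)
for status, counter in facets.items():
    # One "facet" per pathogenicity status, annotations sorted by frequency.
    print(status, counter.most_common())
```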

These data also include disease/phenotype-variant links. While the specific variant will differ from patient to patient, this will at least give us a sense of what kinds of variant mechanisms are at play for a given gene within a given disease. This will be useful when determining which kind of gene therapy is likely to be most suitable for a given candidate target.

bschilder commented 5 months ago

See here for a nice example of how ASOs can be used to treat rare conditions: https://www.nature.com/articles/s41586-023-06277-0

bschilder commented 5 months ago

Actually, it seems that subsetting ClinVar by data source has a big impact.

HPO-only

If we're targeting phenotypes, splice sites seem to be the most relevant overall. This is somewhat confusing because (as you'll see) the disease-level filtering identifies coding mutations as the dominant mechanism.

variants_hpo_subset

OMIM-only

However, when you subset to only ClinVar variants associated with OMIM diseases, coding mutations become the dominant mechanism.

variants_omim_subset

OMIM + Orphanet

Coding mutations remain dominant when considering both OMIM and Orphanet together.

variants_omim-orphanet_subset
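The source-based subsetting compared in these three sections can be sketched as a filter followed by a count. The records below are made up purely to show the mechanics, and the `source` and `annotation` field names are assumptions, not the real ClinVar columns:

```python
from collections import Counter

# Made-up records for illustration only; they do not reproduce the real counts.
records = [
    {"source": "HPO", "annotation": "spliceSite"},
    {"source": "HPO", "annotation": "spliceSite"},
    {"source": "HPO", "annotation": "coding"},
    {"source": "OMIM", "annotation": "coding"},
    {"source": "OMIM", "annotation": "coding"},
    {"source": "OMIM", "annotation": "spliceSite"},
    {"source": "Orphanet", "annotation": "coding"},
]


def dominant_annotation(records, sources):
    """Most frequent annotation among records whose source is in `sources`."""
    counts = Counter(r["annotation"] for r in records if r["source"] in sources)
    if not counts:
        return None  # no records for the requested sources
    return counts.most_common(1)[0][0]


for label, sources in [
    ("HPO-only", {"HPO"}),
    ("OMIM-only", {"OMIM"}),
    ("OMIM + Orphanet", {"OMIM", "Orphanet"}),
]:
    print(label, dominant_annotation(records, sources))
```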

Conclusion

Exactly which phenotypes/diseases you consider can change the predominant variant-level mechanism. But for our purposes, what matters is not the aggregate level but the level of the specific phenotype target.

In other words, once we identify our gene therapy target, we can subset this data to a particular candidate gene for a specific phenotype and identify the most common types of variant-level mechanisms within that gene.
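A minimal sketch of that target-specific lookup, assuming each record carries hypothetical `gene`, `phenotype`, and `annotation` fields (the gene and phenotype identifiers below are placeholders, not real targets):

```python
from collections import Counter


def rank_mechanisms(records, gene, phenotype):
    """Rank annotation types for one gene within one phenotype, most common first."""
    subset = [
        r for r in records
        if r["gene"] == gene and r["phenotype"] == phenotype
    ]
    return Counter(r["annotation"] for r in subset).most_common()


# Placeholder records; identifiers and field names are illustrative assumptions.
toy_links = [
    {"gene": "GENE_A", "phenotype": "PHENO_1", "annotation": "coding"},
    {"gene": "GENE_A", "phenotype": "PHENO_1", "annotation": "coding"},
    {"gene": "GENE_A", "phenotype": "PHENO_1", "annotation": "spliceSite"},
    {"gene": "GENE_A", "phenotype": "PHENO_2", "annotation": "promoter"},
    {"gene": "GENE_B", "phenotype": "PHENO_1", "annotation": "intron"},
]

ranking = rank_mechanisms(toy_links, "GENE_A", "PHENO_1")
print(ranking)
```

The top-ranked annotation for a gene-phenotype pair would then be one input, among others, when weighing which therapy modality fits that target.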

neurogenomics / rare_disease_celltyping

Identify variant-level mechanisms of each rare disease #52
