The issue

The HPO includes lots of gene-disease/phenotype associations. But these do not contain any information about the current strength of the evidence supporting these associations.

A solution

Peter Robinson alerted me to GenCC which does exactly this. These were produced through manual annotation by experts across academia, medicine, and industry via systematic Delphi studies (which are the gold-standard in these kinds of meta-analyses): https://search.thegencc.org/download

I've gone ahead and integrated this information into the HPO annotations via a new function: HPOExplorer::add_evidence

phenos <- HPOExplorer::load_phenotype_to_genes()
phenos2 <- HPOExplorer::add_evidence(phenos = phenos)

Assessments

Classification types

These evidence scores can be added by multiple users/companies, so I codified the evidence classifications into numeric scores and aggregated them per gene-disease association.

We can see the distribution of evidence scores across all HPO annotations: 9038cac2-d57c-4df8-8637-dacf2ea13948

Most annotations are "Supportive". I can't find an explanation of this term on their site, but it is classified as being just below "Moderate". https://search.thegencc.org/

HPO coverage

The GenCC annotation cover a surprisingly large proportion of the HPO annotations. This means they can be quite useful to us for filtering the potential gene therapy targets:

73.8% of all gene associations in the HPO annotations
76.4% of all 7839 diseases in the HPO annotations
94.7% of all 9677 phenotypes in the HPO (with at least 1 gene annotation) via one disease or another

nrow(phenos2[!is.na(evidence_score_mean),]) / nrow(phenos2)
# [1] 0.737768
length(unique(phenos2[!is.na(evidence_score_mean),]$DatabaseID)) / length(unique(phenos2$DatabaseID))
# [1] 0.7642556
length(unique(phenos2[!is.na(evidence_score_mean),]$HPO_ID)) / length(unique(phenos2$HPO_ID))
# [1] 0.9466777

Limitations

Note, since these evidence scores are only provided at the disease-level, we can't assign these scores to specific phenotypes associated with those diseases. This means having to assign the same evidence score to each gene-phenotype association within a given gene-disease association.

~~But since most phenotypes are associated with a disease via a couple of genes, the resulting number of combinations isn't too large.~~

> nrow(phenos)
[1] 968116
> nrow(phenos2)
[1] 968116

This combinatorics issue was actually due to some diseases/gene symbols having multiple unique entries in the following columns: "disease_curie", "disease_original_curie", "disease_title", "gene_curie"

There were also a couple of duplicate entries per "DiseaseID" in the annot <- load_phenotype_to_genes(3) and add_disease().

Removing these columns from the agg_by arg resolving the issue. Still though, these evidence scores are limited in the sense that they are not phenotype-specific (only disease-specific).

Next steps

[x] Add these evidence scores as a new argument in MultiEWE::prioritise_targets.
[x] Test out how many associations remain after filtering at each evidence level. The numbers will chance once we change the prior filtering steps based on phenotype metadata (currently being done by @KittyMurphy via chatGPT), but i can at least begin to get a sense of how many associations are left after filtering at each evidence level.
[x] Test for relationships between specificity and evidence strength per gene per cell-type.

Suggestions by @NathanSkene

If we want to make the jump to suggesting the confidence scores extend to phenotypes, then it would be good to show that higher confidence genes have higher confidence associations with their celltypes than the lower confidence genes. we'll have less high confidence genes than low confidence ones. so we'd need to do the comparison using equal numbers of low confidence ones We could run EWCE on subsampled low confidence genes (for one trait), and test whether the average cell type enrichment is lower for lower confidence genes? Or we could try some kind of genome/phenome wide test for whether confidence ~ specificity

[x] Add these evidence scores as a new argument in MultiEWE::prioritise_targets. Done.

[x] Test out how many associations remain after filtering at each evidence level. The numbers will chance once we change the prior filtering steps based on phenotype metadata (currently being done by @KittyMurphy via chatGPT), but i can at least begin to get a sense of how many associations are left after filtering at each evidence level.

phenos <- load_phenotype_to_genes()
  phenos2 <- add_evidence(phenos = phenos,
                          default_score = NULL)
  testthat::expect_true(
    all(c("evidence_score_min",
          "evidence_score_max",
          "evidence_score_mean") %in% names(phenos2))
  )

  rep <- lapply(c(NA,seq(0,6)), function(x){
    vars <- c("evidence_score_min","evidence_score_max","evidence_score_mean")
    lapply(stats::setNames(vars,vars),
           function(v){
             p <- if(is.na(x)){
               phenos2
             } else {
               phenos2[get(v)>=x,]
             }
             data.table::data.table(
               evidence_score_filter=toString(x),
               rows=nrow(p),
               diseases=length(unique(p$DatabaseID)),
               phenotypes=length(unique(p$HPO_ID)),
               genes=length(unique(p$Gene))
             )
           }) |> data.table::rbindlist(use.names = TRUE,
                                       idcol = "evidence_score_variable")
  }) |> data.table::rbindlist()

  rep_melt <- data.table::melt.data.table(
    rep,
    measure.vars = names(rep)[-seq_len(2)],
    variable.name = "metric",
    value.name = "count")
  rep_melt$evidence_score_filter <- factor(rep_melt$evidence_score_filter,
                                           levels = unique(rep_melt$evidence_score_filter),
                                           ordered = TRUE)
  ggplot2::ggplot(rep_melt,
                  ggplot2::aes(x=evidence_score_filter,
                               y=count,
                               shape=evidence_score_variable,
                               color=evidence_score_variable)) +
    ggplot2::geom_smooth(se = FALSE, na.rm = FALSE) +
    ggplot2::geom_point(size=3, alpha=.8, na.rm = FALSE) +
    ggplot2::facet_wrap(facets = metric~.,
                        scales = "free") +
    ggplot2::theme_bw()

Filtering HPO annotations at each evidence level (or greater) shows that there is a nice smooth decrease in the number of associations/diseases/phenotypes/genes (as opposed to the vast majority of the data dropping off precipitously at some threshold, which would indicate that the evidence annotations are skewed). evidence_filters

At the most stringent evidence level (6) we still have quite a few phenotypes/diseases/genes, which I think bodes well for prioritising gene therapy targets (we'll have to wait until we have the chatGPT annotations to see how this bears out in the prioritisation pipeline).

[x] Test for relationships between specificity and evidence strength per gene per cell-type.

We don't have the new set of enrichment results yet (still waiting for the HPO annotations to be fixed https://github.com/neurogenomics/HPOExplorer/issues/37).

But in the meantime, I can at least begin to look into this using our old results.

We don't have the new set of enrichment results yet (still waiting for the HPO annotations to be fixed https://github.com/neurogenomics/HPOExplorer/issues/37). But in the meantime, I can at least begin to look into this using our old results.

Investigated the relationships between evidence score and specificity (within each celltype) using a genome/phenome-wide strategy using all of our results.

Results

See here for full Rmarkdown report code. I chose not to render it as an html because the data is rather large, instead using it more as a way of documenting the code for these analyses: https://github.com/neurogenomics/RareDiseasePrioritisation/blob/master/reports/evidence_scores.Rmd

Phenotype-level

Our phenotype EWCE results are obviously at the level of phenotype, while the evidence scores are purely at the disease level. This means you have multiple evidence scores within a given celltype-phenotype association.

res <- MultiEWCE::load_example_results("Descartes_All_Results_extras.rds")
res <- HPOExplorer::add_genes(res, allow.cartesion = TRUE)
res <- HPOExplorer::add_evidence(phenos = res)
ctd_out <- MultiEWCE::add_ctd(results = res)
res <- ctd_out$results

df <- res[,list(specificity=mean(specificity),
          mean_exp=mean(mean_exp),
          evidence_score_min=mean(evidence_score_min),
          evidence_score_mean=mean(evidence_score_mean),
          evidence_score_max=mean(evidence_score_max)),
    by=c("HPO_ID","CellType")]

I computed mean evidence score within each celltype-phenotype combination and checked for correlation, but it seems there is no relationship:

> cor(df$evidence_score_mean,
+     df$mean_exp)
[1] -0.02194124
> cor(df$evidence_score_mean,
+     df$specificity) 
[1] -0.01300368

This is confirmed when visualizing as a scatter plot:

library(ggplot2) 
ggplot(df, aes(x=evidence_score_mean,y =specificity)) +
  geom_point(alpha=.1) +
  # geom_density2d_filled(adjust=.5) +
  geom_smooth()

00001b

And again after logging both axes (to try and improve visibility):

library(ggplot2) 
ggplot(df, aes(x=log(evidence_score_mean),y =log(specificity))) +
  geom_point(alpha=.1) +
  # geom_density2d_filled(adjust=.5) +
  geom_smooth()

00001c_logged

Symptom-level

We could also try a different strategy, and look at the symptom enrichment results (Note: we only have the Fisher's exact enrichment test results atm, waiting on the updated HPO gene annotations to rerun with EWCE).

res <- MultiEWCE::load_example_results("Descartes_All_Results_extras.symptoms.full_join.rds")
res <- HPOExplorer::phenos_to_granges(phenos = res,
                                      as_datatable = TRUE)
# res <- HPOExplorer::add_genes(res, allow.cartesion = T)
res <- HPOExplorer::add_evidence(phenos = res)
ctd_out <- MultiEWCE::add_ctd(results = res)
res <- ctd_out$results

Then, I aggregate the celltype enrichment results.

df <- res[,list(specificity=mean(specificity),
          mean_exp=mean(mean_exp),
          evidence_score_min=mean(evidence_score_min),
          evidence_score_mean=mean(evidence_score_mean),
          evidence_score_max=mean(evidence_score_max)),
    by=c("HPO_ID","CellType")]

Again, seems to be no relationships between specificity/expression and evidence scores.

> cor(df$evidence_score_mean,
+     df$mean_exp)
[1] -0.006639101
> cor(df$evidence_score_mean,
+     df$specificity) 
[1] -0.01384008

000002

Interpretation

Based on the analyses so far, I can't find any evidence of a relationship between celltype specificity (for a given celltype that is enriched in the set of phenotype that make up a disease) and gene-disease association evidence scores.

While a good first pass, the above strategies may or may not be the best to investigate this question of whether there's a relationship between specificity and evidence scores. In both approaches, there is a mismatch between the level at which the evidence scores are annotated (gene-disease) and our enrichment tests (phenotype-level and symptom-level).

I suppose we could do EWCE tests at the disease level and see whether the genes driving the enrichment (ie have high specificity in that particular celltype) also have high evidence scores. But I'm not sure this would really help us to make any sort of argument about our phenotype-level results.

We could run EWCE on subsampled low confidence genes (for one trait), and test whether the average cell type enrichment is lower for lower confidence genes?

This previous suggestion by @NathanSkene might be another option, but I'll have to think a bit more about I would implement this.

neurogenomics / RareDiseasePrioritisation

Add evidence scores to gene associations #27