Assess whether cell types predict clinical course

bschilder commented 1 year ago

Following up from here: https://github.com/neurogenomics/RareDiseasePrioritisation/issues/20#issuecomment-1481631792

What I meant was a bit more detailed than that. Meant what we discussed on the call the other day:

Can we say that this phenotype, when it acts via this celltype, has a different clinical course?

To answer this, we need at least one phenotype that has 20 diseases associated with it, where those diseases have at least two different clinical course values (each with >10 diseases).
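As a rough illustration of that selection criterion, here is a minimal base-R sketch. The table `annot`, its column names, the made-up IDs (`HP:A`, `HP:B`) and the helper `is_testable()` are all hypothetical and not part of HPOExplorer or MSTExplorer; the thresholds are just the ones mentioned above.

```r
# Hypothetical long table: one row per phenotype-disease pair,
# with one clinical course value per disease (toy data, made-up IDs).
annot <- data.frame(
  hpo_id = c(rep("HP:A", 24), rep("HP:B", 22)),
  disease_id = paste0("D", 1:46),
  clinical_course = c(rep("mild", 12), rep("severe", 12),
                      rep("mild", 19), rep("severe", 3))
)

# TRUE when a phenotype has at least `min_diseases` diseases and at least
# `min_groups` clinical course values that each cover more than
# `min_per_group` diseases.
is_testable <- function(d, min_diseases = 20, min_groups = 2,
                        min_per_group = 10) {
  d <- unique(d[, c("disease_id", "clinical_course")])
  per_value <- table(d$clinical_course)
  length(unique(d$disease_id)) >= min_diseases &&
    sum(per_value > min_per_group) >= min_groups
}

testable <- vapply(split(annot, annot$hpo_id), is_testable, logical(1))
testable_ids <- names(testable)[testable]
print(testable_ids)
```

In this toy data only `HP:A` passes: `HP:B` has enough diseases overall, but only one clinical course value covers more than 10 of them.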

bschilder commented 1 year ago

Strategy summary:

Find phenotypes associated w/ 20+ diseases & have metadata.

Run a series of linear models, one per disease-metadata variable combination.

Then test whether the EWCE enrichment p-values (conditional on celltype) are significantly associated with the clinical outcomes (e.g. age of death) of each phenotype.

The formula for the linear model is as follows:

outcome ~ EWCE_pvalue * celltype
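For reference, a model of this shape could be fit in base R roughly as below. This is only a sketch on simulated data: the data frame `dat`, the column `enrichment_p` and the cell type labels are invented for illustration, and comparing against an additive model with `anova()` is one possible way to test the interaction, not necessarily what was done in the report.

```r
set.seed(1)
n <- 60
# Simulated stand-in for one phenotype: one row per disease-celltype pair.
dat <- data.frame(
  enrichment_p = runif(n),
  celltype = factor(rep(c("celltype_A", "celltype_B", "celltype_C"),
                        each = n / 3))
)
# The outcome depends on the p-value only within celltype_B.
dat$outcome <- 5 +
  ifelse(dat$celltype == "celltype_B", -3 * dat$enrichment_p, 0) +
  rnorm(n, sd = 0.5)

# `*` expands to both main effects plus their interaction.
fit <- lm(outcome ~ enrichment_p * celltype, data = dat)
print(names(coef(fit)))

# Test the interaction as a whole by comparing against the additive model.
fit_additive <- lm(outcome ~ enrichment_p + celltype, data = dat)
interaction_test <- anova(fit_additive, fit)
print(interaction_test)
```

The interaction terms (`enrichment_p:celltype...`) are the ones that address the question of whether the effect of the enrichment p-value differs by cell type.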

NathanSkene commented 1 year ago

Can you confirm how many phenotypes have the required data before proceeding with linear modelling? Not clear that we’d be able to use the EWCE p-value, as most of the diseases will be single gene.

bschilder commented 1 year ago

already done, just about to summarise

bschilder commented 1 year ago

New report here: https://neurogenomics.github.io/RareDiseasePrioritisation/reports/differential_outcomes

bschilder commented 1 year ago

> Can you confirm how many phenotypes have the required data before proceeding with linear modelling? Not clear that we’d be able to use the EWCE p-value, as most of the diseases will be single gene.

Not really sure what you mean by this. This is not a gene-level test. Are you thinking of the other thread? #18

But here is the answer to the numbers question:

In our EWCE results data, there were 6170 phenotypes.
Phenotypes were associated with 1-4403 diseases each (mean=92).
3228 phenotypes remained after filtering to those with >20 diseases.
Within this filtered group, phenotypes were associated with 1034 diseases on average.
Linear models were run for each phenotype, using a separate model for each clinical course variable. Only tests with sufficient data for that variable in that phenotype were run.
In the end, 1043 phenotypes could be run with the metadata we currently have from the HPO annotations (incomplete as they are).
3129 of the models were significant at a nominal p-value<0.05. This encompassed 802 unique phenotypes.
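The bookkeeping behind counts like these (one model per phenotype per clinical variable, skipping combinations without enough data, then tallying nominally significant models and unique phenotypes) could look roughly like this. Everything here is simulated pure noise, and the names `blocks`, `run_one()` and the `min_n` cutoff are hypothetical, so the resulting counts mean nothing beyond showing the mechanics.

```r
set.seed(42)
# Hypothetical grid: 4 phenotypes x 2 clinical variables, with the number
# of annotated diseases available for each combination.
blocks <- expand.grid(hpo_id = paste0("HP:", 1:4),
                      variable = c("onset", "death"),
                      stringsAsFactors = FALSE)
blocks$n <- c(30, 30, 30, 5, 30, 5, 30, 30)  # two combinations are too small

# Fit one model and return its overall F-test p-value,
# or NA when there is not enough data to run the test.
run_one <- function(n, min_n = 20) {
  if (n < min_n) return(NA_real_)
  d <- data.frame(celltype = factor(rep(c("A", "B"), length.out = n)),
                  score = rnorm(n))
  f <- summary(lm(score ~ celltype, data = d))$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}
blocks$p <- vapply(blocks$n, run_one, numeric(1))

n_run <- sum(!is.na(blocks$p))                       # models actually run
sig <- blocks[!is.na(blocks$p) & blocks$p < 0.05, ]  # nominal threshold
n_sig_models <- nrow(sig)
n_sig_phenotypes <- length(unique(sig$hpo_id))
message(n_run, " models run; ", n_sig_models, " nominally significant, ",
        "covering ", n_sig_phenotypes, " unique phenotypes")
```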

The summary is that it seems to have worked: all clinical outcomes were significantly modified by cell type identity in the vast majority of phenotype tests. See the Rmarkdown report above for details.

bschilder commented 1 year ago

This summarises the findings visually, and you can see the distribution of clinical courses shifting leftward as you look at test results with an increasingly stringent pvalue threshold (moving from 1. to 6. in each facet).

We're able to visualise the metadata this way because I encoded all values for each variable on ordinal scales (e.g. age of onset, scored from 1-11). https://neurogenomics.github.io/HPOExplorer/reference/add_onset.html
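To make the idea of ordinal encoding concrete, here is a tiny base-R sketch. The level ordering and the resulting scores are illustrative assumptions using a handful of HPO onset terms; this is not the actual 1-11 mapping used by `HPOExplorer::add_onset`.

```r
# Assumed ordering of a few onset categories, earliest first
# (illustrative only; not the HPOExplorer mapping).
onset_levels <- c("Congenital onset", "Neonatal onset", "Infantile onset",
                  "Childhood onset", "Juvenile onset", "Adult onset")

# Toy annotations for four diseases.
onset <- c("Adult onset", "Neonatal onset", "Childhood onset",
           "Neonatal onset")

# Ordered factor -> integer score, so earlier onset gets a lower number.
onset_score <- as.integer(factor(onset, levels = onset_levels,
                                 ordered = TRUE))
print(onset_score)
```

Once every clinical variable is on a numeric scale like this, it can be used as the outcome in a linear model and plotted on a shared axis.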

[Image: plt]

bschilder commented 1 year ago

Realised it didn't quite make sense to use the phenotype-level p-values, as these do not vary across diseases (within a phenotype). I think this might be what @NathanSkene was trying to say in our last meeting, but my brain was a bit fried at the time so it didn't sink in. In that set of analyses, I believe the reason we were seeing significant clinical-outcome-modifying cell types was an artifact of how I had merged the pre-filtered symptom-level celltype enrichment results.
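The underlying problem can be shown with a small simulated example: within a single phenotype, a phenotype-level p-value is the same number for every disease, so a linear model has nothing to estimate a slope from. The data frame and column names below (`one_phenotype`, `phenotype_p`, `symptom_p`) are hypothetical.

```r
set.seed(1)
# Simulated data for one phenotype: one row per associated disease.
one_phenotype <- data.frame(
  disease_id = paste0("D", 1:12),
  outcome = rnorm(12, mean = 5),
  phenotype_p = 0.003,     # phenotype-level: identical for every disease
  symptom_p = runif(12)    # symptom-level: varies from disease to disease
)

# A constant predictor is collinear with the intercept,
# so lm() reports its slope as NA.
fit_phenotype <- lm(outcome ~ phenotype_p, data = one_phenotype)
print(coef(fit_phenotype))

# A predictor that varies across diseases gives an estimable slope.
fit_symptom <- lm(outcome ~ symptom_p, data = one_phenotype)
print(coef(fit_symptom))
```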

That said, I've gone back and redone the analyses using the symptom-level celltype enrichment p-values, and we actually see far more significant results in which celltype was a significant modifier of clinical outcomes (across all clinical variables).

Here are the most significant results for each clinical variable. In these plots, the numerically encoded clinical variable is plotted on the y-axis, and the celltypes that each disease is significantly enriched for (within a phenotype) are on the x-axis.

[Image: top_modifiying_celltypes]

These can be interpreted as examples of phenotypes where the clinical outcomes are variable, and are dependent on which disease (and thus celltype-specific mechanism) is actually causing them. The exact disease is not always known, and typically clinicians have to try and figure this out based on the presence of other phenotypes, medical history, or biomarkers.

A really interesting example is "neonatal hypotonia", which can be an indicator of anything from totally benign conditions through to severe ones with high mortality rates.

[Image: Screenshot 2023-03-31 at 11 11 21]

bschilder commented 1 year ago

Next up:

[ ] Fix the Descartes CTD (https://github.com/neurogenomics/MultiEWCE/issues/14). If this version of the CTD was indeed used by @bobGSmith, I will also need to rerun all of the original celltype-phenotype enrichment analyses.
[ ] Rerun the celltype-symptom enrichment analyses using EWCE instead of the simple Fisher's exact tests
[ ] Rerun the differential clinical outcomes analysis described above.

bschilder commented 7 months ago

Now automated as follows:

results <- MSTExplorer::load_example_results()
results <- HPOExplorer::add_death(results,
                                  allow.cartesian = TRUE,
                                  agg_by = c("disease_id","hpo_id"))

## Keep only results for Hypotonia and its descendant phenotypes
keep_descendants <- "Hypotonia" # HP:0001252
hypotonia_results <- HPOExplorer::filter_descendants(results,
                                                     keep_descendants = keep_descendants)
hypotonia_results <- MSTExplorer::map_celltype(hypotonia_results)

## Count the phenotypes, diseases and cell types in this subset
phenotypes_ids <- unique(hypotonia_results$hpo_id)
message(length(phenotypes_ids), " unique phenotypes are descendants of ",
        paste(keep_descendants, collapse = "; "))
message(length(unique(hypotonia_results$disease_id)),
        " unique diseases are associated with: ",
        paste(keep_descendants, collapse = "; "))
## Row filter uses data.table syntax (assumes results is a data.table)
message(length(unique(hypotonia_results[q < 0.05]$cl_name)),
        " unique cell types are associated with: ",
        paste(keep_descendants, collapse = "; "))

## Generate plot
differential_outcomes <- MSTExplorer::plot_differential_outcomes_heatmap( 
  results = hypotonia_results, 
  print_phenotypes = TRUE,
  fill_limits = c(1,8),
  save_path = here::here("figures/figure6.pdf"),
  height=8, 
  width=12
)
differential_outcomes$plot

neurogenomics / RareDiseasePrioritisation

Assess whether cell types predict clinical course #23