Closed bschilder closed 7 months ago
Strategy summary:
Find phenotypes that are associated with 20+ diseases and have metadata.
Run a series of linear models, one per disease-metadata variable combination.
Then test whether the EWCE enrichment p-values (conditional on celltype) significantly modify the relationship between the phenotype and clinical outcomes (e.g. age of death).
The formula for the linear model is as follows:
outcome ~ EWC_pvalue * celltype
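To make the formula above concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration (the data frame, the cell type labels, the effect sizes); only the formula itself comes from the comment above. It compares the interaction model against an additive one, which is one common way to ask whether the effect of the p-value depends on cell type, and is not necessarily how the actual analysis was run.

```r
# Minimal sketch with simulated data; all columns and values are hypothetical.
set.seed(1)
n <- 120
dat <- data.frame(
  celltype   = factor(rep(c("neuron", "astrocyte", "microglia"), each = n / 3)),
  EWC_pvalue = runif(n)
)

# Simulate an ordinal-encoded outcome (e.g. age of death) whose dependence
# on the enrichment p-value differs by cell type.
slope <- c(neuron = 4, astrocyte = 0, microglia = -2)[as.character(dat$celltype)]
dat$outcome <- 5 + slope * dat$EWC_pvalue + rnorm(n, sd = 0.5)

# Interaction model from the thread vs. an additive model without interaction.
fit_full <- lm(outcome ~ EWC_pvalue * celltype, data = dat)
fit_add  <- lm(outcome ~ EWC_pvalue + celltype, data = dat)

# F-test on the interaction terms.
interaction_test <- anova(fit_add, fit_full)
interaction_p <- interaction_test[["Pr(>F)"]][2]
print(interaction_test)
```

In the real analysis one such model would be fitted per disease-metadata variable combination, so the resulting p-values would also need multiple-testing correction.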
Can you confirm how many phenotypes have the required data before proceeding with linear modelling? Not clear that we’d be able to use the EWCE p-value, as most of the diseases will be single-gene.
From: Brian M. Schilder. Sent: Friday, March 24, 2023 9:47:53 AM. Subject: Re: [neurogenomics/RareDiseasePrioritisation] Assess whether cell types predict clinical course (Issue #23)
already done, just about to summarise
Can you confirm how many phenotypes have the required data before proceeding with linear modelling? Not clear that we’d be able to use the EWCE p-value, as most of the diseases will be single-gene.
Not really sure what you mean by this. This is not a gene-level test. Are you thinking of the other thread? #18
But here is the answer to the numbers question:
The summary is that it seems to have worked: all clinical outcomes were significantly modified by cell type identity in the vast majority of phenotype tests. See the Rmarkdown report above for details.
This summarises the findings visually, and you can see the distribution of clinical courses shifting leftward as you look at test results with an increasingly stringent p-value threshold (moving from 1. to 6. in each facet).
We're able to visualise the metadata this way because I codified all values for each variable to ordinal scales (e.g. age of onset, on a score from 1-11). https://neurogenomics.github.io/HPOExplorer/reference/add_onset.html
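As a rough illustration of what codifying a variable to an ordinal scale means, here is a minimal base-R sketch. The category names and their order are hypothetical and shortened to five levels; this is not the mapping that `HPOExplorer::add_onset` uses.

```r
# Hypothetical onset categories in ascending order (NOT the HPOExplorer mapping).
onset_levels <- c("Congenital", "Neonatal", "Infantile", "Childhood", "Adult")

# Example annotations, including a missing value.
onset <- c("Childhood", "Neonatal", "Adult", NA, "Infantile")

# Encode each category as its rank on the ordinal scale; NA stays NA.
onset_score <- as.integer(factor(onset, levels = onset_levels, ordered = TRUE))
onset_score
```

Once every value is a number on a shared scale, the variable can be plotted as a distribution or used as the response in a linear model.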
Realised it didn't quite make sense to use the phenotype-level p-values, as these do not vary across diseases (within a phenotype). I think this might be what @NathanSkene was trying to say in our last meeting, but my brain was a bit fried at the time so it didn't sink in. In that set of analyses, I believe the reason we were seeing significant clinical-outcome-modifying cell types was an artifact of how I had merged the pre-filtered symptom-level celltype enrichment results.
That said, I've gone back and redone the analyses using the symptom-level celltype enrichment p-values, and we actually see far more significant results in which celltype was a significant modifier of clinical outcomes (across all clinical variables).
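For readers who want a feel for this kind of per-phenotype test, here is a minimal sketch on simulated data. The column names, the placeholder phenotype IDs, and the choice of a one-way F-test with Benjamini-Hochberg correction are all assumptions for illustration, not a description of the actual pipeline.

```r
# Simulated long-format results; IDs and column names are hypothetical.
set.seed(42)
sim <- expand.grid(
  hpo_id   = c("HP:A", "HP:B", "HP:C"),
  celltype = c("neuron", "astrocyte", "microglia"),
  rep      = 1:10,
  stringsAsFactors = FALSE
)

# Only the first placeholder phenotype has an outcome that depends on cell type.
shift <- ifelse(sim$hpo_id == "HP:A",
                c(neuron = 0, astrocyte = 2, microglia = 4)[sim$celltype],
                0)
sim$outcome <- 5 + shift + rnorm(nrow(sim), sd = 0.5)

# One model per phenotype: does the encoded outcome differ by cell type?
pvals <- vapply(split(sim, sim$hpo_id), function(d) {
  anova(lm(outcome ~ celltype, data = d))[["Pr(>F)"]][1]
}, numeric(1))

# Benjamini-Hochberg correction across phenotypes.
res <- data.frame(hpo_id = names(pvals),
                  p = unname(pvals),
                  q = p.adjust(pvals, method = "BH"))
res[order(res$q), ]
```

Sorting by the adjusted value gives a ranked list of phenotypes whose clinical variable differs most clearly between cell types.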
Here are the most significant results for each clinical variable. In these plots, the numerically encoded clinical variable is plotted on the y-axis, and the celltypes that each disease is significantly enriched for (within a phenotype) are on the x-axis.
These can be interpreted as examples of phenotypes where the clinical outcomes are variable, and are dependent on which disease (and thus celltype-specific mechanism) is actually causing them. The exact disease is not always known, and typically clinicians have to try and figure this out based on the presence of other phenotypes, medical history, or biomarkers.
A really interesting example is "neonatal hypotonia", which can be an indicator of totally benign conditions all the way through severe ones with high mortality rates.
Next up:
Now automated as follows:
results <- MSTExplorer::load_example_results()
results <- HPOExplorer::add_death(results,
                                  allow.cartesian = TRUE,
                                  agg_by = c("disease_id","hpo_id"))
## Count number of diseases associated with these phenotypes
keep_descendants <- "Hypotonia" # HP:0001252
hypotonia_results <- HPOExplorer::filter_descendants(results,
                                                     keep_descendants = keep_descendants)
hypotonia_results <- MSTExplorer::map_celltype(hypotonia_results)
phenotypes_ids <- unique(hypotonia_results$hpo_id)
message(length(phenotypes_ids)," unique phenotypes are descendants of ",
        paste(keep_descendants,collapse = "; "))
message(length(unique(hypotonia_results$disease_id) ),
        " unique diseases are associated with: ",
        paste(keep_descendants,collapse = "; "))
message(length(unique(hypotonia_results[q<0.05]$cl_name) ),
        " unique cell types are associated with: ",
        paste(keep_descendants,collapse = "; "))
## Generate plot
differential_outcomes <- MSTExplorer::plot_differential_outcomes_heatmap(
  results = hypotonia_results,
  print_phenotypes = TRUE,
  fill_limits = c(1,8),
  save_path = here::here("figures/figure6.pdf"),
  height=8,
  width=12
)
differential_outcomes$plot
Following up from here: https://github.com/neurogenomics/RareDiseasePrioritisation/issues/20#issuecomment-1481631792