missyQWQ / Rare-Disease-Web-Portal

My MSc dissertation.

Provide driver genes #1

Open bschilder opened 5 hours ago

bschilder commented 5 hours ago

Priority: high

@bschilder will provide lists of the genes driving each significant phenotype-cell type association (i.e. genes that appear in both the phenotype's gene list and the cell type's most specific genes, using a consistent specificity-quantile threshold). It seems the lists Nathan provided previously were incorrect: they were simply the first N genes sorted alphabetically.
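
As a rough illustration of that definition, here is a minimal base-R sketch. The gene names, quantile values and the `select_driver_genes()` helper are all hypothetical, and this is not how MSTExplorer implements it; it only shows the intersect-then-threshold idea, with the result ordered by specificity rather than alphabetically.

```r
# Hypothetical inputs: genes annotated to one phenotype, and a per-gene
# specificity quantile for one cell type (toy values, assumed bins up to 40).
phenotype_genes <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
celltype_quantiles <- c(GENE_A = 38, GENE_B = 12, GENE_C = 31,
                        GENE_E = 40, GENE_F = 35)

# Hypothetical helper (not the MSTExplorer API): keep genes present in both
# lists whose specificity quantile falls within keep_quantiles.
select_driver_genes <- function(phenotype_genes, celltype_quantiles,
                                keep_quantiles = 30:40) {
  shared <- intersect(phenotype_genes, names(celltype_quantiles))
  q <- celltype_quantiles[shared]
  drivers <- names(q)[q %in% keep_quantiles]
  # Order by specificity quantile (highest first), not alphabetically
  drivers[order(-celltype_quantiles[drivers])]
}

select_driver_genes(phenotype_genes, celltype_quantiles)
```

With the default 30-40 window this keeps GENE_A and GENE_C; GENE_B is shared but not specific enough, and GENE_E/GENE_F are specific but not annotated to the phenotype.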

bschilder commented 4 hours ago

Here's how I gathered the driver genes. You can adjust which specificity quantiles to include; here I kept only genes in the top quarter of specificity quantiles (i.e. quantiles 30-40).

The quantile threshold is an arbitrary choice, so let me know if you'd like me to adjust it.
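
Since the cut-off is arbitrary, one quick way to judge it is to see how many driver genes survive at several thresholds. A minimal base-R sketch on hypothetical toy quantiles (the gene names and values are made up, not taken from the attached table):

```r
# Hypothetical specificity quantiles for genes shared between one phenotype
# and one cell type.
shared_quantiles <- c(GENE_A = 38, GENE_B = 12, GENE_C = 31,
                      GENE_D = 25, GENE_E = 40)

# Count how many genes pass each candidate lower threshold.
thresholds <- c(20, 25, 30, 35)
n_drivers <- vapply(thresholds,
                    function(t) sum(shared_quantiles >= t),
                    integer(1))
names(n_drivers) <- thresholds
n_drivers
```

Here the count drops from 4 genes at a threshold of 20 to 2 genes at 35, which makes the trade-off between stricter and more inclusive cut-offs explicit.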

I also added the continuous specificity score (ranging from 0 to 1) in case that's helpful.
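
For context, a minimal sketch of where a 0-1 specificity score can come from, assuming the EWCE-style definition (each cell type's share of a gene's total mean expression across cell types). The expression matrix below is a hypothetical toy, and the binning of these scores into quantiles is not reproduced here.

```r
# Hypothetical mean expression per gene (rows) and cell type (columns).
expr <- matrix(c(9, 1, 0,
                 2, 2, 4,
                 0, 5, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                               c("Neuron", "Astrocyte", "Microglia")))

# Specificity: divide each gene's expression by its total across cell types,
# so every value lies in [0, 1] and each row sums to 1.
specificity <- expr / rowSums(expr)
round(specificity, 2)
```

Under this definition GENE_A gets 0.9 in neurons (highly specific), while GENE_C is split evenly between astrocytes and microglia.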

## Load the example results and keep significant associations (q < 0.05)
results <- MSTExplorer::load_example_results()[q < 0.05]
## Add disease annotations for each phenotype
results <- HPOExplorer::add_disease(results, allow.cartesian = TRUE)
## Add driver genes with specificity quantiles, keeping only quantiles 30-40
drivers <- MSTExplorer:::add_driver_genes(results = results,
                                          keep_quantiles = seq(30, 40))
## Add continuous specificity as well
drivers <- MSTExplorer:::add_driver_genes(results = drivers,
                                          metric = "specificity")
## Export the unique rows of the selected columns
data.table::fwrite(
  unique(drivers[, list(ctd, CellType, hpo_id, gene_symbol,
                        specificity_quantile, specificity)]),
  "Downloads/drivers.csv.gz"
)

Resulting table attached here:

drivers.csv.gz
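
If you want to tighten the threshold after downloading, you can filter the attached table directly. A minimal base-R sketch: the toy rows and IDs below are hypothetical and only mimic the column names written by the `fwrite()` call; a temporary file stands in for the real `drivers.csv.gz`.

```r
# Hypothetical rows with the same columns as the exported table.
toy <- data.frame(
  ctd = "ctd_example",
  CellType = c("Neuron", "Neuron", "Microglia"),
  hpo_id = c("HPO_TERM_1", "HPO_TERM_1", "HPO_TERM_2"),
  gene_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  specificity_quantile = c(38, 31, 40),
  specificity = c(0.90, 0.40, 0.75)
)

# Write it as a gzipped CSV (stand-in for the attachment).
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back and keep only the stricter quantiles (35 and above).
drivers <- read.csv(gzfile(path))
strict <- subset(drivers, specificity_quantile >= 35)
strict
```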

Also, in case it's helpful, here are some metrics for assessing how many driver genes each phenotype-cell type association has.

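## Histogram of the number of driver genes per association
hist(drivers$n_driver_genes_hpo_id)
## The six gene symbols that occur most often in the table
tail(sort(table(drivers$gene_symbol)))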
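
The same kind of summary can be computed from scratch on any long table of driver genes. A minimal base-R sketch on hypothetical toy rows; it assumes that per-disease annotation can repeat a gene within one association, so it deduplicates before counting.

```r
# Hypothetical long table: one row per (hpo_id, CellType, gene), with a
# duplicate row as might arise from per-disease annotation.
drivers <- data.frame(
  hpo_id = c("HPO_TERM_1", "HPO_TERM_1", "HPO_TERM_1",
             "HPO_TERM_2", "HPO_TERM_2"),
  CellType = c("Neuron", "Neuron", "Microglia", "Neuron", "Neuron"),
  gene_symbol = c("GENE_A", "GENE_B", "GENE_A", "GENE_A", "GENE_A")
)

# Deduplicate so each gene counts once per phenotype-cell type association.
pairs <- unique(drivers[, c("hpo_id", "CellType", "gene_symbol")])

# Number of driver genes per phenotype-cell type association.
n_per_assoc <- aggregate(gene_symbol ~ hpo_id + CellType,
                         data = pairs, FUN = length)
names(n_per_assoc)[3] <- "n_driver_genes"

# Genes ranked by how many distinct associations they drive.
gene_counts <- sort(table(pairs$gene_symbol), decreasing = TRUE)
n_per_assoc
gene_counts
```

Counting distinct associations rather than raw rows keeps genes linked to many diseases from looking more important than they are.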
