Closed NathanSkene closed 1 year ago
I think "FrequencyHPO" means something a bit different in this file:
g2p <- HPOExplorer::load_phenotype_to_genes("genes_to_phenotype.txt")
After translating this column from ID to names, it shows:
Can't find much documentation on this on HPO's site, but did find this: https://hpo-annotation-qc.readthedocs.io/en/latest/smallfile.html
- frequency. This column can be one of three formats: A valid HPO term from the frequency subontology, a fractional expression m/n (e.g., 4/7 meaning that 4 of 7 individuals in the cited study had the disease and the feature in question, while the feature was ruled out in the remaining 3 of 7 individuals); or a percentage value such as 47%. This column may be empty.
These frequencies appear to be gene-specific, as aggregating them by Phenotype shows multiple frequencies per Phenotype. In other words, I'm interpreting these frequencies as "how frequently is a mutation in this gene associated with this phenotype?"
So this is still useful for prioritising putative gene targets, as genes with mutations that occur in a larger % of the disease population will have a bigger impact (and are more financially feasible for pharma companies).
I've parsed this further to get frequency ranges.
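For reference, here's a rough sketch of how the three formats described above could be turned into percentage ranges. This is not the code used in HPOExplorer: `parse_frequency` and `hpo_freq_ranges` are hypothetical names, and the ranges for the frequency subontology terms are taken from my reading of the HPO term definitions, so treat them as an assumption.

```r
# Assumed percentage ranges for the HPO frequency subontology terms.
hpo_freq_ranges <- list(
  "HP:0040280" = c(100, 100), # Obligate
  "HP:0040281" = c(80, 99),   # Very frequent
  "HP:0040282" = c(30, 79),   # Frequent
  "HP:0040283" = c(5, 29),    # Occasional
  "HP:0040284" = c(1, 4),     # Very rare
  "HP:0040285" = c(0, 0)      # Excluded
)

# Hypothetical helper: parse a single frequency string into c(min, max) percent.
parse_frequency <- function(x) {
  x <- trimws(x)
  # Empty or missing values stay missing.
  if (is.na(x) || x == "") {
    return(c(min = NA_real_, max = NA_real_))
  }
  # Format 1: a term from the frequency subontology.
  if (x %in% names(hpo_freq_ranges)) {
    r <- hpo_freq_ranges[[x]]
    return(c(min = r[1], max = r[2]))
  }
  # Format 2: a fraction m/n.
  if (grepl("^[0-9]+/[0-9]+$", x)) {
    parts <- as.numeric(strsplit(x, "/", fixed = TRUE)[[1]])
    pct <- if (parts[2] > 0) 100 * parts[1] / parts[2] else NA_real_
    return(c(min = pct, max = pct))
  }
  # Format 3: a percentage such as 47%.
  if (grepl("^[0-9.]+%$", x)) {
    pct <- as.numeric(sub("%$", "", x))
    return(c(min = pct, max = pct))
  }
  # Anything unrecognised is treated as missing.
  c(min = NA_real_, max = NA_real_)
}

# One row per input value, with a min and a max column.
freqs <- c("HP:0040282", "4/7", "47%", "")
parsed <- t(vapply(freqs, parse_frequency, numeric(2)))
print(parsed)
```

Fractions and percentages collapse to a single point (min equals max), while ontology terms keep their full range, so the two kinds of value can live in the same pair of columns.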
I can also aggregate the gene frequencies to the phenotype level, though I'm not sure exactly what this would tell us. Perhaps something like "% of the time that any known gene is associated with the phenotype".
Another way to get phenotype prevalence is from the HPO annotations file:
annot <- load_phenotype_to_genes("phenotype.hpoa")
In general, this tells us how frequently a phenotype occurs within a cohort of individuals with a given disease. So if we compute the mean frequency per phenotype, it tells us "within all known diseases where this phenotype occurs, what is the average frequency of this phenotype?"
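To make the "mean frequency per phenotype" idea concrete, here's a minimal sketch on toy data. The data frame, its column names (`disease_id`, `hpo_id`, `freq_pct`) and the IDs are all made up for illustration; the real annotations file has its own columns, and the frequencies would need to be parsed to numeric percentages first.

```r
# Toy annotations: one row per disease-phenotype pair (all values made up).
annot_toy <- data.frame(
  disease_id = c("DISEASE_1", "DISEASE_2", "DISEASE_3", "DISEASE_1", "DISEASE_2"),
  hpo_id     = c("PHENO_A", "PHENO_A", "PHENO_A", "PHENO_B", "PHENO_B"),
  freq_pct   = c(80, 50, NA, 10, 30)
)

# Mean frequency of each phenotype across the diseases in which it occurs.
# The formula interface drops rows with a missing frequency by default.
pheno_freq <- aggregate(freq_pct ~ hpo_id, data = annot_toy, FUN = mean)
print(pheno_freq)
```

Note that diseases with no recorded frequency are simply ignored here, so the mean only reflects diseases where a frequency was annotated.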
This gives us a roughly normal distribution of phenotype frequency within diseases.
I've stored the parsed phenotype frequencies as a built-in dataset in HPOExplorer to save time: hpo_frequency
Also, I've added 2 new functions to add frequency-related info to a given dataframe of HPO phenotypes:
- add_gene_frequency: frequency of genes within a given phenotype.
- add_pheno_frequency: frequency of a phenotype within diseases.
"Frequency" is a column in the phenotype-to-disease annotations file from HPO. https://hpo-annotation-qc.readthedocs.io/en/latest/annotationFormat.html
Here's some example values:
But this has more to do with how frequently each patient with a given disease also has the HPO phenotype. To get overall prevalence in the wider population, we'd have to gather data from another resource.