One of the most important things for consistency with the rare disease analyses is knowing exactly which version of the HPO data we're using.
The caching feature within HPOExplorer is helpful in that it speeds things up, but it also means it's unclear which version of the data was cached.
The HPO ontology obj: get_hpo
As I now get the ontology from the official HPO GH releases, it already includes metadata about the precise release version.
This is accessible via attr(hpo,"version") or simply by printing the hpo object.
I've also added a new internal function, make_hpo, which constructs a new ontologyIndex object from the OBO file provided on the HPO GH Releases page.
> attr(hpo,"version")
[1] "format-version: 1.2"
[2] "data-version: hp/releases/2023-10-09/hp-base.owl"
[3] "subsetdef: hposlim_core \"Core clinical terminology\""
[4] "subsetdef: secondary_consequence \"Consequence of a disorder in another organ system.\""
[5] "synonymtypedef: abbreviation \"abbreviation\""
[6] "synonymtypedef: HP:0034334 \"allelic_requirement\""
[7] "synonymtypedef: layperson \"layperson term\""
[8] "synonymtypedef: obsolete_synonym \"discarded/obsoleted synonym\""
[9] "synonymtypedef: plural_form \"plural form\""
[10] "synonymtypedef: uk_spelling \"UK spelling\""
[11] "default-namespace: human_phenotype"
[12] "remark: Please see license of HPO at http://www.human-phenotype-ontology.org"
[13] "ontology: hp/hp-base"
[14] "property_value: http://purl.org/dc/elements/1.1/creator \"Human Phenotype Ontology Consortium\" xsd:string"
[15] "property_value: http://purl.org/dc/elements/1.1/creator \"Monarch Initiative\" xsd:string"
[16] "property_value: http://purl.org/dc/elements/1.1/creator \"Peter Robinson\" xsd:string"
[17] "property_value: http://purl.org/dc/elements/1.1/creator \"Sebastian Köhler\" xsd:string"
[18] "property_value: http://purl.org/dc/elements/1.1/description \"The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities and clinical features encountered in human disease.\" xsd:string"
[19] "property_value: http://purl.org/dc/elements/1.1/rights \"Peter Robinson, Sebastian Koehler, The Human Phenotype Ontology Consortium, and The Monarch Initiative\" xsd:string"
[20] "property_value: http://purl.org/dc/elements/1.1/subject \"Phenotypic abnormalities encountered in human disease\" xsd:string"
[21] "property_value: http://purl.org/dc/elements/1.1/title \"Human Phenotype Ontology\" xsd:string"
[22] "property_value: http://purl.org/dc/elements/1.1/type IAO:8000001"
[23] "property_value: http://purl.org/dc/terms/license https://hpo.jax.org/app/license"
[24] "property_value: IAO:0000700 HP:0000001"
[25] "property_value: owl:versionInfo \"2023-10-09\" xsd:string"
[26] "logical-definition-view-relation: has_part"
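The version attribute is just a character vector of OBO header lines. That means the release date can be pulled out programmatically, for example to record it alongside analysis outputs. Below is a minimal base-R sketch. It assumes the attribute keeps the `data-version: hp/releases/<date>/...` line shown above. `get_hpo_release` is a hypothetical helper, not part of HPOExplorer.

```r
# Hypothetical helper (not part of HPOExplorer): extract the HPO release date
# from the character vector returned by attr(hpo, "version").
# Assumes a "data-version: hp/releases/<YYYY-MM-DD>/..." header line is present.
get_hpo_release <- function(version_lines) {
  dv <- grep("^data-version:", version_lines, value = TRUE)
  if (length(dv) == 0) return(NA_character_)
  # Pull out the first YYYY-MM-DD date on that line
  m <- regmatches(dv[1], regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", dv[1]))
  if (length(m) == 0) NA_character_ else m
}

# Example using two of the header lines printed above
version_lines <- c(
  "format-version: 1.2",
  "data-version: hp/releases/2023-10-09/hp-base.owl"
)
get_hpo_release(version_lines)
```

With a real ontology object, the equivalent call would be `get_hpo_release(attr(hpo, "version"))`.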
The HPO gene lists: load_phenotype_to_genes
This one was a little trickier. Since the data are provided as a CSV file, they can't store any attributes added to the R object after importing with data.table::fread. I could add the version to the cached file name, make another file logging the versions, or add an extra column where every row repeats the version string. But none of these seemed like great options.
Instead, I've added an extra step to load_phenotype_to_genes that imports the table, adds the "version" attribute, and then caches the object as an RDS file. The next time the function is run, it will use the stored RDS file by default, which has the version accessible via attr(x,"version").
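The CSV-to-RDS caching pattern can be sketched in base R as follows. This is an illustration only, with three assumptions:

- `load_cached_with_version` and its arguments are hypothetical names.
- `utils::read.csv` stands in for `data.table::fread` so the sketch has no dependencies.
- The cache lives in a temporary directory, and the demo table is dummy data.

```r
# Hypothetical sketch of the CSV -> "version" attribute -> RDS caching pattern.
# utils::read.csv stands in for data.table::fread to avoid dependencies.
load_cached_with_version <- function(csv_path, version,
                                     cache_dir = tempdir(),
                                     overwrite = FALSE) {
  rds_path <- file.path(
    cache_dir,
    paste0(tools::file_path_sans_ext(basename(csv_path)), ".rds")
  )
  # Reuse the cached RDS by default; it already carries the version attribute
  if (file.exists(rds_path) && !overwrite) {
    return(readRDS(rds_path))
  }
  dat <- utils::read.csv(csv_path)
  # A CSV cannot hold this attribute, but saveRDS() preserves it
  attr(dat, "version") <- version
  saveRDS(dat, rds_path)
  dat
}

# Demo with a small throwaway CSV of dummy values
csv_path <- file.path(tempdir(), "phenotype_to_genes_demo.csv")
utils::write.csv(
  data.frame(hpo_id = c("HP:0000001", "HP:0000118"),
             gene_symbol = c("GENE1", "GENE2")),
  csv_path, row.names = FALSE
)
x1 <- load_cached_with_version(csv_path, version = "2023-10-09", overwrite = TRUE)
# Second call is served from the RDS cache, so the new version argument is not used
x2 <- load_cached_with_version(csv_path, version = "ignored")
attr(x2, "version")
```

The design choice is that the version travels with the object itself. No file name convention, side log, or repeated column has to be kept in sync with it.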
Also, I added some code so that every time one of these objects is loaded into R, the version is printed to the console. This way, the user always knows exactly which version of the data they're currently using.
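Here is a minimal sketch of that reporting step. It uses `message()`, so the notice goes to stderr and can be silenced with `suppressMessages()` when needed. `report_version` is a hypothetical name, not necessarily how the package implements it.

```r
# Hypothetical helper: announce the data version whenever an object is loaded.
report_version <- function(obj, label = "data") {
  v <- attr(obj, "version")
  if (is.null(v)) {
    message(label, ": no version attribute found.")
  } else {
    # For multi-line versions (e.g. OBO headers), prefer the data-version line
    dv <- grep("^data-version:", v, value = TRUE)
    message(label, " version: ", if (length(dv) > 0) dv[1] else v[1])
  }
  invisible(obj)
}

x <- data.frame(a = 1:3)
attr(x, "version") <- "2023-10-09"
report_version(x, label = "phenotype_to_genes")
```

Returning the object invisibly lets the helper sit at the end of a loader function without changing what that function returns.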
Next steps
I plan to extend this approach to MultiEWCE for the functions that distribute results and prioritise gene therapy targets.