x-atlas-consortia / ubkg-etl

A framework that combines data from the UMLS with assertions from other data sources into a set of CSV files that can be imported into neo4j to build a Unified Biomedical Knowledge Graph (UBKG)
MIT License

Resolve issues with HPO codes #122

Closed AlanSimmons closed 6 months ago

AlanSimmons commented 9 months ago

Statement of problem

It appears that some codes from HPO are not being translated correctly.

Example query:

match (t:Term)<-[]-(o:Code)<-[:CODE]-(c:Concept) where o.CodeID in ['HP:0001636','HPO:0001636'] return *

Result:

The graph contains two separate codes, HP:0001636 and HPO:0001636, both corresponding to Tetralogy of Fallot.

[image]

HPO codes from the UMLS have CodeIDs in the format HPO HP:x. The codeReplacements function in the ETL maps these codes to HPO:x (https://github.com/x-atlas-consortia/ubkg-etl/blob/be30391e9b08c01770a28d0c10b4e489f88bba67/generation_framework/ubkg_utilities/ubkg_parsetools.py#L269C5-L269C5).

One of the ingested ontologies must be formatting HPO nodes as HP:x.
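One way to confirm the duplication before changing the ETL is to scan a list of CodeIDs for HPO identifiers that appear under more than one prefix. The sketch below is illustrative only: `find_prefix_collisions` is a hypothetical helper, not part of ubkg-etl, and it assumes the only variants in play are the three mentioned in this issue (`HPO HP:x`, `HPO:x`, `HP:x`).

```python
import re
from collections import defaultdict

# Assumed variants (from this issue): "HPO HP:x", "HPO:x", "HP:x".
_HPO_VARIANT = re.compile(r"^(?P<prefix>HPO HP|HPO|HP):(?P<local>\d+)$")


def find_prefix_collisions(code_ids):
    """Return {local_id: [prefixes]} for HPO ids seen under more than one prefix.

    Hypothetical diagnostic helper; CodeIDs that do not look like HPO
    codes are ignored.
    """
    seen = defaultdict(set)
    for code_id in code_ids:
        match = _HPO_VARIANT.match(code_id.strip())
        if match:
            seen[match.group("local")].add(match.group("prefix"))
    # Keep only identifiers that occur with two or more distinct prefixes.
    return {local: sorted(prefixes) for local, prefixes in seen.items() if len(prefixes) > 1}


if __name__ == "__main__":
    sample = ["HP:0001636", "HPO:0001636", "HPO:0000118", "MONDO:0008542"]
    print(find_prefix_collisions(sample))
```

Running this against the CodeIDs exported from each ingested source separately would show which source contributes the `HP:` form.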

Solution

Enhance codeReplacements to account for the alternate formatting of HP nodes.

AlanSimmons commented 9 months ago

Per discussion with Deanne, use HP as the standard SAB. This aligns with the EFO and MONDO ingests.
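A minimal sketch of the normalization this implies, assuming the only variants to handle are `HPO HP:x`, `HPO:x`, and `HP:x`. The function name `normalize_hpo_code` and its `standard_sab` parameter are hypothetical and do not reflect the actual codeReplacements implementation.

```python
import re

# Assumed input variants (from this issue): "HPO HP:x", "HPO:x", "HP:x".
_HPO_CODE = re.compile(r"^(?:HPO HP|HPO|HP):(\d+)$")


def normalize_hpo_code(code_id: str, standard_sab: str = "HP") -> str:
    """Rewrite any recognized HPO variant to '<standard_sab>:<id>'.

    Hypothetical helper for illustration; CodeIDs that are not
    recognized as HPO codes are returned unchanged.
    """
    match = _HPO_CODE.match(code_id.strip())
    if match is None:
        return code_id
    return f"{standard_sab}:{match.group(1)}"


if __name__ == "__main__":
    for raw in ["HPO HP:0001636", "HPO:0001636", "HP:0001636", "MONDO:0008542"]:
        print(raw, "->", normalize_hpo_code(raw))
```

Keeping the target SAB as a parameter makes the choice of standard (HP here) explicit and easy to change in one place.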