Closed AlanSimmons closed 6 months ago
Statement of problem
It appears that some codes from HPO are not being translated correctly.
Example query:
match (t:Term)<-[]-(o:Code)<-[:CODE]-(c:Concept) where o.CodeID in ['HP:0001636','HPO:0001636'] return *
Result:
The query returns codes for both HP:0001636 and HPO:0001636, each corresponding to Tetralogy of Fallot.
HPO codes from the UMLS are formatted with CodeIDs in the format HPO HP:x. The codeReplacements function in the ETL maps these codes to HPO:x (https://github.com/x-atlas-consortia/ubkg-etl/blob/be30391e9b08c01770a28d0c10b4e489f88bba67/generation_framework/ubkg_utilities/ubkg_parsetools.py#L269C5-L269C5).
One of the ingested ontologies must be formatting HPO nodes as HP:x.
Solution
Enhance codeReplacements to account for the alternate formatting of HP nodes.
Per discussion with Deanne, use HP as the standard SAB. This aligns with the EFO and MONDO ingests.
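A minimal sketch of the kind of normalization this implies, assuming HP is the standard SAB and that CodeIDs end up in the form HP:x. The helper name normalize_hpo_code_id and the regular expression are hypothetical illustrations; this is not the actual codeReplacements code in ubkg_parsetools.py, which may handle these codes differently.

```python
import re

# Hypothetical pattern (an assumption for illustration): accepts the HPO
# formats mentioned in this issue, i.e. "HPO HP:x", "HPO:x", and "HP:x".
_HPO_CODE_ID = re.compile(r"^(?:HPO|HP)(?:[ :]HP)?:(\d+)$")


def normalize_hpo_code_id(code_id: str, standard_sab: str = "HP") -> str:
    """Return code_id rewritten as '<standard_sab>:<digits>' if it is an HPO code.

    Codes that do not match one of the known HPO formats are returned unchanged.
    """
    match = _HPO_CODE_ID.match(code_id.strip())
    if match is None:
        return code_id
    return f"{standard_sab}:{match.group(1)}"


if __name__ == "__main__":
    for raw in ("HPO HP:0001636", "HPO:0001636", "HP:0001636", "MONDO:0005453"):
        print(raw, "->", normalize_hpo_code_id(raw))
```

With this approach all three HPO variants collapse to a single CodeID, so the example query would be expected to find only one code for Tetralogy of Fallot. Codes from other SABs pass through untouched.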