Open d0choa opened 1 year ago
PIS ingests the Human Phenotype Ontology from obolibrary and structures it without much transformation.
I would suggest adding a resolveDiseases step to the disease object generation, similar to the one used when ingesting disease/target evidence, to check whether the provided disease identifier is an obsoleted ID for an existing term. I know I'm proposing to check against a dataset that is just being generated, so that adds a bit of complexity.
I'm wondering if such logic could be abstracted so that a "validation" step is executed any time a disease is ingested by the ETL (regardless of the data type or source that provides it). It would prevent introducing discrepancies.
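To make the idea concrete, here is a minimal sketch of what a shared "validation" step could look like. It is plain Python rather than the ETL's actual code, and it rests on assumptions: the disease index is taken to expose an `id` and an `obsoleteTerms` list per disease, and the function names, the `diseaseId` field and the EFO IDs in the example are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_disease_lookup(disease_index: Iterable[dict]) -> Dict[str, str]:
    """Map every current ID and every obsoleted ID to a current disease ID.

    Assumes each index entry looks like {"id": ..., "obsoleteTerms": [...]}.
    """
    lookup: Dict[str, str] = {}
    for disease in disease_index:
        current_id = disease["id"]
        # A current ID always resolves to itself.
        lookup[current_id] = current_id
        for obsolete_id in disease.get("obsoleteTerms") or []:
            # Keep the first mapping seen; never override a current ID.
            lookup.setdefault(obsolete_id, current_id)
    return lookup


def resolve_diseases(
    records: Iterable[dict],
    lookup: Dict[str, str],
    id_field: str = "diseaseId",
) -> Tuple[List[dict], List[dict]]:
    """Split records into (resolved, unresolved), whatever dataset they come from."""
    resolved: List[dict] = []
    unresolved: List[dict] = []
    for record in records:
        current_id: Optional[str] = lookup.get(record.get(id_field))
        if current_id is None:
            unresolved.append(record)  # unknown ID: report it, don't drop it silently
        else:
            resolved.append({**record, id_field: current_id})
    return resolved, unresolved


# Hypothetical index and records, for illustration only.
disease_index = [
    {"id": "EFO_0000001", "obsoleteTerms": ["EFO_0009999"]},
    {"id": "EFO_0000002", "obsoleteTerms": None},
]
records = [
    {"diseaseId": "EFO_0009999", "source": "phenotypes"},  # obsoleted ID
    {"diseaseId": "EFO_0000002", "source": "evidence"},    # current ID
    {"diseaseId": "EFO_0424242", "source": "phenotypes"},  # not in the index
]

lookup = build_disease_lookup(disease_index)
resolved, unresolved = resolve_diseases(records, lookup)
```

Returning the unresolved records separately, instead of filtering them out, is a deliberate choice in this sketch: it would make dropped diseases visible regardless of which dataset they came from.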
@JarrodBaker could you help us scope the task? Would it require much work in the ETL?
@d0choa, I wonder whether we can close this issue, as it's related to a release from last year.
I'll close it, but if you think it's still relevant, please feel free to re-open it.
This newly opened bug report is related to this issue: https://github.com/opentargets/issues/issues/2929
In practice, the ETL needs to resolve diseases in all datasets where we get disease information, not only in evidence.
@remo87, would you mind getting together with @tskir to find out whether this is still relevant and what the next steps are? Thanks!
As reported in a recent community post, some diseases that used to have `disease2phenotype` relationships (ETL code) don't have them anymore.

In the `hpo-phenotypes.jsonl` 22.09 input file we can find entries linked to an obsoleted ID ("ORPHA:217607") that the user has reported.

Because the `ORPHA:217607` ID has been obsoleted in EFO, my impression is that we are dropping all the records.

Using the disease index we can rescue the obsoleted IDs in the same way that we rescue them for the purpose of evidence. For example, the relevant ID is found in our disease index linked to `MONDO_0016333` (with the annoying ORPHA == Orphanet conversion).

Half-baked feature, bug or enhancement, depending on how you see it ;)
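The rescue described above, including the ORPHA to Orphanet prefix conversion, might look roughly like the sketch below. This is illustrative Python, not the ETL's implementation. Several things are assumptions: that the obsoleted term is stored in the disease index as `Orphanet_217607` under an `obsoleteTerms` field, the `disease` and `phenotype` field names, the placeholder phenotype ID, and the helper names.

```python
from typing import Dict, List, Optional

# Assumed prefix aliases: the input file uses "ORPHA", the index uses "Orphanet".
PREFIX_ALIASES = {"ORPHA": "Orphanet"}


def normalise_disease_id(raw_id: str) -> str:
    """Turn 'ORPHA:217607' into 'Orphanet_217607'; leave 'MONDO_0016333' as-is."""
    cleaned = raw_id.strip()
    prefix, sep, local = cleaned.partition(":")
    if not sep:
        prefix, sep, local = cleaned.partition("_")
    if not sep:
        return cleaned  # no recognisable prefix, nothing to convert
    return f"{PREFIX_ALIASES.get(prefix, prefix)}_{local}"


def rescue_disease_id(raw_id: str, obsolete_to_current: Dict[str, str]) -> Optional[str]:
    """Return the current ID for an obsoleted one, or None if it is unknown."""
    return obsolete_to_current.get(normalise_disease_id(raw_id))


# Assumed shape of the relevant disease index entry.
disease_index = [{"id": "MONDO_0016333", "obsoleteTerms": ["Orphanet_217607"]}]
obsolete_to_current = {
    obsolete: disease["id"]
    for disease in disease_index
    for obsolete in disease["obsoleteTerms"]
}

# One row as it might appear in hpo-phenotypes.jsonl (phenotype ID is a placeholder).
rows: List[dict] = [{"disease": "ORPHA:217607", "phenotype": "HP_0000001"}]

rescued = [
    {**row, "disease": current_id}
    for row in rows
    if (current_id := rescue_disease_id(row["disease"], obsolete_to_current)) is not None
]
```

With this kind of lookup in place, the `ORPHA:217607` rows would be re-pointed at `MONDO_0016333` rather than dropped, assuming the index really does store the obsoleted term in that form.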