opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

ETL ontology-inputs - EFO data model refactoring proposal #3061

Open mbdebian opened 1 year ago

mbdebian commented 1 year ago

Discussion issue

Input files for the disease ETL step are located in the ontology-inputs folder.

Looking at the file ontology-efo.jsonl, which PIS collects from here according to the PIS configuration file, I found the following data model:

root
 |-- code: string (nullable = true)
 |-- dbXRefs: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- definition: string (nullable = true)
 |-- definition_alternatives: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- id: string (nullable = true)
 |-- isTherapeuticArea: boolean (nullable = true)
 |-- label: string (nullable = true)
 |-- locationIds: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- obsoleteTerms: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- parents: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- synonyms: struct (nullable = true)
 |    |-- hasBroadSynonym: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- hasExactSynonym: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- hasNarrowSynonym: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- hasRelatedSynonym: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

Here the synonyms attribute is a struct of 4 arrays: hasBroadSynonym, hasExactSynonym, hasNarrowSynonym and hasRelatedSynonym.
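To make the shape concrete, here is a minimal sketch in plain Python that parses one JSONL line and reports the type and size of each synonyms sub-field. The sample record (its id, label and synonym strings) is made up for illustration and is not taken from ontology-efo.jsonl; only the field names come from the schema above. The helper names are hypothetical too.

```python
import json

# The four sub-fields of `synonyms`, as listed in the schema above.
SYNONYM_FIELDS = (
    "hasBroadSynonym",
    "hasExactSynonym",
    "hasNarrowSynonym",
    "hasRelatedSynonym",
)

# Hypothetical record, invented for illustration only.
SAMPLE_LINE = json.dumps({
    "id": "EFO_0000000",
    "label": "example disease",
    "synonyms": {
        "hasBroadSynonym": [],
        "hasExactSynonym": ["example disorder"],
        "hasNarrowSynonym": None,  # the schema marks every field as nullable
        "hasRelatedSynonym": ["example syndrome"],
    },
})


def synonym_field_types(line):
    """Return the Python type name of each synonyms sub-field in one JSONL line."""
    synonyms = json.loads(line).get("synonyms") or {}
    return {name: type(synonyms.get(name)).__name__ for name in SYNONYM_FIELDS}


def synonym_counts(line):
    """Return how many synonyms each sub-field holds, treating null as empty."""
    synonyms = json.loads(line).get("synonyms") or {}
    return {name: len(synonyms.get(name) or []) for name in SYNONYM_FIELDS}


if __name__ == "__main__":
    print(synonym_field_types(SAMPLE_LINE))
    print(synonym_counts(SAMPLE_LINE))
```

Despite the "has" prefix, each sub-field is a list (or null), not a flag, so consumers need to handle both cases.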

These 4 attributes are of array type, not booleans, so I was wondering how feasible it would be to promote a data model refactoring where:

mbdebian commented 1 year ago

@prashantuniyal02, do we have an existing label for issues that are proposals or are meant to start a conversation?

prashantuniyal02 commented 1 year ago

No, nothing specific, because most issues are discussed internally first and then issues are created for specific tasks. I think if there are enough issues in this category, we can create a label for proposals. As of now, most issues fit under Enhancement.

ireneisdoomed commented 1 year ago

@mbdebian I think these are very valid suggestions, especially their propagation into the final diseases output (described in #3069). Where are you thinking of applying these changes? PIS at the moment is focused on data collection more than data transformation. Are you suggesting transforming ontology-efo.jsonl into a nicer format?