opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Missing disease locations in latest platform release #2548

Open d0choa opened 2 years ago

d0choa commented 2 years ago

The disease locations have disappeared for the majority of diseases in the latest Platform release (22.02). This was raised by the contextual networks OTAR project, which was trying to access this information.

Data that was present in previous releases is no longer available. Because this data is not queried by the front end (FE), no existing test is expected to fail.

Example: liver disease (EFO_0001421) returned liver (UBERON_0002107) in directLocationIds for releases 21.06, 21.09 and 21.11, but returns null in 22.02.

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/21.06/output/etl/json/diseases/*.json' | jq -r 'select(.id == "EFO_0001421") | [.id, .directLocationIds]|@json'

["EFO_0001421",["UBERON_0002107"]]

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/21.09/output/etl/json/diseases/*.json' | jq -r 'select(.id == "EFO_0001421") | [.id, .directLocationIds]|@json'

["EFO_0001421",["UBERON_0002107"]]

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/21.11/output/etl/json/diseases/*.json' | jq -r 'select(.id == "EFO_0001421") | [.id, .directLocationIds]|@json'

["EFO_0001421",["UBERON_0002107"]]

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/22.02/output/etl/json/diseases/*.json' | jq -r 'select(.id == "EFO_0001421") | [.id, .directLocationIds]|@json'

["EFO_0001421",null]
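Since the front end does not query this field, a drop like this is easy to miss. Below is a minimal sketch of a coverage check over the JSON-lines disease output. It assumes each line is one disease record with an `id` and an optional `directLocationIds` list, as in the commands above. The function name and the second sample record are illustrative placeholders, not part of the ETL.

```python
import json
from typing import Iterable


def location_coverage(lines: Iterable[str]) -> float:
    """Return the fraction of disease records with a non-empty directLocationIds."""
    total = 0
    with_location = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        # A missing key, null and [] all count as "no location".
        if record.get("directLocationIds"):
            with_location += 1
    return with_location / total if total else 0.0


# Illustrative records only: the EFO_0001421 values mirror the outputs above,
# the EXAMPLE_0000001 record is a made-up placeholder.
release_21_11 = [
    '{"id": "EFO_0001421", "directLocationIds": ["UBERON_0002107"]}',
    '{"id": "EXAMPLE_0000001", "directLocationIds": null}',
]
release_22_02 = [
    '{"id": "EFO_0001421", "directLocationIds": null}',
    '{"id": "EXAMPLE_0000001", "directLocationIds": null}',
]

print(location_coverage(release_21_11))  # 0.5
print(location_coverage(release_22_02))  # 0.0
```

Feeding the output of `gsutil cat` for two releases into this function (for example via `sys.stdin`) and comparing the two values would flag a sharp drop between releases.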

Same behaviour using API:

query diseaseAnnotation {
  disease(efoId: "EFO_0001421") {
    id
    name
    directLocations{
      id
      name
    }
  }
}

{
  "data": {
    "disease": {
      "id": "EFO_0001421",
      "name": "liver disease",
      "directLocations": []
    }
  }
}
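The same API check can be scripted. This is a rough sketch using only the standard library. The endpoint URL, the `String!` type of the `efoId` argument and all function names are assumptions, since the thread does not state them.

```python
import json
import urllib.request

# Assumed endpoint; the thread does not say which API URL was used.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Same fields as the query above; the String! variable type is an assumption.
QUERY = """
query diseaseAnnotation($efoId: String!) {
  disease(efoId: $efoId) {
    id
    name
    directLocations {
      id
      name
    }
  }
}
"""


def build_payload(efo_id: str) -> bytes:
    """Encode the GraphQL request body for one disease ID."""
    body = {"query": QUERY, "variables": {"efoId": efo_id}}
    return json.dumps(body).encode("utf-8")


def extract_location_ids(response: dict) -> list:
    """Pull the location IDs out of a response shaped like the one above."""
    disease = (response.get("data") or {}).get("disease") or {}
    return [loc["id"] for loc in disease.get("directLocations") or []]


def fetch_location_ids(efo_id: str) -> list:
    """POST the query and return the direct location IDs (needs network access)."""
    request = urllib.request.Request(
        API_URL,
        data=build_payload(efo_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_location_ids(json.load(reply))
```

Calling `fetch_location_ids("EFO_0001421")` against a release affected by this issue would be expected to return an empty list, matching the response above.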

I have confirmed that the disease location is still present in EFO [here], but it appears to be missing from the input support output.

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/21.11/input/ontology-inputs/ontology-efo-v3.35.0.jsonl' | jq 'select(.id == "EFO_0001421") | [.id, .locationIds]'
[
  "EFO_0001421",
  [
    "UBERON_0002107"
  ]
]

(base) base ❯ gsutil cat 'gs://open-targets-data-releases/22.02/input/ontology-inputs/ontology-efo-v3.38.0.jsonl' | jq 'select(.id == "EFO_0001421") | [.id, .locationIds]'
[
  "EFO_0001421",
  null
]
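To see how widespread the loss is beyond this one term, the two ontology input files can be compared term by term. A minimal sketch, assuming each line of the `.jsonl` file is a record with `id` and an optional `locationIds` list as shown above. The helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Set


def locations_by_id(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each term ID to its set of locationIds (empty set for null or missing)."""
    result: Dict[str, Set[str]] = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        result[record["id"]] = set(record.get("locationIds") or [])
    return result


def lost_locations(old_lines: Iterable[str], new_lines: Iterable[str]) -> List[str]:
    """Return IDs that had locations in the old file but none in the new one.

    Terms that are absent from the new file altogether are also reported.
    """
    old = locations_by_id(old_lines)
    new = locations_by_id(new_lines)
    return sorted(term for term, locs in old.items() if locs and not new.get(term))


# Records mirroring the two jq outputs above.
efo_v3_35 = ['{"id": "EFO_0001421", "locationIds": ["UBERON_0002107"]}']
efo_v3_38 = ['{"id": "EFO_0001421", "locationIds": null}']

print(lost_locations(efo_v3_35, efo_v3_38))  # ['EFO_0001421']
```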

I think everything points to the riot script needing an update because of changes in EFO.

mbdebian commented 2 years ago

Work in progress: this is the difference in EFO between v35 and versions 38 and 40 (the latter two are the same).

{
  "@id": "_:b12001",
  "@type": "Restriction",
  "onProperty": "efo:EFO_0000784",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b13200",
  "@type": "Class",
  "unionOf": {
    "@list": ["obo:UBERON_0002107", "_:b50330"]
  },
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b16453",
  "@type": "Restriction",
  "onProperty": "obo:BFO_0000050",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b21157",
  "@type": "Restriction",
  "onProperty": "obo:BFO_0000050",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b22366",
  "@type": "Restriction",
  "onProperty": "obo:BFO_0000050",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b22791",
  "@type": "Restriction",
  "onProperty": "efo:EFO_0000784",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b37342",
  "@type": "Restriction",
  "onProperty": "efo:EFO_0000784",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b44714",
  "@type": "Restriction",
  "onProperty": "efo:EFO_0000784",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b50330",
  "@type": "Restriction",
  "onProperty": "obo:BFO_0000050",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b50823",
  "@type": "Restriction",
  "onProperty": "efo:EFO_0000784",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "_:b61685",
  "@type": "Restriction",
  "onProperty": "obo:BFO_0000050",
  "someValuesFrom": "obo:UBERON_0002107",
  "subClassOf": [],
  "hasExactSynonym": [],
  "hasDbXref": [],
  "inSubset": [],
  "hasAlternativeId": []
} {
  "@id": "obo:UBERON_0002107",
  "@type": "Class",
  "IAO_0000115": "An exocrine gland which secretes bile and functions in metabolism of protein and carbohydrate and fat, synthesizes substances involved in the clotting of the blood, synthesizes vitamin A, detoxifies poisonous substances, stores glycogen, and breaks down worn-out erythrocytes[GO].",
  "UBPROP_0000001": "Organ which secretes bile and participates in formation of certain blood proteins.[AAO]",
  "UBPROP_0000002": "relationship type change: differentiates_from endoderm (AAO:0000139) CHANGED TO: develops_from endoderm (UBERON:0000925)[AAO]",
  "UBPROP_0000003": "All vertebrates possess a liver (reference 1); Later in craniate evolution, an anterior gill arch was transformed into jaws, and many new types of feeding subsequently evolved.(...) A liver evolved that, among its many functions, stores considerable energy as glycogen or lipid (reference 2).[well established][VHOG]",
  "UBPROP_0000007": "hepatic",
  "UBPROP_0000008": ["An organ sometimes referred to as a liver is found associated with the digestive tract of the primitive chordate Amphioxus. However, this is an enzyme secreting gland, not a metabolic organ, and it is unclear how truly homologous it is to the vertebrate liver. The zebrafish liver differs from the mammalian liver in that the hepatocytes are not clearly organized in cords or lobules and the typical portal triads are not apparent. In addition, the zebrafish liver does not have Kuppfer cells. Furthermore, a clear distinction can be made between the male and female liver in the adult zebrafish. The female hepatocytes are very basophilic (Figure 15c) as a result of the production of vitellogenin (Van der Ven et al. 2003).", "The liver is found in all vertebrates, and is typically the largest visceral organ. Its form varies considerably in different species, and is largely determined by the shape and arrangement of the surrounding organs. Nonetheless, in most species it is divided into right and left lobes; exceptions to this general rule include snakes, where the shape of the body necessitates a simple cigar-like form. The internal structure of the liver is broadly similar in all vertebrates."],
  "UBPROP_0000009": "secretes bile and functions in metabolism of protein and carbohydrate and fat, synthesizes substances involved in the clotting of the blood, synthesizes vitamin A, detoxifies poisonous substances, stores glycogen, and breaks down worn-out erythrocytes[GO].",
  "UBPROP_0000012": "Only ZFA considers this part_of immune system - we weaken this to an overlaps relation, as in general it's only a subset of cells that have clear immune function.",
  "hasDbXref": ["GAID:288", "ZFA:0000123", "EFO:0000887", "MAT:0000097", "AAO:0010111", "VHOG:0000257", "MESH:D008099", "EMAPA:16846", "OpenCyc:Mx4rvVimppwpEbGdrcN5Y29ycA", "BTO:0000759", "MIAA:0000097", "UMLS:C0023884", "XAO:0000133", "EHDAA:2197", "galen:Liver", "SCTID:181268008", "EHDAA2:0000997", "Wikipedia:Liver", "TAO:0000123", "CALOHA:TS-0564", "EV:0100089", "FMA:7197", "NCIT:C12392", "MA:0000358"],
  "hasOBONamespace": "uberon",
  "hasRelatedSynonym": ["iecur", "jecur"],
  "id": "UBERON:0002107",
  "inSubset": ["obo:uberon/core#efo_slim", "obo:uberon/core#major_organ", "obo:uberon/core#vertebrate_core", "obo:uberon/core#organ_slim", "obo:uberon/core#uberon_slim", "obo:uberon/core#pheno_slim"],
  "label": "liver",
  "subClassOf": ["obo:UBERON_0002368", "_:b81782", "_:b28880"],
  "depicted_by": "Leber:Schaf.jpg",
  "hasExactSynonym": [],
  "hasAlternativeId": []
}

That chunk of information is missing in efo38 and efo40.

I need to dig deeper to find out whether this is a removal at the ontology level or a hierarchy change that affects the results of the jq filter we run on the file converted from OWL to JSON.
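One way to tell the two cases apart is to resolve locations directly from the converted JSON graph instead of going through the existing filter. The sketch below is not the jq filter the pipeline runs. It assumes nodes shaped like the ones above, that a disease class lists its restriction blank nodes under `subClassOf`, and that `efo:EFO_0000784` is the disease-location property. The disease node in the sample is hypothetical, since the thread does not show it.

```python
from typing import Dict, Iterable, List

# Assumed to be the property that links a disease to its anatomical location.
LOCATION_PROPERTY = "efo:EFO_0000784"


def index_by_id(nodes: Iterable[dict]) -> Dict[str, dict]:
    """Index graph nodes by their @id so blank nodes can be resolved."""
    return {node["@id"]: node for node in nodes if "@id" in node}


def direct_locations(class_id: str, graph: Dict[str, dict]) -> List[str]:
    """Collect someValuesFrom targets of location restrictions on a class."""
    node = graph.get(class_id, {})
    parents = node.get("subClassOf") or []
    if isinstance(parents, str):
        parents = [parents]  # tolerate a bare string instead of a list
    found: List[str] = []
    for parent_id in parents:
        parent = graph.get(parent_id, {})
        is_location = (
            parent.get("@type") == "Restriction"
            and parent.get("onProperty") == LOCATION_PROPERTY
        )
        target = parent.get("someValuesFrom")
        if is_location and isinstance(target, str) and target not in found:
            found.append(target)
    return found


nodes = [
    {"@id": "_:b12001", "@type": "Restriction",
     "onProperty": "efo:EFO_0000784", "someValuesFrom": "obo:UBERON_0002107"},
    {"@id": "_:b16453", "@type": "Restriction",
     "onProperty": "obo:BFO_0000050", "someValuesFrom": "obo:UBERON_0002107"},
    # Hypothetical disease node, added only to make the example runnable.
    {"@id": "efo:EFO_0001421", "@type": "Class",
     "subClassOf": ["_:b12001", "_:b16453"]},
]

graph = index_by_id(nodes)
print(direct_locations("efo:EFO_0001421", graph))  # ['obo:UBERON_0002107']
```

If a lookup like this finds no location restrictions in the v38 graph at all, the removal is at the ontology level. If the restrictions are present but the filtered output is still null, the filter is the more likely culprit.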

d0choa commented 2 years ago

As flagged in https://github.com/EBISPOT/efo/issues/1505, this is a consequence of changes in the EFO slim.

mbdebian commented 12 months ago

This is still the case in our recent June (2023) release. Have we decided how we are going to address this?

d0choa commented 11 months ago

According to this, we should have the data available. Can we check whether the data can be processed (e.g. using riot)?