opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

PIS transformation extraction: `expression` step #3506

Closed javfg closed 5 days ago

javfg commented 2 months ago

Description

The expression step downloads six files, and for two of them it also applies a transformation.

Transformations PIS was doing

https://www.proteinatlas.org/download/normal_tissue.tsv.zip

This file is just unzipped.
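As a rough illustration of what "just unzipped" amounts to, here is a minimal Python sketch. It is not the PIS code: the function name `unzip_bytes` is hypothetical, the download itself is left out, and the demo archive is a stand-in built in memory with placeholder contents, not the real `normal_tissue.tsv`.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def unzip_bytes(archive_bytes: bytes, dest_dir: str) -> list[Path]:
    """Extract every member of an in-memory zip archive into dest_dir.

    Hypothetical helper: returns the paths of the extracted members.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        zf.extractall(dest_dir)
        return [Path(dest_dir) / name for name in zf.namelist()]


# Build a stand-in archive in memory (placeholder contents, not real data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("normal_tissue.tsv", "col_a\tcol_b\n")

# Extract it to a temporary directory and read the single member back.
with tempfile.TemporaryDirectory() as tmp:
    extracted = unzip_bytes(buffer.getvalue(), tmp)
    demo_name = extracted[0].name
    demo_text = extracted[0].read_text()
```

No content is changed in this step; the file that comes out of the archive is used as is.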

https://raw.githubusercontent.com/opentargets/expression_hierarchy/master/process/map_with_efos.json

This file is converted to JSONL. The lines come from an object (tissues), not an array, so PIS adds a tissue_id field to each entry, holding the key under which that entry is stored in tissues. This is easier to see with an example:

{
  "tissues": {
    "Brodmann (1909) area 24": {
      "anatomical_systems": [
        "nervous system"
      ],
      "efo_code": "UBERON_0006101",
      "label": "Brodmann (1909) area 24",
      "organs": [
        "brain"
      ]
    }
  }
}

That first object inside tissues becomes the line:

{"anatomical_systems": ["nervous system"], "efo_code": "UBERON_0006101", "label": "Brodmann (1909) area 24", "organs": ["brain"], "tissue_id": "Brodmann (1909) area 24"}

Notice the `"tissue_id": "Brodmann (1909) area 24"` field at the end, whose value is the key of the object in the original JSON.
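The conversion described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the PIS implementation: the function name `tissues_to_jsonl` is hypothetical, and the input is assumed to be the already-parsed content of `map_with_efos.json`.

```python
import json


def tissues_to_jsonl(source: dict) -> str:
    """Turn the `tissues` object into JSONL, one line per tissue.

    Hypothetical helper: each entry is copied and given a `tissue_id`
    field containing the key it was stored under.
    """
    return "".join(
        json.dumps({**entry, "tissue_id": tissue_id}) + "\n"
        for tissue_id, entry in source["tissues"].items()
    )


# The example entry from this issue.
example = {
    "tissues": {
        "Brodmann (1909) area 24": {
            "anatomical_systems": ["nervous system"],
            "efo_code": "UBERON_0006101",
            "label": "Brodmann (1909) area 24",
            "organs": ["brain"],
        }
    }
}

print(tissues_to_jsonl(example), end="")
```

Because `tissue_id` is appended after the existing fields, it shows up last on each line, as in the example line above.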

Tasks