opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Molecule downloads file schema is not right #3180

Closed DSuveges closed 3 months ago

DSuveges commented 6 months ago

Describe the bug

The dataset schema shown on the Platform download page is wrong for molecules.

Observed behaviour

├───crossReferences : [object Object]

This is probably because crossReferences in the parquet dataset is a map type.
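A minimal sketch of how this symptom can arise, assuming the schema view interpolates a field's `type` directly into a string. The actual front-end code is not shown in this issue, so `field` and `naiveLine` are hypothetical names. When `type` is a nested object rather than a primitive type name, JavaScript's default string conversion produces `[object Object]`:

```typescript
// Hypothetical field definition, shaped like a Spark JSON schema entry.
type FieldType = string | { type: string; [key: string]: unknown };

const field: { name: string; type: FieldType } = {
  name: "crossReferences",
  type: { type: "map", keyType: "string", valueType: "string" },
};

// Naive rendering (an assumption about the front end): treats `type`
// as if it were always a primitive type name such as "string".
const naiveLine = `├───${field.name} : ${field.type}`;

console.log(naiveLine); // ├───crossReferences : [object Object]
```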

When looking at the schema of the parquet dataset:

 |-- crossReferences: map (nullable = true)
 |    |-- key: string
 |    |-- value: array (valueContainsNull = true)
 |    |    |-- element: string (containsNull = true)

When looking at the schema of the json dataset:

 |-- crossReferences: struct (nullable = true)
 |    |-- DailyMed: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- DrugCentral: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- PubChem: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- TG-GATEs: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- Wikipedia: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- chEBI: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- drugbank: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

To resolve the front-end issue, the schema view needs to handle a map-typed field definition such as this one:

    {
      "name": "crossReferences",
      "type": {
        "type": "map",
        "keyType": "string",
        "valueType": {
          "type": "array",
          "elementType": "string",
          "containsNull": true
        },
        "valueContainsNull": true
      },
      "nullable": true,
      "metadata": {}
    }
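One possible way to handle the object above is to describe nested types recursively instead of printing them directly. This is only a sketch: `SparkType`, `SparkField`, `describeType` and `renderField` are hypothetical names, the `map<...>` / `array<...>` label format is an arbitrary choice, and the real download-page component may be structured differently.

```typescript
// Spark-style JSON schema types: a primitive name or a nested definition.
type SparkType =
  | string
  | { type: "array"; elementType: SparkType; containsNull: boolean }
  | { type: "map"; keyType: SparkType; valueType: SparkType; valueContainsNull: boolean }
  | { type: "struct"; fields: SparkField[] };

interface SparkField {
  name: string;
  type: SparkType;
  nullable: boolean;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: turn any (possibly nested) type into a one-line label.
function describeType(t: SparkType): string {
  if (typeof t === "string") return t; // primitive, e.g. "string"
  if (t.type === "array") return `array<${describeType(t.elementType)}>`;
  if (t.type === "map") {
    return `map<${describeType(t.keyType)}, ${describeType(t.valueType)}>`;
  }
  // Remaining case: struct
  const inner = t.fields.map((f) => `${f.name}: ${describeType(f.type)}`);
  return `struct<${inner.join(", ")}>`;
}

// Hypothetical renderer for one line of the schema tree.
function renderField(field: SparkField): string {
  return `├───${field.name} : ${describeType(field.type)}`;
}

// The field definition quoted in this issue.
const crossReferences: SparkField = {
  name: "crossReferences",
  type: {
    type: "map",
    keyType: "string",
    valueType: { type: "array", elementType: "string", containsNull: true },
    valueContainsNull: true,
  },
  nullable: true,
  metadata: {},
};

console.log(renderField(crossReferences));
// ├───crossReferences : map<string, array<string>>
```

A flat label keeps the map on a single line. Expanding the key and value types as child nodes of the tree would be an equally reasonable design.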