opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Data downloads page for new version of Platform #1411

Closed andrewhercules closed 3 years ago

andrewhercules commented 3 years ago

Currently, the classic (Angular) version of the Platform allows users to download the evidence, association, target list, disease list, safety, baseline expression, and tractability data files.

The ETL pipelines for the new (React) version of the Platform allow us to increase the number of files that we make available for download. These pipelines generate the following files:

gs://open-targets-data-releases/21.02/output/ETL/aotf/clickhouse/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/aotf/elasticsearch/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/direct/byDatasource/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/direct/byDatatype/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/direct/byOverall/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/indirect/byDatasource/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/indirect/byDatatype/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/associations/indirect/byOverall/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/cancerBiomarkers/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/disease_hpo/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/diseases/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/drugs/drug/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/drugs/indication/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/drugs/mechanism_of_action/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/eco/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/evidences/failed/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/evidences/stats/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/evidences/succeeded/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/expression/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/hpo/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/interactions/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/interactions_evidence/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/knownDrugs/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/mousePhenotypes/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/openfda-faers/agg_by_chembl/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/openfda-faers/agg_by_chembl_parquet/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/openfda-faers/agg_critval_drug-parquet/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/openfda-faers/agg_critval_drug/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/openfda-faers/agg_critval_drug_csv/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/reactome/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/search/diseaseIndex/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/search/drugIndex/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/search/targetIndex/_SUCCESS
gs://open-targets-data-releases/21.02/output/ETL/targets/_SUCCESS
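
Each `_SUCCESS` file in the listing is the kind of marker that Spark and Hadoop jobs conventionally write once an output directory is complete, so the dataset directories can be read off the marker paths. A minimal sketch, assuming the marker URIs are already available as strings (for example from a bucket listing); the function name `dataset_dirs` and the `RELEASE_ROOT` constant are hypothetical and not part of the ETL code:

```python
from typing import Iterable, List

# Assumed release root, copied from the listing above.
RELEASE_ROOT = "gs://open-targets-data-releases/21.02/output/ETL/"
MARKER = "/_SUCCESS"


def dataset_dirs(marker_uris: Iterable[str], root: str = RELEASE_ROOT) -> List[str]:
    """Return dataset directories (relative to root) for each _SUCCESS marker."""
    dirs = []
    for uri in marker_uris:
        # Skip anything that is not a completion marker under the release root.
        if not (uri.startswith(root) and uri.endswith(MARKER)):
            continue
        # Strip the root prefix and the trailing "/_SUCCESS".
        dirs.append(uri[len(root):-len(MARKER)])
    return sorted(dirs)


markers = [
    "gs://open-targets-data-releases/21.02/output/ETL/drugs/indication/_SUCCESS",
    "gs://open-targets-data-releases/21.02/output/ETL/diseases/_SUCCESS",
    "gs://open-targets-data-releases/21.02/output/ETL/evidences/succeeded/_SUCCESS",
]
print(dataset_dirs(markers))
# → ['diseases', 'drugs/indication', 'evidences/succeeded']
```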

To optimise and streamline the maintenance of the data downloads page (https://beta.targetvalidation.org/downloads), the back-end team will produce a static JSON file with a list of datasets available for download. This list will include:

An example of the proposed JSON is below:

{
  "data": [
    {
      "version": "21.02",
      "datasets": [
        {
          "path": "/drug/indication",
          "format": "json",
          "directory_size": "500mb"
        },
        {
          "path": "/drug/indication",
          "format": "parquet",
          "directory_size": "534mb"
        },
        {
          "path": "/evidences/succeeded/sourceId=chembl",
          "format": "json",
          "directory_size": "52mb"
        }
      ]
    }
  ]
}
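
To illustrate how a consumer might read this manifest, here is a small sketch that parses the proposed structure and groups the available formats by dataset path for one release. The function name `formats_by_path` is hypothetical, and the embedded JSON is a compact copy of the example above, not a published file:

```python
import json
from collections import defaultdict
from typing import Dict, List

# Compact copy of the proposed manifest example (illustrative values only).
MANIFEST = """
{
  "data": [
    {
      "version": "21.02",
      "datasets": [
        {"path": "/drug/indication", "format": "json", "directory_size": "500mb"},
        {"path": "/drug/indication", "format": "parquet", "directory_size": "534mb"},
        {"path": "/evidences/succeeded/sourceId=chembl", "format": "json", "directory_size": "52mb"}
      ]
    }
  ]
}
"""


def formats_by_path(manifest: dict, version: str) -> Dict[str, List[str]]:
    """Map each dataset path in the given release to its available formats."""
    grouped = defaultdict(list)
    for release in manifest["data"]:
        if release["version"] != version:
            continue
        for dataset in release["datasets"]:
            grouped[dataset["path"]].append(dataset["format"])
    return dict(grouped)


manifest = json.loads(MANIFEST)
for path, formats in formats_by_path(manifest, "21.02").items():
    print(path, formats)
```

Grouping by path is one possible design choice: it would let the downloads page show a single row per dataset with one link per format, rather than one row per format.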

The front-end will access the file either through Google Cloud or the GraphQL API, retrieve the list of files available for download, and display the list on the /downloads page. As part of this process, the front-end will also be responsible for:

After discussions amongst the team, a few points were noted for further discussion:

To do:

mkarmona commented 3 years ago

Explore the idea of generating the schema and dataset description within the step.

andrewhercules commented 3 years ago

Epic closed, as the initial implementation of the data downloads page is now available. Further work (e.g. exposing new datasets, including links to BigQuery and GraphQL) will be captured in subsequent tickets.