Automation for Platform ETL

As a user, I want to run the ETL pipeline with a single command, because a run currently involves writing configs, building jars and copying files to GCS by hand before the workflow can be started.

Background

The current process is:

1. Check out a branch for the release
2. Create a config for the ETL
3. Copy the config to GCS
4. Build the ETL jar
5. Copy the ETL jar to GCS
6. Create the config for the workflow jar
7. Build the workflow jar
8. Run the workflow
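
The steps above could be modelled as an ordered list that a single entry point walks through. This is only a minimal sketch of that idea, not the actual implementation: `Step`, `run_steps` and `placeholder` are hypothetical names, and the real actions (git, jar builds, GCS copies) are deliberately left as stand-ins because the issue does not specify the commands.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of the release process (hypothetical structure)."""
    name: str
    action: Callable[[], None]


def run_steps(steps: List[Step], dry_run: bool = False) -> List[str]:
    """Run steps in order and return the names of the steps handled.

    In a dry run no action is executed. Otherwise an exception raised by
    an action propagates, so later steps never run after a failure.
    """
    completed = []
    for step in steps:
        if not dry_run:
            step.action()
        completed.append(step.name)
    return completed


def placeholder() -> None:
    """Hypothetical stand-in for the real work of a step."""


# The eight manual steps listed in this issue, in order.
MANUAL_STEPS = [
    "check out release branch",
    "create ETL config",
    "copy ETL config to GCS",
    "build ETL jar",
    "copy ETL jar to GCS",
    "create workflow jar config",
    "build workflow jar",
    "run workflow",
]

plan = run_steps([Step(name, placeholder) for name in MANUAL_STEPS], dry_run=True)
for number, name in enumerate(plan, start=1):
    print(f"{number}. {name}")
```

A dry run like this lists what would happen without touching GCS, which may be useful for checking a release before starting it.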

All of the steps in the current process are done manually for every run. The main input variables we need to be able to control are:

- Platform/data release version, e.g. 23.12
- ChEMBL version, e.g. 33
- Ensembl version, e.g. 110
- Is public (boolean)
- Datasources to exclude
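
As an illustration, the input variables listed above could be captured in one small, validated profile object. This is a sketch under assumptions: `ReleaseProfile`, its field names and the variable names returned by `as_env` are hypothetical, and the version patterns are inferred only from the examples in this issue (23.12, 33, 110). The datasource name in the usage lines is made up.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReleaseProfile:
    """Input variables for one ETL run (hypothetical structure)."""
    release_version: str            # e.g. "23.12"
    chembl_version: str             # e.g. "33"
    ensembl_version: str            # e.g. "110"
    is_public: bool
    excluded_datasources: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Assumed format: two digits, a dot, two digits (as in 23.12).
        if not re.fullmatch(r"\d{2}\.\d{2}", self.release_version):
            raise ValueError(f"bad release version: {self.release_version!r}")
        # Assumed format: plain integers (as in 33 and 110).
        for label, value in (("chembl", self.chembl_version),
                             ("ensembl", self.ensembl_version)):
            if not value.isdigit():
                raise ValueError(f"bad {label} version: {value!r}")

    def as_env(self) -> Dict[str, str]:
        """Render the profile as string variables (names are assumptions)."""
        return {
            "RELEASE_VERSION": self.release_version,
            "CHEMBL_VERSION": self.chembl_version,
            "ENSEMBL_VERSION": self.ensembl_version,
            "IS_PUBLIC": "true" if self.is_public else "false",
            "EXCLUDED_DATASOURCES": ",".join(self.excluded_datasources),
        }


profile = ReleaseProfile("23.12", "33", "110", is_public=True,
                         excluded_datasources=("example_source",))
for key, value in profile.as_env().items():
    print(f"{key}={value}")
```

Validating the values up front means a mistyped version fails before anything is built or copied, rather than partway through a run.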

Tasks

- [x] Makefile to run the workflow
- [x] Profile, like the ones used by PIS/POS, for capturing the input variables

Acceptance tests

How do we know the task is complete?

I can run the ETL with a single command.

opentargets / issues

Automation for Platform ETL #3216