Under the current architecture there is a lot of reduncancy in the step configuration. We end up writing a lot of boilerplate code; and once we define the configuration variables, we then keep passing them around:
config/datasets/gcp.yaml, where the actual configuration is
config/step/my_finngen.yaml, just passing through
src/otg/config.py, just passing through
Register config
Add entry to the DAG definition
The result is that every tiny action, such as adding or renaming a parameter, cascades into lots of changes all over the code. This is very distracting, creates friction, and slows down development. As the number of steps is expected to grow, especially the data ingestion steps, this has to be addressed.
Under the current architecture there is a lot of reduncancy in the step configuration. We end up writing a lot of boilerplate code; and once we define the configuration variables, we then keep passing them around:
config/datasets/gcp.yaml
, where the actual configuration isconfig/step/my_finngen.yaml
, just passing throughsrc/otg/config.py
, just passing throughRegister config
Add entry to the DAG definition
The result is that every tiny action, such as adding or renaming a parameter, cascades into lots of changes all over the code. This is very distracting, creates friction, and slows down development. As the number of steps is expected to grow, especially the data ingestion steps, this has to be addressed.