m-lab / etl-gardener

Gardener provides services for maintaining and reprocessing mlab data.
Apache License 2.0

Improve communication of dataset/table between gardener and etl pipeline. #57

Open gfr10598 opened 6 years ago

gfr10598 commented 6 years ago

Gardener and the ETL pipeline both need to know which dataset/table the rows are being written to. Currently each uses its own logic or environment variables to determine this, which is very fragile. Instead, either Gardener should choose and the pipeline should respect that choice, or vice versa.

ETL makes that choice on each task creation, so it is quite practical either to include the dataset/table as part of the task (in the task queue), or to have ETL consult datastore for a per-project map from a {batch/datatype} key to a dataset/table value.
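
A rough Go sketch of the two options, for discussion only. Nothing here is existing gardener or etl code: the `Destination` type, the `dataset`/`table`/`filename` parameter names, and the example keys are all assumptions, and a plain in-memory map stands in for the datastore lookup.

```go
package main

import (
	"fmt"
	"net/url"
)

// Destination names the dataset and table that rows are written to.
// Hypothetical type, not taken from gardener or etl.
type Destination struct {
	Dataset string
	Table   string
}

// EncodeTask sketches option 1: the destination travels with the task.
// The parameter names "filename", "dataset", and "table" are assumptions.
func EncodeTask(filename string, d Destination) string {
	v := url.Values{}
	v.Set("filename", filename)
	v.Set("dataset", d.Dataset)
	v.Set("table", d.Table)
	return v.Encode()
}

// DecodeTask is the consumer side of option 1. It rejects a task that does
// not carry an explicit destination instead of falling back to env vars.
func DecodeTask(payload string) (string, Destination, error) {
	v, err := url.ParseQuery(payload)
	if err != nil {
		return "", Destination{}, err
	}
	d := Destination{Dataset: v.Get("dataset"), Table: v.Get("table")}
	if d.Dataset == "" || d.Table == "" {
		return "", Destination{}, fmt.Errorf("task %q is missing dataset or table", payload)
	}
	return v.Get("filename"), d, nil
}

// DestinationMap sketches option 2: a per-project map from a
// "{batch}/{datatype}" key to a destination. In practice this would be
// loaded from datastore; an in-memory map stands in for it here.
type DestinationMap map[string]Destination

// Lookup returns the destination for a batch/datatype pair, or an error
// when no entry exists, so that a missing mapping fails loudly.
func (m DestinationMap) Lookup(batch, datatype string) (Destination, error) {
	d, ok := m[batch+"/"+datatype]
	if !ok {
		return Destination{}, fmt.Errorf("no destination for %s/%s", batch, datatype)
	}
	return d, nil
}

func main() {
	// Option 1: round-trip a destination through a task payload.
	payload := EncodeTask("example-archive.tgz", Destination{Dataset: "batch", Table: "ndt"})
	name, d, err := DecodeTask(payload)
	fmt.Println(name, d, err)

	// Option 2: both sides consult the same shared map.
	m := DestinationMap{"batch/ndt": {Dataset: "batch", Table: "ndt"}}
	fmt.Println(m.Lookup("batch", "ndt"))
}
```

Either way, the point is that only one side decides and the other side fails on a missing value rather than guessing.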

gfr10598 commented 6 years ago

This is basically a duplicate of etl #519.

gfr10598 commented 6 years ago

https://github.com/m-lab/etl/issues/519