snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Add ability to auto-discover cluster #45

Open jbeemster opened 6 years ago

jbeemster commented 6 years ago

Discovery would be based on name and age: rather than passing the jobflow ID, you would pass the jobflow name, and dataflow-runner would select the newest running cluster with that name.

Thoughts @alexanderdean @BenFradet ?

alexanderdean commented 6 years ago

I really like this:

  1. It could allow us to avoid hardcoding jobflow IDs (which change as we replace clusters) into DAGs
  2. It allows us to do blue-green switchovers

chuwy commented 6 years ago

Another possible approach, probably easier to implement: https://github.com/snowplow/dataflow-runner/issues/35

jbeemster commented 6 years ago

That could work as well. Another idea: you could also store the jobflow ID as a Consul KV pair and have dataflow-runner check that too...