snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Load configuration files eagerly during run-transient #47

Closed: jbeemster closed 5 years ago

jbeemster commented 6 years ago

If the emr_cluster config path is valid and the cluster launches, but the playbook path is not available, then the command exits, leaving the cluster running for no reason.

Either we need to eagerly evaluate all assets (EMR config and playbook config) before launching anything, or we need to kill the EMR cluster on playbook resolution failure.

https://github.com/snowplow/dataflow-runner/blob/master/src/main.go#L236

cc/ @alexanderdean

jbeemster commented 6 years ago

Actually, sorry, the above is not quite right. The issue is that neither config is ever loaded into memory, so if a config file is removed mid-run then dataflow-runner cannot run the playbook or perform the "down" function.

It would be good to eagerly evaluate and load the assets into memory at start to avoid this potential issue.

alexanderdean commented 6 years ago

Ouch, good catch!

jbeemster commented 5 years ago

@BenFradet do you have any idea when this will be milestoned for a fix?

BenFradet commented 5 years ago

This should be in the sprint starting Nov 27th :+1: