snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Add option to fetch logs of failing jobflow step #40

alexanderdean closed this issue 6 years ago

alexanderdean commented 6 years ago

If a jobflow step fails, it would be nice to retrieve the logs for that step and write them out to Dataflow Runner's own stdout/stderr.

This saves the operator from having to connect to EMR and fetch the logs manually.
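
To make the idea concrete, here is a minimal sketch in Python of what fetching a failed step's logs could look like. It is not Dataflow Runner's implementation (the tool itself is written in Go). It assumes the cluster was launched with a log URI, so that EMR writes gzipped step logs under `<log-prefix>/<cluster-id>/steps/<step-id>/`. The names `step_log_keys`, `dump_failed_step_logs` and the `fetch` callable are hypothetical; `fetch` stands in for whatever S3 client is used and is assumed to raise `KeyError` when an object is missing.

```python
import gzip
import sys
from typing import Callable, List, TextIO

# EMR writes these per-step files (gzipped) when a log URI is configured.
STEP_LOG_FILES = ("stdout.gz", "stderr.gz")


def step_log_keys(log_prefix: str, cluster_id: str, step_id: str) -> List[str]:
    """Build the object keys under which a step's logs are expected."""
    parts = [p.strip("/") for p in (log_prefix, cluster_id, "steps", step_id)]
    base = "/".join(p for p in parts if p)
    return [f"{base}/{name}" for name in STEP_LOG_FILES]


def dump_failed_step_logs(
    fetch: Callable[[str], bytes],  # hypothetical: returns raw object bytes
    log_prefix: str,
    cluster_id: str,
    step_id: str,
    out: TextIO = sys.stdout,
    err: TextIO = sys.stderr,
) -> None:
    """Write a failed step's stdout/stderr logs to the runner's own streams."""
    for key in step_log_keys(log_prefix, cluster_id, step_id):
        # Mirror the step's stderr onto our stderr, everything else onto stdout.
        target = err if key.endswith("stderr.gz") else out
        try:
            raw = fetch(key)
        except KeyError:
            # Assumption: a missing object surfaces as KeyError from `fetch`.
            target.write(f"[{step_id}] no log found at {key}\n")
            continue
        text = gzip.decompress(raw).decode("utf-8", errors="replace")
        target.write(f"[{step_id}] contents of {key}:\n{text}")
        if not text.endswith("\n"):
            target.write("\n")
```

In practice the logs may only appear in S3 some minutes after the step fails, so a real implementation would probably need to retry or wait before giving up.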

chuwy commented 6 years ago

The behavior can differ considerably between spark-submit, a plain custom JAR, and other ways of running EMR jobs. But if we assume that this feature is just for CUSTOM_JAR, we'll be able to use it with RDB Loader to retrieve stdout logs and drop the logkey approach.

alexanderdean commented 6 years ago

This is the key feature which is driving the 0.4.0 release (other features can potentially be pushed back @BenFradet)

BenFradet commented 6 years ago

@alexanderdean shouldn't this be the default?

alexanderdean commented 6 years ago

I think it's best to leave it as an option, in case the operator doesn't want the orchestration box retrieving the log files and printing them out...