snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Long playbook (9 steps+) run-transient mode issue #56

Open grzegorzewald opened 4 years ago

grzegorzewald commented 4 years ago

I have observed a reproducible issue when running dataflow-runner in run-transient mode with playbooks longer than 8 steps. The error is 400: Throughput exceeded. Once it occurs, run-transient moves on to its next phase, namely tearing the cluster down. After investigating, I found that the issue originates in how the run procedure is constructed: every submitted step gets its own state listener, so the number of step-status requests sent to EMR grows with the length of the playbook.
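The mechanism above can be illustrated with a small simulation. This is not dataflow-runner's code: fakeEMR, describeStep and pollAllSteps are hypothetical names, and the limit of 8 status requests per polling round is an assumption chosen only to mirror the 8/9-step boundary reported here, not EMR's real quota. One polling round is modelled sequentially, with one request per step listener.

```go
package main

import (
	"errors"
	"fmt"
)

// errThrottled stands in for the "400: Throughput exceeded" error from the issue.
var errThrottled = errors.New("400: Throughput exceeded")

// fakeEMR is a hypothetical API stub that accepts at most maxPerRound
// status requests per polling round. The limit is an assumption.
type fakeEMR struct {
	maxPerRound int
	used        int
}

// newRound resets the request budget at the start of a polling round.
func (f *fakeEMR) newRound() { f.used = 0 }

// describeStep simulates one status request for a single step.
func (f *fakeEMR) describeStep(stepID int) error {
	f.used++
	if f.used > f.maxPerRound {
		return errThrottled
	}
	return nil
}

// pollAllSteps mirrors the pattern described in the issue: every submitted
// step has its own state listener, so each round issues one request per step.
func pollAllSteps(api *fakeEMR, steps int) error {
	api.newRound()
	for id := 0; id < steps; id++ {
		if err := api.describeStep(id); err != nil {
			return fmt.Errorf("step %d listener: %w", id, err)
		}
	}
	return nil
}

func main() {
	for _, steps := range []int{8, 9} {
		api := &fakeEMR{maxPerRound: 8}
		fmt.Printf("%d steps: %v\n", steps, pollAllSteps(api, steps))
	}
}
```

Under these assumptions the 8-step playbook stays within budget while the 9-step one is throttled on its ninth listener, because request volume scales linearly with playbook length.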

Across my trials, playbooks of 8 steps or fewer succeeded 100% of the time, while playbooks with 9 or more steps succeeded no more than 5% of the time.

I suspect that checking only the state of the current and next step (or of a limited number of steps) could resolve the issue.
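A minimal sketch of the proposed fix, simplified to polling only the current step, assuming steps execute one at a time on the cluster. As before, fakeEMR, describeStep and runSequential are hypothetical names, the per-round request limit is an assumed value, and a step is arbitrarily reported as finished after roundsPerStep polls. It is not dataflow-runner's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errThrottled stands in for the "400: Throughput exceeded" error from the issue.
var errThrottled = errors.New("400: Throughput exceeded")

// fakeEMR is a hypothetical stand-in for the EMR API: it allows at most
// maxPerRound status requests per polling round (an assumed limit) and
// reports a step as finished once it has been polled roundsPerStep times.
type fakeEMR struct {
	maxPerRound   int
	roundsPerStep int
	used          int
	polls         map[int]int
}

// newRound resets the request budget at the start of a polling round.
func (f *fakeEMR) newRound() { f.used = 0 }

// describeStep reports whether the step has finished, or returns
// errThrottled if this round's request budget is exhausted.
func (f *fakeEMR) describeStep(stepID int) (bool, error) {
	f.used++
	if f.used > f.maxPerRound {
		return false, errThrottled
	}
	f.polls[stepID]++
	return f.polls[stepID] >= f.roundsPerStep, nil
}

// runSequential polls only the step that is currently expected to run and
// advances when it finishes, so each round costs exactly one request no
// matter how long the playbook is. It returns the number of rounds used.
func runSequential(api *fakeEMR, steps int) (int, error) {
	rounds := 0
	for current := 0; current < steps; {
		api.newRound()
		rounds++
		done, err := api.describeStep(current)
		if err != nil {
			return rounds, fmt.Errorf("step %d: %w", current, err)
		}
		if done {
			current++
		}
	}
	return rounds, nil
}

func main() {
	api := &fakeEMR{maxPerRound: 8, roundsPerStep: 3, polls: map[int]int{}}
	rounds, err := runSequential(api, 20)
	fmt.Println(rounds, err)
}
```

With this design, a 20-step playbook finishes without throttling even under a budget of a single request per round, since the request rate no longer depends on the number of steps. If steps could run concurrently, the same loop could poll a small fixed window of steps instead.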

According to the Snowplow forums, the issue can also be observed in run mode unless async mode is enabled, which supports the theory above.

This could potentially fix #37 as well.