snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Release/0.6.0 #69

Closed: oguzhanunlu closed this 2 years ago

oguzhanunlu commented 3 years ago

Instead of a new backoff strategy, we'll submit steps as part of EMR cluster creation

oguzhanunlu commented 3 years ago

todo: use this strategy only if a rate limit exception is captured in err

chuwy commented 3 years ago

> todo: use this strategy only if a rate limit exception is captured in err

Actually, sorry. To be clear: this reverse backoff period is for normally functioning clusters. It's fairly unrelated to the initial issue we had.

Once a throughput exception is thrown, we should not decrease the period; maybe we should even increase it again.

Also, make sure we catch only those errors; nothing else should postpone the failure.

  1. Reverse backoff is there to start the cluster sooner. It's a very small improvement.
  2. Catching throughput exceptions is what overcomes the original problem.

oguzhanunlu commented 3 years ago

The new logic is not tested yet.

oguzhanunlu commented 3 years ago

Thanks @chuwy, of course, I'll move it.

chuwy commented 3 years ago

Hey @oguzhanunlu! Could you also take care of #70?

paulboocock commented 3 years ago

Could we also update the README in this release to fix the various broken links (pointing them to docs.snowplowanalytics.com) and broken images?

Also, the line "Starting from 0.5.1 the binary can be downloaded directly from Github releases." is no longer true, as we ported all releases from Bintray into GitHub Actions. I'm not sure that's really a "Quickstart" either; I feel a quickstart should at least include a command or some instructions on how to run it (although I'm open to removing this section entirely and simply pointing to the docs).

oguzhanunlu commented 3 years ago

Sure, I'll address both pieces of feedback.