openedx-unsupported / edx-analytics-configuration

GNU Affero General Public License v3.0

Add retries to EMR provision/terminate tasks (and update ansible) #74

Closed pwnage101 closed 5 years ago

pwnage101 commented 5 years ago

This commit responds to a large volume of jobs that failed because these ansible tasks failed (usually due to transient AWS issues) but then succeeded when retried manually.

To make this change, we also needed to update ansible from 1.4.4 to 2.5+.
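For illustration only, here is a minimal sketch of how a retried ansible task can look, using the `register` / `until` / `retries` / `delay` task keywords. The task name, the command path, and the retry values are hypothetical placeholders and are not taken from this PR's diff; the real provision/terminate tasks may be written differently.

```yaml
# Hypothetical example: names, paths, and numbers are assumptions for illustration.
- name: provision EMR cluster (illustrative)
  command: /path/to/provision-emr.sh    # placeholder, not a real script in this repo
  register: provision_result            # capture the result so `until` can inspect it
  until: provision_result.rc == 0       # stop retrying once the command exits 0
  retries: 3                            # number of retries before giving up
  delay: 30                             # seconds to wait between attempts
```

If the condition never becomes true, the task fails once the retries are exhausted, so a persistent (non-transient) error still surfaces as a failed job.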

DE-1483

Testing

I tested by running the debug-pipeline-job: http://jenkins.analytics.edx.org/job/debug-pipeline-job/288/console

The job failed because the task doesn't exist (FooBarDoesntExist), but otherwise the run fully covers the code changes in this PR.

pwnage101 commented 5 years ago

Here are the docs that I read to help write this code:

retries was added in 1.4, and we run 1.4.4. Unfortunately, the solution I initially implemented, which I took from the latter link, supposedly only works on 2.5+. I need to rework this since we run such an old ansible. Force-pushes incoming...

brianhw commented 5 years ago

Still LGTM. 👍

pwnage101 commented 5 years ago

Confirmed that update-users still works: http://jenkins.analytics.edx.org/job/update-users/32/console

Merging.