severb / flowy

Python library for Amazon Simple Workflow Service
http://flowy.rtfd.org/
MIT License

Error handling suggestion? #17

Closed delbinwen closed 9 years ago

delbinwen commented 9 years ago

Hi,

We're trying out AWS SWF, and your package, flowy, has been a big help. It made it easy for us to write our own logic and validate it. Good job!

Looking at the source code, it seems you intend to leave the handling of activity errors and time-outs to the developer, which is fine. We just want to know whether you have any suggestions for handling activity errors with flowy, especially for activities running in parallel. For example, we have 10 instances of the same activity running with different inputs, and we want to re-run any that fail.

I apologize if this is not the right place to get some help.

Thanks, Wesley

severb commented 9 years ago

Hey,

If you want to retry activities on time-out, the easiest way is to set the retry argument on the activity proxy. It defaults to 3, meaning that on time-out the activity is rescheduled up to 3 times before it fails.
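To make the retry count concrete, here is a rough plain-Python sketch of that behaviour. It does not use flowy at all: `ActivityTimeout`, `run_with_retry` and `flaky` are hypothetical names made up for illustration, and it assumes that "rescheduled 3 times" means one initial attempt plus up to three retries.

```python
class ActivityTimeout(Exception):
    """Hypothetical stand-in for an activity time-out; not a flowy class."""


def run_with_retry(activity, retry=3):
    """Call `activity`; on time-out, run it again up to `retry` more times."""
    attempts = 0
    while True:
        try:
            return activity()
        except ActivityTimeout:
            if attempts == retry:
                raise  # retries exhausted: the time-out propagates
            attempts += 1


calls = []


def flaky():
    """Times out on the first two calls, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ActivityTimeout()
    return 10


result = run_with_retry(flaky, retry=3)
```

Here `flaky` succeeds on its third call, so the time-out never reaches the caller. An activity that always times out would be attempted four times under this reading before the error propagates.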

If you want to retry on activity errors, you need to use a sub-workflow like in this example:


@swf_activity(version=0, ...)
class CanFailActivity(SWFActivity):
    def run(self):
        from random import choice
        if choice([0, 1]):
            raise RuntimeError('Err!')
        return 10

@swf_workflow(version=0, ...)
class RunUntilItWorksWorkflow(SWFWorkflow):

    can_fail = SWFActivityProxy(name='CanFailActivity', version=0, error_handling=True, ...)

    def run(self):
        r = self.can_fail()
        while True:
            try:
                return r.result()  # try to get the result, retry if it failed
            except TaskError:
                r = self.can_fail()

@swf_workflow(version=0, ...)
class MyWorkflow(SWFWorkflow):

    run_until_it_works = SWFWorkflowProxy(name='RunUntilItWorksWorkflow', version=0, ...)

    def run(self):
        for _ in range(10):
            self.run_until_it_works()  # this will schedule 10 things in parallel

If you want, you can also catch TimeoutError, which is a subclass of TaskError.
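If it helps to see the control flow without SWF in the picture, the same retry-until-success pattern can be sketched in plain Python. This is only an illustration under assumptions: it uses `concurrent.futures` threads instead of flowy, the `TaskError` and `TaskTimeout` classes are local stand-ins defined here rather than imported from flowy, and `make_activity` is a hypothetical helper that fakes failures.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskError(Exception):
    """Local stand-in for a failed task; not imported from flowy."""


class TaskTimeout(TaskError):
    """Local stand-in for a time-out, modelled as a subclass of TaskError."""


def make_activity(failures):
    """Return a callable that raises each item in `failures`, then succeeds."""
    pending = list(failures)

    def activity(value):
        if pending:
            raise pending.pop(0)
        return value * 2

    return activity


def run_until_it_works(activity, value):
    """Retry one activity until it returns; mirrors the sub-workflow loop."""
    while True:
        try:
            return activity(value)
        except TaskError:  # also catches TaskTimeout, since it is a subclass
            continue


# Ten independent inputs; even-numbered ones fail twice before succeeding.
activities = [
    make_activity([TaskError("Err!"), TaskTimeout("slow")] if i % 2 == 0 else [])
    for i in range(10)
]

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [
        pool.submit(run_until_it_works, act, i)
        for i, act in enumerate(activities)
    ]
    results = [f.result() for f in futures]
```

Each input gets its own retry loop, so one failing activity never blocks or restarts the others. That is the same reason the example above wraps the retry in a sub-workflow.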

In the development version (which is not stable yet) you can implement this pattern in a different way, without sub-workflows.

delbinwen commented 9 years ago

Thank you! We implemented a similar approach in our code and it worked.