Open kriberg opened 2 years ago
Hello, Thanks for the report. This looks like a design flaw or a bug in dyrygent.
I'll take a closer look into it once I have a spare moment.
So this issue is a limitation of the current version of dyrygent.
When the chain's on_error is defined as follows:
    chain1.on_error(callback.si("Leaf chain 1 failed"))
Celery does not immediately attach on_error to all tasks within the chain. It does that when chain1.apply_async() is executed.
Now, when the chain is consumed by dyrygent:
    wf.add_celery_canvas(chain1)
it is immediately dismantled into primitive tasks (signatures). Since the signatures do not yet have on_error attached, the information is lost.
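The loss described above can be illustrated with a toy model. The sketch below uses neither Celery nor dyrygent: `ToySignature`, `ToyChain`, and `dismantle` are hypothetical stand-ins that only mimic the behaviour described in this comment, where the errback lives on the chain container and is copied onto the member tasks only at apply time.

```python
from dataclasses import dataclass, field


@dataclass
class ToySignature:
    """Hypothetical stand-in for a task signature (not Celery's class)."""
    name: str
    options: dict = field(default_factory=dict)

    def on_error(self, errback):
        self.options.setdefault("link_error", []).append(errback)
        return self


@dataclass
class ToyChain:
    """Hypothetical stand-in for a chain: a container with its own options."""
    tasks: list
    options: dict = field(default_factory=dict)

    def on_error(self, errback):
        # Stored on the container only; member tasks are untouched for now.
        self.options.setdefault("link_error", []).append(errback)
        return self

    def apply(self):
        # Only at apply time is the errback copied onto each member task.
        for task in self.tasks:
            for errback in self.options.get("link_error", []):
                task.on_error(errback)
        return self.tasks


def dismantle(chain):
    """Mimic consuming the canvas early: keep the tasks, drop the container."""
    return list(chain.tasks)


chain1 = ToyChain([ToySignature("a"), ToySignature("b"), ToySignature("c")])
chain1.on_error("Leaf chain 1 failed")

early = dismantle(chain1)
lost = [task.options.get("link_error") for task in early]
print(lost)  # [None, None, None]
```

In this model, anything that takes the tasks out of the chain before `apply()` runs never sees the errback, which matches the symptom reported in this thread.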
This could potentially be fixed by attaching on_error to each signature when add_celery_canvas is executed. However, I'm not yet sure whether this will always work consistently.
It should be simple when you want to have on_error on a simple chain:
    chain = A -> B -> C
    A, B, C - simple tasks
But consider a more complex example:
    chain = A -> B -> C
    A - group
    B - chain
    C - chord

      A1           C1
     /  \         /  \
    A-A2-B-B1-B2-C-C2-C3
     \  /
      A3
In this case, calling chain.on_error would have to attach on_error to all tasks (A1, A2, A3, B1, B2, C1, C2, C3).
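A minimal sketch of what such propagation could look like, assuming the canvas can be walked as a tree. `Leaf`, `Container`, and `attach_on_error` are hypothetical names for illustration, not part of Celery or dyrygent, and the chord is simplified to a flat list of children rather than a header plus body.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical primitive task."""
    name: str
    errbacks: list = field(default_factory=list)


@dataclass
class Container:
    """Hypothetical chain/group/chord; a chord is flattened for simplicity."""
    kind: str
    children: list


def attach_on_error(node, errback):
    """Recursively attach errback to every leaf under node.

    Returns the names of the leaves that were touched, in traversal order.
    """
    if isinstance(node, Leaf):
        node.errbacks.append(errback)
        return [node.name]
    touched = []
    for child in node.children:
        touched.extend(attach_on_error(child, errback))
    return touched


canvas = Container("chain", [
    Container("group", [Leaf("A1"), Leaf("A2"), Leaf("A3")]),
    Container("chain", [Leaf("B1"), Leaf("B2")]),
    Container("chord", [Leaf("C1"), Leaf("C2"), Leaf("C3")]),
])

touched = attach_on_error(canvas, "chain failed")
print(touched)  # ['A1', 'A2', 'A3', 'B1', 'B2', 'C1', 'C2', 'C3']
```

Whether a real implementation should fire the errback once per failing leaf, or only once for the whole chain, is exactly the consistency question raised above; this sketch does not answer it.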
I think you could try to work around this limitation by doing:
    chain1 = chain(normal_task.si(), normal_task.si(), failing_task.si())
    for task in chain1.tasks:
        task.on_error(callback.si("Leaf chain 1 failed"))
I switched to:
    chain1 = chain(
        normal_task.si(),
        normal_task.si(),
        failing_task.si(),
    )
    for task in chain1.tasks:
        task.on_error(callback.si("Leaf chain 1 failed"))
    wf = Workflow()
    wf.set_retry_policy("random", 1, 3)
    wf.add_celery_canvas(chain1)
    result = wf.apply_async(options={"link_error": callback.si("wf error")})
    result.get()
but it doesn't make any difference. I also tried:
    chain1 = chain(
        normal_task.si().on_error(callback.si("Leaf chain 1 failed")),
        normal_task.si().on_error(callback.si("Leaf chain 1 failed")),
        failing_task.si().on_error(callback.si("Leaf chain 1 failed")),
    ).on_error(callback.si("Leaf chain 1 failed"))
Still the same behaviour.
I've slightly modified your last piece of code:
    chain1 = chain(
        normal_task.si(),
        normal_task.si(),
        failing_task.si(),
    )
    for task in chain1.tasks:
        task.on_error(callback.si("Leaf chain 1 failed"))
    wf = Workflow()
    wf.set_retry_policy("random", 1, 3)
    wf.add_celery_canvas(chain1)
    # result = wf.apply_async(options={"link_error": callback.si("wf error")})
    result = wf.apply_async()
    result.get()
Now it seems to be working as desired.
This most certainly needs further investigation.
I'm having some issues getting on_error task callbacks to trigger from chains. I've created a test project here: https://github.com/kriberg/dyrygent-test
This defines three tasks:
These are put into a chain:
Calling this with celery-dyrygent:
This produces the following log:
Here we see the callback linked to the overall workflow triggers as intended, but the callback set to the chain never fires.
Calling the same chain with celery apply_async:
Produces this:
Here both callbacks are triggered correctly. Now, as we know, celery doesn't do well with a large complex canvas, so just using celery isn't a good option.
Is this a limitation with dyrygent or is it a bug?