workfloworchestrator / orchestrator-core

The workflow orchestrator core repository
Apache License 2.0

[Bug]: Stuck processes in Celery execution environment #687

Closed dmarjenburgh closed 4 months ago

dmarjenburgh commented 5 months ago

What happened?

Description

Recently we have seen processes that were stuck in status CREATED or RESUMED. We have also seen tasks in Flower hanging in status STARTED for which we could not find the related process ID in the orchestrator.

Presumably this happens because the process is first updated in the database (to the CREATED/RESUMED state) and the Celery task is only triggered afterwards. If triggering the task fails for some reason, the process stays stuck because no worker will ever pick it up. (This cause is still conjecture at the time of writing this report.)
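
To make the suspected ordering problem concrete, here is a minimal, self-contained sketch in Python. It does not use orchestrator-core or Celery code: `FakeBroker`, `start_process`, and `stuck_processes` are hypothetical names, the "database" is a plain dict, and the broker failure is simulated. It only illustrates how a status written before a failed task publish would leave a process in CREATED with no task behind it.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


class BrokerUnavailable(Exception):
    """Stand-in for any error raised while publishing a task."""


@dataclass
class FakeBroker:
    # Hypothetical in-memory broker, used here instead of a real Celery broker.
    available: bool = True
    queue: list[UUID] = field(default_factory=list)

    def publish(self, process_id: UUID) -> None:
        if not self.available:
            raise BrokerUnavailable("could not publish task")
        self.queue.append(process_id)


def start_process(db: dict[UUID, str], broker: FakeBroker) -> UUID:
    process_id = uuid4()
    # Step 1: the status is stored before any task exists.
    db[process_id] = "CREATED"
    # Step 2: the task is published afterwards. If this raises,
    # the status written in step 1 is left in place.
    broker.publish(process_id)
    return process_id


def stuck_processes(db: dict[UUID, str], broker: FakeBroker) -> list[UUID]:
    # A process is considered stuck here if it is waiting for a worker
    # but no task for it was ever queued.
    waiting = {"CREATED", "RESUMED"}
    return [
        pid
        for pid, status in db.items()
        if status in waiting and pid not in broker.queue
    ]


db: dict[UUID, str] = {}
broker = FakeBroker(available=False)  # simulate a failing publish

try:
    start_process(db, broker)
except BrokerUnavailable:
    pass  # the caller sees an error, but the status is already stored

print(stuck_processes(db, broker))  # one process ID with no queued task
```

In this sketch the process is left in CREATED and nothing is queued, which matches the symptom described above. Whether the real code path fails this way is, as noted, still unconfirmed.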

Possible solution(s)

There are multiple improvements we can make to the current flow.

Version

1.3.0

What python version are you seeing the problem on?

All