When the saq process doesn't exit cleanly, its currently active jobs get stuck in the active state and are never retried after saq restarts.
I believe the heartbeat property is not designed for this scenario: when the heartbeat times out, the sweep aborts the job. In this case, though, the job should be retried.
I suggest that each job record its worker ID; if the sweep finds that the worker is no longer available, the job should be retried.
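To make the suggestion concrete, here is a rough sketch of what I have in mind. It is a standalone in-memory model, not saq's actual code or API: the `Job` dataclass, the `worker_id` field, `start`, `sweep`, and the `live_workers` set are all hypothetical names I made up for illustration, and I'm assuming the queue has some way to know which workers are still registered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Job:
    key: str
    status: str = "queued"           # "queued" or "active" in this sketch
    worker_id: Optional[str] = None  # hypothetical field: the worker that picked up the job
    attempts: int = 0


def start(job: Job, worker_id: str) -> None:
    """Mark a job active and record which worker owns it."""
    job.status = "active"
    job.worker_id = worker_id
    job.attempts += 1


def sweep(jobs: Dict[str, Job], live_workers: Set[str]) -> List[str]:
    """Requeue active jobs whose recorded worker is no longer alive.

    Returns the keys of the jobs that were requeued.
    """
    requeued = []
    for job in jobs.values():
        if job.status == "active" and job.worker_id not in live_workers:
            # The owning worker is gone, so retry instead of aborting.
            job.status = "queued"
            job.worker_id = None
            requeued.append(job.key)
    return requeued


jobs = {"a": Job("a"), "b": Job("b")}
start(jobs["a"], "worker-1")
start(jobs["b"], "worker-2")

# Simulate worker-1 crashing: only worker-2 is still registered.
requeued = sweep(jobs, live_workers={"worker-2"})
```

In this sketch the job owned by the dead worker goes back to the queue, while the job on the live worker is left alone. The heartbeat timeout could keep its current meaning for jobs whose worker is still running.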