I'm looking for advice on how to debug this issue:
I have a long-running `Job` that polls an external resource. The task contains a `while` loop that breaks once some condition is met, and each iteration of the loop calls `await asyncio.sleep(15)`.
Each iteration of the `while` loop outputs a log message, so I can confirm that the `Job` is running.
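
A minimal reproduction of the polling task described above might look like the sketch below. `poll`, `resource_is_ready`, and the "ready on the third call" behaviour are hypothetical stand-ins for the real task. The sketch also assumes SAQ's convention that a task function receives the worker context `ctx` as its first argument.

```python
import asyncio
import logging

logger = logging.getLogger("poller")


async def resource_is_ready(ctx: dict) -> bool:
    """Hypothetical stand-in for the external check; replace with the real call."""
    # Demo behaviour only: report "ready" on the third call, counted in ctx.
    ctx["calls"] = ctx.get("calls", 0) + 1
    return ctx["calls"] >= 3


async def poll(ctx: dict, *, interval: float = 15.0) -> int:
    """Poll until resource_is_ready() returns True; return the iteration count."""
    iteration = 0
    while True:
        iteration += 1
        # One log line per iteration, so progress is visible in the worker logs.
        logger.info("poll iteration %d", iteration)
        if await resource_is_ready(ctx):
            break
        # Non-blocking wait between polls; the event loop stays free meanwhile.
        await asyncio.sleep(interval)
    return iteration


if __name__ == "__main__":
    # A short interval keeps the local demo fast.
    print(asyncio.run(poll({}, interval=0.01)))
```

In a SAQ worker, a function like this would be listed in the worker settings' `functions` list and enqueued by name. The local run above only exercises the loop itself, not SAQ's retry or sweep behaviour.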
If the worker running the `Job` is terminated non-gracefully, I don't see SAQ retry or requeue the `Job`.
Instead, after the timeout has elapsed, the `saq` logger emits the `Finished Job` log and the `Sweeping job` log for that `Job`.
My expectation is that once the worker is restarted, it will retry the `Job` and re-enter the polling `while` loop.
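
One way to get more detail out of the `Finished Job` and `Sweeping job` messages is to turn up the `saq` logger before starting the worker. The following is a sketch using only the standard `logging` module. The assumption that SAQ includes more job detail (such as kwargs or results) at DEBUG level has not been checked against the commit linked below.

```python
import logging

# Timestamped output makes it easier to line up the kill time with the sweep.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: SAQ logs fuller job details when its logger is at DEBUG level.
saq_logger = logging.getLogger("saq")
saq_logger.setLevel(logging.DEBUG)
```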
I can reproduce this issue consistently. Here are the relevant `Job` retry parameters:
SAQ version: https://github.com/tobymao/saq/tree/40f9b70b7083fe248107eeb0c01cf004e073bb9a