```
# oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
# Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=3999114, just started
```
About 1 in 10 jobs sent to an RQ worker while RQ scheduler is running will hang indefinitely. This began last Friday, after a `pip install` to update a venv's dependencies.

The job's main body is never reached: I have seen jobs whose first call is a logging statement produce no log output after being anomalously claimed. It does not matter whether RQ scheduler originated the job; I have seen both scheduled and manually invoked jobs hang in this manner. The behavior stops when RQ scheduler is shut down. The RQ worker still responds to signals and can still complete a warm shutdown. No logs are produced after the first logger call indicating the worker took the job. Here's an annotated example of one of those logs:
```
# Previous job successfully finishing
INFO:rq.worker:default: Job OK (92135fb3-a891-4bc5-97be-8f0e4801424c)
INFO:rq.worker:Result is kept for 3600 seconds
# Anomalous job begins
INFO:rq.worker:default: Uploading file (84db2e7a-c99b-463b-8e70-76cce10beb01)
# Hangs here indefinitely, until a signal is sent
INFO:rq.worker:Worker 480298b5bce645e696cd8082c25510bd [PID 3260661]: warm shut down requested
```
No pertinent logging is reported by RQ scheduler, just the regular `Registering birth` and `Registering death` messages that appear when workers are spun up or killed.
I inspected the worker object as saved in the Redis server: it has a `busy` status with the correct job ID listed within. The job objects seem to be mildly corrupted, as the status of the job remains `queued` and the `last_heartbeat` corresponds to the second the RQ worker froze up.
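For anyone else trying to reproduce the inspection above, this is roughly how I read the raw job state out of Redis. It's a minimal sketch that assumes RQ's usual `rq:job:<job_id>` hash layout with `status` and `last_heartbeat` fields (field names and timestamp format may vary across RQ versions); with a live server you would obtain the hash via `redis_conn.hgetall(f"rq:job:{job_id}")`.

```python
from datetime import datetime

def summarize_job_hash(job_hash):
    """Pull the status and last heartbeat out of a raw RQ job hash.

    `job_hash` is the bytes dict returned by HGETALL on rq:job:<id>.
    """
    status = job_hash[b"status"].decode()
    # RQ serializes timestamps as UTC ISO-8601 with a trailing 'Z'.
    heartbeat = datetime.strptime(
        job_hash[b"last_heartbeat"].decode(), "%Y-%m-%dT%H:%M:%S.%fZ"
    )
    return status, heartbeat

# Sample hash mimicking what I saw for a hung job: still `queued`,
# heartbeat frozen at the moment the worker locked up.
sample = {
    b"status": b"queued",
    b"last_heartbeat": b"2023-07-14T09:31:02.123456Z",
}
status, heartbeat = summarize_job_hash(sample)
print(status, heartbeat.isoformat())
```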
The config file fed into RQ is minimal and just spins up a copy of the parent Flask app in order to access an app context. Let me know if I need to post any more info or logs here. Thanks!
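For reference, the worker bootstrap is essentially the following. This is a minimal sketch, not the exact file: `create_app` stands in for the parent Flask app's factory, and the queue name is illustrative. It needs a running Redis instance.

```python
# worker.py -- minimal RQ worker that runs jobs inside a Flask app context.
from redis import Redis
from rq import Queue, Worker

from myapp import create_app  # assumption: the parent Flask app's factory

app = create_app()

if __name__ == "__main__":
    # Jobs run inside the app context so they can use current_app,
    # the database session, etc.
    with app.app_context():
        conn = Redis()
        worker = Worker([Queue("default", connection=conn)], connection=conn)
        worker.work()
```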
We're having the same behavior crop up today after removing RQ scheduler from the stack last week. Closing this issue as it likely isn't related to RQ scheduler.
This occurs on Python 3.11.4. Versions of installed libraries (pared down for relevance):
I note that there is precedent for RQ scheduler causing this behavior, even if I was not able to track down the root cause: https://stackoverflow.com/questions/53640418/django-rq-rqworker-freezes-indefinitely and https://github.com/rq/rq-scheduler/issues/197