Open theunkn0wn1 opened 2 years ago
That sounds weird, redis already knows about the job obviously. PR welcome to fix this.
If you can't work out how to fix this, could you create a minimal example to demonstrate the problem?
Managed to reproduce in isolation. Here is a gist with everything needed short of a redis server.
https://gist.github.com/theunkn0wn1/a237cc816ec15a5a053bab11780c0bb4
Steps to reproduce:
Run arq_workspace.launcher_client and record the job token:
/home/orion/.cache/pypoetry/virtualenvs/arq-workspace-Flug7Sf2-py3.10/bin/python -m arq_workspace.launcher_client
connecting to redis...
spawning job...
job spawned. your ID is 'dc17598c7b5e43a2a34d5b50ec9dbee2'
your job token is my_agent:dc17598c7b5e43a2a34d5b50ec9dbee2
Run arq_workspace.status_client and plug in the job token:
/home/orion/.cache/pypoetry/virtualenvs/arq-workspace-Flug7Sf2-py3.10/bin/python -m arq_workspace.status_client
Enter job id: my_agent:dc17598c7b5e43a2a34d5b50ec9dbee2
connecting to redis...
host='my_agent'; jid:
spawning job...
job not complete: not_found, sleeping before continuing...
Observe that the status client reports "not found" for a valid job ID and queue, created in step 1.
Launch arq_workspace.agent and note that the agent picks up the item. Also note that the status client will now report the task as completed.
/home/orion/.cache/pypoetry/virtualenvs/arq-workspace-Flug7Sf2-py3.10/bin/python -m arq_workspace.status_client
/home/orion/.cache/pypoetry/virtualenvs/arq-workspace-Flug7Sf2-py3.10/bin/python -m arq_workspace.agent
Starting worker...
task_custom_add(x=4, y=6)
...
job not complete: not_found, sleeping before continuing...
JobStatus.complete
As an update to this, Arq's task status reporting capability is entirely unreliable.
I have now observed it reporting that jobs don't exist while they are actively executing, even though previous requests for the same job ID reported them as running. Something in arq's task status reporting is horribly buggy.
I will need to implement my own thing to work around this bug.
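One possible shape for such a workaround, sketched under assumptions: treat "not_found" as a transient reading for a grace period rather than as a final answer. The names `wait_for_status` and `fetch_status` are hypothetical, and `fetch_status` stands in for whatever coroutine actually asks arq for the job status. The status strings mirror the ones seen in this thread.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_status(
    fetch_status: Callable[[], Awaitable[str]],
    grace_seconds: float = 30.0,
    poll_interval: float = 0.5,
) -> str:
    """Poll until the job completes, tolerating "not_found" for a while.

    fetch_status is a hypothetical stand-in for the real status lookup.
    """
    deadline = time.monotonic() + grace_seconds
    while True:
        status = await fetch_status()
        if status == "complete":
            return status
        if status != "not_found":
            # The job is known again, so restart the grace window.
            deadline = time.monotonic() + grace_seconds
        elif time.monotonic() >= deadline:
            # Only give up on a missing job once the grace window has passed.
            return status
        await asyncio.sleep(poll_interval)
```

This only hides the symptom: a job that never leaves "queued" or "in_progress" keeps the loop running, so a caller would likely want an overall timeout as well, for example via `asyncio.wait_for`.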
Just my 2 cents on this: have you tried the same gist without using a custom queue name, @theunkn0wn1? I've never dug deeply into it for lack of time, but every time I tried using custom queue names I ended up having issues (see https://github.com/samuelcolvin/arq/issues/348 for instance), and reverting to the defaults proved more reliable.
In my application, the arq workers are not always online (e.g. down for maintenance, network issues).
When a job is enqueued to an arq task queue, it yields a task ID. If there are no workers observing that task queue, asking arq for the status of the job will produce the value "not found". This reading is false. The moment a worker reads the job from the queue, the job ID will resolve into the "queued" state as expected.
Expected behavior: jobs enqueued should either be in the "deferred" or "queued" state and read as such. Actual behavior: jobs will only have a status after an arq worker for the queue sees the enqueued job.
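The expected behavior can be modelled without a Redis server. The sketch below is an in-memory illustration, not arq's implementation: `expected_status`, its arguments, and the dictionary standing in for the queue are all hypothetical. It assumes each enqueued job carries a scheduled run time, and it derives the status from stored state alone, with no worker involved.

```python
from typing import Dict, Set


def expected_status(
    job_id: str,
    queue: Dict[str, float],
    in_progress: Set[str],
    completed: Set[str],
    now: float,
) -> str:
    """Return the status a job should report, whether or not a worker is online.

    queue maps job ID -> scheduled run time (hypothetical in-memory stand-in).
    """
    if job_id in completed:
        return "complete"
    if job_id in in_progress:
        return "in_progress"
    if job_id in queue:
        # Enqueued but not yet picked up: the job is known even with no worker.
        return "deferred" if queue[job_id] > now else "queued"
    return "not_found"
```

Under this model, a job scheduled for time 100.0 reads as "queued" when checked at time 150.0 and as "deferred" when checked at time 50.0, in both cases before any worker has seen it. Only an ID that was never enqueued reads as "not_found".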