Closed narrowtux closed 5 days ago
I can't rule out a user error on my part, by the way. If I knew the PID of the job, I could look at its stack trace to see whether it's actually still running.
nvm, it's probably our own application code.
For future reference, you can find all processes that are currently running an Oban job by searching for `Oban.Queue.Executor` in live_dashboard.
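The same lookup can be sketched from an IEx session attached to the node, which also gives you the stack trace of each match. This is a minimal sketch, not an Oban API: `ExecutorProcs` is a hypothetical helper, and it assumes each job runs in a process whose `$initial_call` entry in the process dictionary names the module you search for (which is what tasks record, and presumably what the dashboard search matches on).

```elixir
defmodule ExecutorProcs do
  # Hypothetical helper, not part of Oban.
  # Returns every live pid whose process dictionary records an
  # :"$initial_call" of {module, _function, _arity}.
  def pids_with_initial_call(module) do
    for pid <- Process.list(),
        # Process.info/2 returns nil for a pid that died in the meantime;
        # the pattern below simply skips those.
        {:dictionary, dict} <- [Process.info(pid, :dictionary)],
        {_, {^module, _fun, _arity}} <- [List.keyfind(dict, :"$initial_call", 0)] do
      pid
    end
  end

  # Pairs each matching pid with its current stack trace.
  def stacktraces(module) do
    for pid <- pids_with_initial_call(module),
        {:current_stacktrace, trace} <- [Process.info(pid, :current_stacktrace)] do
      {pid, trace}
    end
  end
end
```

On a node that runs Oban, `ExecutorProcs.stacktraces(Oban.Queue.Executor)` would be the call to try. A trace that stays parked in the same application function across several calls suggests the job is blocked in your own code rather than in Oban.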
As of the next Oban release, when running on OTP 27/Elixir 1.17, you'll also have process labels that tell you which worker each PID is running.
Sounds great!
Environment
Oban versions:
Elixir version: 1.14.4-erlang-25.3-alpine-3.15.7
Postgres version: 12.?
Current Behavior
Sometimes, jobs are created that run forever:

![image](https://github.com/sorentwo/oban/assets/616791/2a06ff41-3708-44bc-bf0e-07f0909f7f05)

Which jobs get stuck in this way seems random; it's not always the same worker.
The DynamicLifeline plugin does nothing, I guess because the node and queue the job runs on haven't actually terminated.
I can't say if the process that should run the job is still alive, since I see no way to resolve a job ID to a PID. Maybe I can provide more debug information if I know how.
Exemplary job struct and queue info
Oban.check_queue(:cronjobs)
returned:
Workaround
Manually identify the jobs that are stuck, cancel them and then retry. I see no way to do this automatically, since it's not apparent from the job struct if it's stuck.
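If you want to script the manual steps, one rough heuristic is to treat any job that has been in the `executing` state longer than some cutoff as suspect. This is only a sketch under assumptions: `StuckJobSweep` and the cutoff are hypothetical, a long-running but healthy job would be flagged too, and the cancel and retry functions are passed in so the example runs without Oban.

```elixir
defmodule StuckJobSweep do
  # Hypothetical helper, not part of Oban.
  # A job is "suspect" when it is still executing and its current attempt
  # started more than max_age_seconds before `now`.
  def suspects(jobs, now, max_age_seconds) do
    Enum.filter(jobs, fn job ->
      job.state == "executing" and
        DateTime.diff(now, job.attempted_at, :second) > max_age_seconds
    end)
  end

  # Cancels and then retries each job by id. The two functions are
  # injected by the caller; nothing here talks to Oban directly.
  def cancel_and_retry(jobs, cancel, retry) do
    Enum.each(jobs, fn job ->
      cancel.(job.id)
      retry.(job.id)
    end)
  end
end
```

In an application you would load the executing jobs from the `oban_jobs` table and pass `&Oban.cancel_job/1` and `&Oban.retry_job/1` as the two functions. Pick the cutoff well above the longest runtime you expect from a healthy job.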