Open carsoncole opened 7 months ago
Hey @carsoncole, sorry about that! Could you share a bit more about your DB setup? Also, could you try setting this:
config.solid_queue.silence_polling = false
And then starting solid_queue:start
again? That might give us more clues about what's going on.
I was seeing the same problem. ~Disabling polling silence seems to trigger it to work properly.~
It turns out that it only works until a ruby file is changed.
@rosa Overnight, jobs stopped executing again. I set silence_polling = false
. I've included the logs at the moment a job was created and enqueued. This is all running in development on my local Mac.
SolidQueue::ScheduledExecution Pluck (3.0ms) SELECT "solid_queue_scheduled_executions"."job_id" FROM "solid_queue_scheduled_executions" WHERE "solid_queue_scheduled_executions"."scheduled_at" <= $1 ORDER BY "solid_queue_scheduled_executions"."scheduled_at" ASC, "solid_queue_scheduled_executions"."priority" ASC, "solid_queue_scheduled_executions"."job_id" ASC LIMIT $2 FOR UPDATE SKIP LOCKED [["scheduled_at", "2024-04-11 15:19:59.498234"], ["LIMIT", 500]]
TRANSACTION (0.1ms) COMMIT
[ActiveJob] TRANSACTION (0.2ms) BEGIN
[ActiveJob] SolidQueue::Job Create (2.5ms) INSERT INTO "solid_queue_jobs" ("queue_name", "class_name", "arguments", "priority", "active_job_id", "scheduled_at", "finished_at", "concurrency_key", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) RETURNING "id" [["queue_name", "default"], ["class_name", "ProcessNewJobsToProfilesJob"], ["arguments", "{\"job_class\":\"ProcessNewJobsToProfilesJob\",\"job_id\":\"5fc05678-dbf5-45e6-81d6-f925734589f5\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[],\"executions\":0,\"exception_executions\":{},\"locale\":\"en\",\"timezone\":\"Pacific Time (US \\u0026 Canada)\",\"enqueued_at\":\"2024-04-11T15:20:00.008251000Z\",\"scheduled_at\":\"2024-04-11T15:20:00.008202000Z\"}"], ["priority", 0], ["active_job_id", "5fc05678-dbf5-45e6-81d6-f925734589f5"], ["scheduled_at", "2024-04-11 15:20:00.008202"], ["finished_at", nil], ["concurrency_key", "[FILTERED]"], ["created_at", "2024-04-11 15:20:00.008732"], ["updated_at", "2024-04-11 15:20:00.008732"]]
[ActiveJob] TRANSACTION (0.2ms) SAVEPOINT active_record_1
[ActiveJob] SolidQueue::Job Load (0.2ms) SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2 [["id", 907], ["LIMIT", 1]]
[ActiveJob] SolidQueue::ReadyExecution Create (0.7ms) INSERT INTO "solid_queue_ready_executions" ("job_id", "queue_name", "priority", "created_at") VALUES ($1, $2, $3, $4) RETURNING "id" [["job_id", 907], ["queue_name", "default"], ["priority", 0], ["created_at", "2024-04-11 15:20:00.017665"]]
[ActiveJob] TRANSACTION (0.1ms) RELEASE SAVEPOINT active_record_1
[ActiveJob] Log Create (0.4ms) INSERT INTO "logs" ("status", "log_class", "class_id", "message", "request", "response_code", "source_id", "created_at", "updated_at") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) RETURNING "id" [["status", 0], ["log_class", "ProcessNewJobsToProfilesJob"], ["class_id", nil], ["message", "Job #5fc0567 enqueued"], ["request", nil], ["response_code", nil], ["source_id", nil], ["created_at", "2024-04-11 15:20:00.020842"], ["updated_at", "2024-04-11 15:20:00.020842"]]
[ActiveJob] ↳ app/jobs/application_job.rb:9:in `block in <class:ApplicationJob>'
[ActiveJob] Enqueued ProcessNewJobsToProfilesJob (Job ID: 5fc05678-dbf5-45e6-81d6-f925734589f5) to SolidQueue(default)
SolidQueue::Job Load (0.1ms) SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2 [["id", 907], ["LIMIT", 1]]
SolidQueue::RecurringExecution Create (0.6ms) INSERT INTO "solid_queue_recurring_executions" ("job_id", "task_key", "run_at", "created_at") VALUES ($1, $2, $3, $4) RETURNING "id" [["job_id", 907], ["task_key", "[FILTERED]"], ["run_at", "2024-04-11 15:20:00"], ["created_at", "2024-04-11 15:20:00.023454"]]
TRANSACTION (1.5ms) COMMIT
Rendered admin/logs/_log.html.erb (Duration: 0.3ms | Allocations: 73)
[ActionCable] Broadcasting to logs: "<turbo-stream action=\"prepend\" target=\"logs\"><template><tr id=\"log_2893\" class=\"border-b\">\n <td class=\"px-2 py-2 w-16 text-center dark:text-slate-700\">\n <div class='rounded font-bold text-xs px-2 py-1 bg-blue-200'>INFO</div>\n </td>\n <td class=\"px-2 py-2\">\n ProcessNewJob...
TRANSACTION (0.2ms) BEGIN
SolidQueue::ScheduledExecution Pluck (1.9ms) SELECT "solid_queue_scheduled_executions"."job_id" FROM "solid_queue_scheduled_executions" WHERE "solid_queue_scheduled_executions"."scheduled_at" <= $1 ORDER BY "solid_queue_scheduled_executions"."scheduled_at" ASC, "solid_queue_scheduled_executions"."priority" ASC, "solid_queue_scheduled_executions"."job_id" ASC LIMIT $2 FOR UPDATE SKIP LOCKED [["scheduled_at", "2024-04-11 15:20:00.507423"], ["LIMIT", 500]]
TRANSACTION (0.1ms) COMMIT
TRANSACTION (0.3ms) BEGIN
SolidQueue::ScheduledExecution Pluck (3.2ms) SELECT "solid_queue_scheduled_executions"."job_id" FROM "solid_queue_scheduled_executions" WHERE "solid_queue_scheduled_executions"."scheduled_at" <= $1 ORDER BY "solid_queue_scheduled_executions"."scheduled_at" ASC, "solid_queue_scheduled_executions"."priority" ASC, "solid_queue_scheduled_executions"."job_id" ASC LIMIT $2 FOR UPDATE SKIP LOCKED [["scheduled_at", "2024-04-11 15:20:01.515209"], ["LIMIT", 500]]
In the Solid Queue console here is the output since I launched it, which I notice undefined method
queue_adapter_name' for nil:NilClass`...
carsoncole@Carsons-Mac-mini ahoyjob % rake solid_queue:start
[SolidQueue] Starting Dispatcher(pid=90937, hostname=Carsons-Mac-mini.local, metadata={:polling_interval=>1, :batch_size=>500, :concurrency_maintenance_interval=>600, :recurring_schedule=>{:"Expire subscriptions"=>{:schedule=>"every day at 00:01", :class_name=>"ExpireSubscriptionsJob", :arguments=>[]}, :send_notifications_subscriptions_expired=>{:schedule=>"every day at 00:15", :class_name=>"SendSubscriptionExpirationNotificationJob", :arguments=>[]}, :send_notifications_subscriptions_expiring=>{:schedule=>"every day at 10:00am", :class_name=>"SendSubscriptionExpiringIn7DaysEmailJob", :arguments=>[]}, :"Send new jobs email"=>{:schedule=>"every weekday at 5:00, 8:00, 11:00, 13:00, 15:00", :class_name=>"SendNewJobsEmailJob", :arguments=>[]}, :"Identify jobs"=>{:schedule=>"every 30 minutes at minute 5", :class_name=>"IdentifyJobsJob", :arguments=>[]}, :"Add jobs to Profiles"=>{:schedule=>"every 10 minutes", :class_name=>"ProcessNewJobsToProfilesJob", :arguments=>[]}, :"Destroy incomplete Jobs"=>{:schedule=>"every day at 7:00am", :class_name=>"CleanUpJobsJob", :arguments=>[]}, :"Clean up incomplete Jobs"=>{:schedule=>"every 4 hours at minute 45", :class_name=>"CleanUpIncompleteJobs", :arguments=>[]}}})
[SolidQueue] Starting Worker(pid=90938, hostname=Carsons-Mac-mini.local, metadata={:polling_interval=>1, :queues=>"*", :thread_pool_size=>3})
[SolidQueue] undefined method `queue_adapter_name' for nil:NilClass
[SolidQueue] Skipped recurring task Add jobs to Profiles at 2024-04-11 08:20:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Identify jobs at 2024-04-11 08:30:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Identify jobs at 2024-04-11 09:30:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Add jobs to Profiles at 2024-04-11 09:30:00 UTC — already dispatched
[SolidQueue] undefined method `queue_adapter_name' for nil:NilClass
[SolidQueue] undefined method `queue_adapter_name' for nil:NilClass
[SolidQueue] Skipped recurring task Send new jobs email at 2024-04-11 12:00:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Destroy incomplete Jobs at 2024-04-11 14:00:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Add jobs to Profiles at 2024-04-11 14:20:00 UTC — already dispatched
[SolidQueue] Skipped recurring task Identify jobs at 2024-04-11 14:30:00 UTC — already dispatched
Also, I'm seeing 8 records in the table solid_queue_ready_executions
. Are these the jobs that are ready to be executed but aren't being executed?
Another recurring job just added. It's in the table solid_queue_recurring_executions
and I see it as an entry in solid_queue_ready_executions
, but in the log I only see the constant query SELECT "solid_queue_scheduled_executions"
around the time the job is created.
Thanks for all this extra info! All the logs correspond to the dispatcher, it's as if the worker wasn't doing anything 🤔 The errors
[SolidQueue] undefined method `queue_adapter_name' for nil:NilClass
aren't related to jobs not being processed; they correspond to recurring tasks whose job class might not be defined. I need to improve this to show a better error in that case and don't try to dispatch the task, but you can ignore the errors for now.
Could you share a bit more about your DB setup?
database.yml
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
development:
<<: *default
database: ahoyjob_development
This may be relevant, I'm using Postgresql for Action Cable
cable.yml
development:
adapter: postgresql
test:
adapter: test
production:
adapter: postgresql
Another datapoint is that I have a Job
class in my app that represents jobs, ie. developer, bus driver etc.
So through my test case below, I can trigger the undefined method `queue_adapter_name' for nil:NilClass
with an incorrectly named class.
solid_queue.yml
"Test Job":
class: Testjob
schedule: every minute
Job file was named testjob.rb
and the class name was TestJob
I've fixed that now and running again to see if that has anything to do with not working jobs.
It does appear that the SolidQueue::ReadyExecution
just dies and does not give any indication of why. In my logs it is executing at its interval of 1 second, and then stops. Here in my logs you see it actually finds a record when doing
SolidQueue::ReadyExecution Pluck (2.5ms) SELECT "solid_queue_ready_executions"."job_id" FROM "solid_queue_ready_executions" ORDER BY "solid_queue_ready_executions"."priority" ASC, "solid_queue_ready_executions"."job_id" ASC LIMIT $1 FOR UPDATE SKIP LOCKED [["LIMIT", 3]]
But then stops. Here is the full log and after the last entry for SELECT solid_queue_ready_executions
it no longer makes the call.
SolidQueue::Job Load (0.2ms) SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2 [["id", 930], ["LIMIT", 1]]
SolidQueue::RecurringExecution Create (1.4ms) INSERT INTO "solid_queue_recurring_executions" ("job_id", "task_key", "run_at", "created_at") VALUES ($1, $2, $3, $4) RETURNING "id" [["job_id", 930], ["task_key", "[FILTERED]"], ["run_at", "2024-04-11 17:50:00"], ["created_at", "2024-04-11 17:50:00.141010"]]
TRANSACTION (0.4ms) COMMIT
Rendered admin/logs/_log.html.erb (Duration: 0.6ms | Allocations: 576)
[ActionCable] Broadcasting to logs: "<turbo-stream action=\"prepend\" target=\"logs\"><template><tr id=\"log_2970\" class=\"border-b\">\n <td class=\"px-2 py-2 w-16 text-center dark:text-slate-700\">\n <div class='rounded font-bold text-xs px-2 py-1 bg-blue-200'>INFO</div>\n </td>\n <td class=\"px-2 py-2\">\n ProcessNewJob...
SolidQueue::Pause Pluck (0.9ms) SELECT "solid_queue_pauses"."queue_name" FROM "solid_queue_pauses"
TRANSACTION (0.2ms) BEGIN
SolidQueue::ReadyExecution Pluck (2.5ms) SELECT "solid_queue_ready_executions"."job_id" FROM "solid_queue_ready_executions" ORDER BY "solid_queue_ready_executions"."priority" ASC, "solid_queue_ready_executions"."job_id" ASC LIMIT $1 FOR UPDATE SKIP LOCKED [["LIMIT", 3]]
SolidQueue::ClaimedExecution Insert (1.9ms) INSERT INTO "solid_queue_claimed_executions" ("job_id","process_id","created_at") VALUES (930, 148, CURRENT_TIMESTAMP) RETURNING "id"
SolidQueue::ClaimedExecution Load (0.4ms) SELECT "solid_queue_claimed_executions".* FROM "solid_queue_claimed_executions" WHERE "solid_queue_claimed_executions"."job_id" = $1 AND "solid_queue_claimed_executions"."process_id" = $2 [["job_id", 930], ["process_id", 148]]
SolidQueue::ReadyExecution Delete All (1.3ms) DELETE FROM "solid_queue_ready_executions" WHERE "solid_queue_ready_executions"."job_id" = $1 [["job_id", 930]]
TRANSACTION (0.4ms) COMMIT
SolidQueue::Pause Pluck (0.8ms) SELECT "solid_queue_pauses"."queue_name" FROM "solid_queue_pauses"
TRANSACTION (1.0ms) BEGIN
SolidQueue::ReadyExecution Pluck (2.7ms) SELECT "solid_queue_ready_executions"."job_id" FROM "solid_queue_ready_executions" ORDER BY "solid_queue_ready_executions"."priority" ASC, "solid_queue_ready_executions"."job_id" ASC LIMIT $1 FOR UPDATE SKIP LOCKED [["LIMIT", 2]]
TRANSACTION (0.4ms) COMMIT
SolidQueue::Job Load (0.8ms) SELECT "solid_queue_jobs".* FROM "solid_queue_jobs" WHERE "solid_queue_jobs"."id" = $1 LIMIT $2 [["id", 930], ["LIMIT", 1]]
TRANSACTION (0.2ms) BEGIN
SolidQueue::ScheduledExecution Pluck (1.8ms) SELECT "solid_queue_scheduled_executions"."job_id" FROM "solid_queue_scheduled_executions" WHERE "solid_queue_scheduled_executions"."scheduled_at" <= $1 ORDER BY "solid_queue_scheduled_executions"."scheduled_at" ASC, "solid_queue_scheduled_executions"."priority" ASC, "solid_queue_scheduled_executions"."job_id" ASC LIMIT $2 FOR UPDATE SKIP LOCKED [["scheduled_at", "2024-04-11 17:50:01.129034"], ["LIMIT", 500]]
TRANSACTION (0.1ms) COMMIT
TRANSACTION (0.2ms) BEGIN
SolidQueue::ScheduledExecution Pluck (1.6ms) SELECT "solid_queue_scheduled_executions"."job_id" FROM "solid_queue_scheduled_executions" WHERE "solid_queue_scheduled_executions"."scheduled_at" <= $1 ORDER BY "solid_queue_scheduled_executions"."scheduled_at" ASC, "solid_queue_scheduled_executions"."priority" ASC, "solid_queue_scheduled_executions"."job_id" ASC LIMIT $2 FOR UPDATE SKIP LOCKED [["scheduled_at", "2024-04-11 17:50:02.138170"], ["LIMIT", 500]]
We noticed a similar issue in our setup. I couldn't reproduce it with a clean Rails app, so that makes it harder to debug. For us, we could reliably reproduce the issue in the following way:
We use a default configuration of solidqueue. I tried to dive into it a bit further. I found that solidqueue picked up the job and dispatches it to ActiveSupport. Here it gets stuck in executing before
callbacks. It gets very hairy here for me to understand what is going on, but it seemed that one of the before
callbacks tries to obtain a ShareLock, which is already held by a dead solidqueue worker. So the before
callback never acquires the lock, and therefore keeps waiting indefinetely.
{#<Thread:0x000000010fc929e8@io-worker-1 /Users/myname/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:345 dead>=>[nil, [:load, false]]}
This specific thread was started by launch_process_prune
. I am thinking that a code reload somehow kills this worker thread, but I can't really figure out why or what should be done to prevent it.
@carsoncole thanks for all these extra details 🙏🏻 @SjoerdL, that's super helpful! I'll try to investigate a bit more today, and see if I can get to to bottom of this 🔍💪🏻
I got the issue to show itself quickly by changing the threads to 1. To hide the issue longer, increase the number of threads.
workers:
- queues: "*"
threads: 1
processes: 1
polling_interval: 1
I'm experiencing a similar problem too, and is reliably caused in development mode by saving any Ruby file in my Rails project. As soon as I do, new jobs are picked up but never finish, and I have to ctrl+c solid queue and re-start it. If I don't save a file, Solid Queue will continue to work fine for hours.
I've not checked if they actually run or not before getting stuck, I came across this issue while investigating whether I'd setup something up incorrectly, but this seems very similar behaviour to what @SjoerdL is experiencing.
Hey, @rosa, thanks for digging into this one!
I can reproduce this in a personal project in development, presenting exactly the same way as @carldr describes above.
Having the save problem. Seems to work fine until I edit and save a file.
I'm experiencing a similar problem too, and is reliably caused in development mode by saving any Ruby file in my Rails project. As soon as I do, new jobs are picked up but never finish, and I have to ctrl+c solid queue and re-start it. If I don't save a file, Solid Queue will continue to work fine for hours.
I've not checked if they actually run or not before getting stuck, I came across this issue while investigating whether I'd setup something up incorrectly, but this seems very similar behaviour to what @SjoerdL is experiencing.
I'm currently able to reproduce doing this exact step. Perform an action in the app that enqueues a job, it executes flawlessly. Save a couple files (I added comments to 3 files and saved them), now that exact same action done in the exact same manner enqueues a job, but it gets stuck in "In Progress Jobs" for some reason.
FWIW I'm running solid_queue using a foreman Procfile.dev
:
web: bundle exec rdbg --command --nonstop --open=vscode --sock_path=tmp/sockets/rdbg.socket -- bundle exec bin/rails server -p 3000
css: bundle exec bin/rails tailwindcss:watch[debug]
worker: if [ "$SKIP_SOLID_QUEUE" != "true" ]; then bundle exec rake solid_queue:start; else sleep 2147483647; fi
SolidQueue: 0.3.0 Rails 7.1.3.2
I have mission control too? Not sure that's relevant but figured I'd include it, it's what I use to see that jobs got stuck in "In Progress"
Thanks everyone for the info, and so sorry I haven't been able to work on this one yet. It's the next one on deck once I get #208 out of the way 💪
Hmm, just to make sure I'm on the same page - is SolidQueue supposed to auto-reload code in development? I remember back in the day Sidekiq did not, but I just checked and it looks like it has done so for several years so my knowledge is a bit out of date. But I imagine for a first-party solution it should work more seamlessly anyway.
I feel like I may have experienced something like this but I am not 100% sure.
But I imagine for a first-party solution it should work more seamlessly anyway.
Absolutely, I need to get this fixed 100% 😅 So yes, this is definitely a problem of Solid Queue right now.
Thanks again everyone for the patience and the steps to reproduce this, it has been incredibly helpful 🙇♀️ ❤️ . I reproduced it without issues thanks to that, and I think I've tracked down the problem to a fix put in place for code reloading 😅 I think I know how to fix but need to do some more tests and it's getting a bit late over here, hopefully I'll be able to wrap this up tomorrow! 🤞
Awesome! Thanks for jumping on this rosa, you're a rockstar (And this project is really sweet), it's been bothering me and my team for a little bit now so it's good to see some momentum on this fix!
I've been running the "reload-dev-fix" #223 branch and jobs still fail to run after a while and are just enqueued. At one point, "Release claimed job" runs and the jobs are run, but I've got them enqueuing again, and 30 mins have passed so not sure how often the release happens.
SolidQueue-0.3.0 Prune dead processes (1.9ms) size: 0
SolidQueue-0.3.0 Prune dead processes (1.6ms) size: 0
SolidQueue-0.3.0 Release claimed job (11.7ms) job_id: 1256, process_id: 225
SolidQueue-0.3.0 Release claimed job (6.7ms) job_id: 1257, process_id: 225
SolidQueue-0.3.0 Release claimed job (4.0ms) job_id: 1258, process_id: 225
SolidQueue-0.3.0 Release claimed job (3.8ms) job_id: 1259, process_id: 225
SolidQueue-0.3.0 Release claimed job (3.5ms) job_id: 1260, process_id: 225
For me, it's easy to see when the jobs stop running as I'm using hooks in ApplicationJob to log every step of the process. This may be helpful for others trying to debug to use.
class ApplicationJob < ActiveJob::Base
after_enqueue do |job|
Log.create(message: "Job ##{ job.job_id[0, 7] } enqueued", log_class: job.class, status: :info, queue: job.queue_name)
end
before_perform do |job|
Log.create(message: "Job ##{ job.job_id[0, 7] } started", log_class: job.class, status: :info, queue: job.queue_name)
end
after_perform do |job|
Log.create(message: "Job ##{ job.job_id[0, 7] } completed", log_class: job.class, status: :info, queue: job.queue_name)
end
end
Huh, I see! @carsoncole, but does this happen when you modify any code? Or just out of the blue without you doing anything? Wondering if this issue is the same as the one I've attempted to fix or if it's something else 🤔 The bit that I can't explain is why your jobs are released:
SolidQueue-0.3.0 Release claimed job (11.7ms) job_id: 1256, process_id: 225
SolidQueue-0.3.0 Release claimed job (6.7ms) job_id: 1257, process_id: 225
SolidQueue-0.3.0 Release claimed job (4.0ms) job_id: 1258, process_id: 225
SolidQueue-0.3.0 Release claimed job (3.8ms) job_id: 1259, process_id: 225
SolidQueue-0.3.0 Release claimed job (3.5ms) job_id: 1260, process_id: 225
This means something happened to process 225, but it's strange that you didn't get any logs about that, about the process being deregistered... 😕 Did you happen to see if any of the logs lines about
SolidQueue-0.3.0 Prune dead processes (1.6ms) size: 0
had any other size
than 0
?
I'm also running that branch, and I'm getting this a lot. Probably every 5 or six seconds after I start changing code.
SolidQueue-0.3.0 Prune dead processes (1.9ms) size: 0
I've been running this for a few hours and I don't have any logs indicating a size other than 0.
This log entry is occurring ever 1-2 seconds.
SolidQueue-0.3.0 Prune dead processes (5.9ms) size: 0
As for modifying code, I've made multiple code changes across the app during this timeframe.
So, to be clear, this is expected:
SolidQueue-0.3.0 Prune dead processes (1.9ms) size: 0
But I realised I left a debugging setting there and changed the frequency of the dead processes check to every 2 seconds instead of every SolidQueue.process_alive_threshold
, which is 5 minutes by default. I'll push a change for that. But in any case, the logs
SolidQueue-0.3.0 Prune dead processes (1.9ms) size: 0
are expected and don't mean something is wrong. The workers hanging and not processing jobs do mean something is wrong.
Going to push the change to stop running the process prune every 2 seconds.
Another question: @carsoncole: is it possible something external to Solid Queue is killing your worker processes? What you describe is different from the error I've fixed 🤔 To be clear, the error I've addressed is:
I'm experiencing a similar problem too, and is reliably caused in development mode by saving any Ruby file in my Rails project. As soon as I do, new jobs are picked up but never finish, and I have to ctrl+c solid queue and re-start it. If I don't save a file, Solid Queue will continue to work fine for hours.
I understand on the "Prune dead processes" log entry. Sounds good.
As for the issue of not saving files results in everything working fine, I'll test for that. But as of now, I am changing files and the issue is occurring.
Ohhh bummer! And so odd! I couldn't reproduce it anymore with that change, while I could reproduce it consistently without it. I'll keep digging. Thank you so much for your patience 🙏
I can confirm that the following is correct.
I'm experiencing a similar problem too, and is reliably caused in development mode by saving any Ruby file in my Rails project. As soon as I do, new jobs are picked up but never finish, and I have to ctrl+c solid queue and re-start it. If I don't save a file, Solid Queue will continue to work fine for hours.
I can't see the pattern as to what file change affects what, as I have different queues, and it'll be just one queue that no longer executes enqueued jobs unless restarted.
So far in my limited testing, that branch has fixed my issue of code reloading. I'll continue to experiment w/ it; I haven't been testing with it for very long.
This is the full dump of the stack trace that happens in my environment for @rafaelfranca:
This particular crash happened because I did a db:reset. The dump is too large to paste here, so it's a gist:
@rosa I was experiencing the same thing, came here to see what the latest update is and can confirm reload-dev-fix
appears to fix it for me (at least it's been over an hour since I updated and it hasn't died yet, but normally wouldn't last more than 15-30 minutes)
Thanks a lot for letting me know, @andyjeffries and @nzifnab 🙏 I'm going to merge that fix and ship a new version but will keep this one open since it still fails for some people, so I'll keep investigating.
FWIW I have also been using the reload-dev-fix
branch for around 5 days now, and I haven't seen this issue crop up once!
I switched to the newly released version that includes this merged fix since Tuesday May 21 08:50 BST and it's been absolutely fine for me since then too. Awesome fix @rosa, thanks !
Hey @carsoncole, are you still experiencing this? I haven't heard any more reports since this fix went live, so I'd like to dig more into your case, because it's very strange. You mentioned:
as I have different queues, and it'll be just one queue that no longer executes enqueued jobs unless restarted.
But are these queues run by different workers, all individually? Or multiple queues can be run by the same worker? Just wondering if this might be a priority kind of problem 🤔
👋 Came to the repo to open a similar issue, which I've been experiencing in local development since January. Until reading this, I was sort of at a loss as to why solid_queue workers would pick stuff up fine at startup and then eventually stop grabbing work—I hadn't considered that it might be due to (some consequence of) class reloading.
Unfortunately, I'm currently on the released version (0.3.4) and still experiencing this daily in development. I was just about to switch over to running inline or async locally to unblock myself. It's possible that I'm experiencing a different issue leading to the same outcome, but I don't feel well-equipped to debug it (any tips would be appreciated)
Ouch, sorry to hear that @searls! I wonder if you could try running the branch in https://github.com/rails/solid_queue/pull/269, not via the Puma plugin or anything, but just directly, and see if that changes anything 🤔
Do you think your issue happens also due to class reloading or could it be something else? What DB are you using?
Hi @rosa thank you for replying. I have been running off the branch in #269 since this morning and haven't seen it fail yet, but I'll keep you posted. I was almost sure it was failing again but it turns out that I was seeing a separate apparent issue
Thanks a lot for letting me know, @searls 🙏 Huh, that issue is strange indeed 🤔
I seem to have a persistent problem of jobs being worked but new jobs just get enqueued and workers stop pulling jobs to be worked. If I restart Solid Queue (
rake solid_queue:start
) then the jobs are immediately worked.This happens with both recurring jobs as well as jobs that are "perform_later".
Starting the solid queue looks all good and there aren't any subsequent log items that indicate any issues.
The only setting in my
development.rb
: config.active_job.queue_adapter = :solid_queueSolid Queue: 0.3.0 Rails: 7.1.3.2