rails / solid_queue

Database-backed Active Job backend
MIT License
1.95k stars 130 forks source link

Pessimistic lock errors with the heartbeat #350

Closed danielschnee closed 2 months ago

danielschnee commented 2 months ago

I get a lot of pessimistic lock errors. Something around every minute because of the heartbeat.

The error:

Locking a record with unpersisted changes is not supported. Use `save` to persist the changes, or `reload` to discard them explicitly. Changed attributes: "last_heartbeat_at".

Backtrace:

 activerecord (7.2.1) lib/active_record/locking/pessimistic.rb:72:in `lock!'
 activerecord (7.2.1) lib/active_record/locking/pessimistic.rb:96:in `block in with_lock'
 activerecord (7.2.1) lib/active_record/connection_adapters/abstract/transaction.rb:616:in `block in within_new_transaction'
 activesupport (7.2.1) lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
 activerecord (7.2.1) lib/active_record/connection_adapters/abstract/transaction.rb:613:in `within_new_transaction'
 activerecord (7.2.1) lib/active_record/connection_adapters/abstract/database_statements.rb:361:in `transaction'
 activerecord (7.2.1) lib/active_record/transactions.rb:234:in `block in transaction'
 activerecord (7.2.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:395:in `with_connection'
 activerecord (7.2.1) lib/active_record/connection_handling.rb:296:in `with_connection'
 activerecord (7.2.1) lib/active_record/transactions.rb:233:in `transaction'
 activerecord (7.2.1) lib/active_record/transactions.rb:354:in `transaction'
 activerecord (7.2.1) lib/active_record/locking/pessimistic.rb:95:in `with_lock'
 solid_queue (0.9.0) lib/solid_queue/processes/registrable.rb:56:in `heartbeat'
 solid_queue (0.9.0) lib/solid_queue/processes/registrable.rb:41:in `block (2 levels) in launch_heartbeat'
 activesupport (7.2.1) lib/active_support/execution_wrapper.rb:91:in `wrap'
 solid_queue (0.9.0) lib/solid_queue/app_executor.rb:7:in `wrap_in_app_executor'
 solid_queue (0.9.0) lib/solid_queue/processes/registrable.rb:41:in `block in launch_heartbeat'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `block in synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:22:in `execute'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/timer_task.rb:339:in `execute_task'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `block in synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:22:in `execute'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/ivar.rb:170:in `safe_execute'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/scheduled_task.rb:298:in `process_task'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/timer_set.rb:166:in `block (2 levels) in process_tasks'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:359:in `run_task'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:350:in `block (3 levels) in create_worker'
 <internal:kernel>:187:in `loop'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `block (2 levels) in create_worker'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `catch'
 concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `block in create_worker'

Is there something i can do on my side?

Running:

What else do you need from me?

rosa commented 2 months ago

Oh, that's a strange one! Let me look into it.

rosa commented 2 months ago

Do you get any errors before this, when Solid Queue starts, when registering any of the processes (dispatchers, workers, etc.)? You should get some lines like this:

SolidQueue-0.9.0 Register Supervisor (10.0ms)  pid: 34424, hostname: "Rosas-Air-M2.localdomain", process_id: 6, name: "supervisor-e232055b690c3c5d3e63"
SolidQueue-0.9.0 Fail claimed jobs (3.6ms)  job_ids: [], process_ids: []
SolidQueue-0.9.0 Started Supervisor (36.5ms)  pid: 34424, hostname: "Rosas-Air-M2.localdomain", process_id: 6, name: "supervisor-e232055b690c3c5d3e63"
SolidQueue-0.9.0 Prune dead processes (19.1ms)  size: 0
SolidQueue-0.9.0 Register Dispatcher (43.3ms)  pid: 34425, hostname: "Rosas-Air-M2.localdomain", process_id: 7, name: "dispatcher-8c9a168af2f75adbde63"
SolidQueue-0.9.0 Started Dispatcher (44.5ms)  pid: 34425, hostname: "Rosas-Air-M2.localdomain", process_id: 7, name: "dispatcher-8c9a168af2f75adbde63", polling_interval: 1, batch_size: 500, concurrency_maintenance_interval: 600
SolidQueue-0.9.0 Register Worker (42.5ms)  pid: 34427, hostname: "Rosas-Air-M2.localdomain", process_id: 8, name: "worker-426c95702eccb5c40ffd"
SolidQueue-0.9.0 Started Worker (43.4ms)  pid: 34427, hostname: "Rosas-Air-M2.localdomain", process_id: 8, name: "worker-426c95702eccb5c40ffd", polling_interval: 0.1, queues: "default", thread_pool_size: 5
SolidQueue-0.9.0 Register Worker (44.3ms)  pid: 34426, hostname: "Rosas-Air-M2.localdomain", process_id: 9, name: "worker-cd0acd29813c42fb1214"
SolidQueue-0.9.0 Started Worker (45.2ms)  pid: 34426, hostname: "Rosas-Air-M2.localdomain", process_id: 9, name: "worker-cd0acd29813c42fb1214", polling_interval: 0.1, queues: "background", thread_pool_size: 3
SolidQueue-0.9.0 Register Scheduler (48.1ms)  pid: 34428, hostname: "Rosas-Air-M2.localdomain", process_id: 10, name: "scheduler-7d5a0693d55b7a6b85d0"
SolidQueue-0.9.0 Started Scheduler (53.9ms)  pid: 34428, hostname: "Rosas-Air-M2.localdomain", process_id: 10, name: "scheduler-7d5a0693d55b7a6b85d0", recurring_schedule: ["periodic_store_result"]

I have a suspicion of what this might be about...

danielschnee commented 2 months ago

Local: image

Running it with the puma plugin.

danielschnee commented 2 months ago

From the CI: image

rosa commented 2 months ago

Thank you! That's very helpful 🙏

danielschnee commented 2 months ago

If that helps.

Locally its running fine: image

On Prod it produces that error: image

rosa commented 2 months ago

Huh, ok, that's unexpected and changed what I thought this was about 😅 😅 Did you get any errors of other kind before the lock errors started?

danielschnee commented 2 months ago

Ok. Found another thing, It happens after some time running.

Now i get the same error on local: image

rosa commented 2 months ago

Aha! Now everything makes sense 😆 Ok, I'll get this fixed.