Closed: jdelStrother closed this issue 1 year ago.
Hello @jdelStrother, thanks for letting us know about this issue.
Following your instructions and using the code provided, I was able to replicate some weird behavior, so this does appear to be something we'll want to take a closer look at.
When I was running it, it would output
Firing event report_data every 6 seconds.
Firing event report_event_data every 6 seconds.
Firing event reset_log_once_keys every 6.0 seconds.
2021-07-02 12:49:27 -0700 Fetching count failed_at is not NULL
2021-07-02 12:49:27 -0700 Count: 0
2021-07-02 12:49:27 -0700 Fetching count locked_by is not NULL
2021-07-02 12:49:27 -0700 Count: 0
2021-07-02 12:49:33 -0700 Fetching count failed_at is not NULL
and then deadlock, leaving the process unable to exit on Ctrl-C.
This appears to be a long-standing issue, as I saw the same behavior with some older agent versions as well. That was surprising, since we haven't received any other reports of this issue.
I'll add this to our next bug smash so we can take a closer look at solving it. Thanks for letting us know!
Hi @jdelStrother, @tannalynn and I took a look at this today.
We're still narrowing down the cause of the issue; however, we noticed in our playground environment that the deadlock went away after updating our configuration to disable the DelayedJob sampler:
# newrelic.yml
disable_delayed_job_sampler: true
This is a potential temporary workaround until we identify the root cause. Thanks for your patience!
Hi @jdelStrother, we may have a fix for the deadlock issue on this branch. Feel free to update your Gemfile with the following if you'd like to see if it helps:
gem 'newrelic_rpm', :git => 'https://github.com/newrelic/newrelic-ruby-agent.git', :branch => 'bugfix/delayed-job-deadlocks'
Sorry for the slow reply, I've been away for a bit. I've been running this branch for 30 minutes or so, which (with my hack to increase the sampling rate) has previously been more than enough to trigger the bug, and it seems fixed 🎉
I'll continue running it tomorrow and will let you know if I manage to get a deadlock.
Ugh, spoke too soon. I left it running overnight & got a deadlock after 3 hours or so. 😞
... similar backtraces again:
Main thread:
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `enter'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `block in mon_enter'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:17:in `mon_enter'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:22:in `block in synchronize'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:597:in `checkin'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:551:in `block (3 levels) in clear_reloadable_connections'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:548:in `each'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:548:in `block (2 levels) in clear_reloadable_connections'
lib/ruby/2.7.0/monitor.rb:202:in `synchronize'
lib/ruby/2.7.0/monitor.rb:202:in `mon_synchronize'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:547:in `block in clear_reloadable_connections'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:749:in `block in with_exclusively_acquired_all_connections'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:824:in `with_new_connections_blocked'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:747:in `with_exclusively_acquired_all_connections'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:546:in `clear_reloadable_connections'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:570:in `clear_reloadable_connections!'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:1093:in `each'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:1093:in `clear_reloadable_connections!'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_handling.rb:351:in `clear_reloadable_connections!'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/railtie.rb:239:in `block (3 levels) in <class:Railtie>'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:427:in `instance_exec'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:427:in `block in make_lambda'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:198:in `block (2 levels) in halting'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:604:in `block (2 levels) in default_terminator'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:603:in `catch'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:603:in `block in default_terminator'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:199:in `block in halting'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `block in invoke_before'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `each'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `invoke_before'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:105:in `run_callbacks'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:120:in `class_unload!'
lib/ruby/gems/2.7.0/gems/railties-6.1.4.1/lib/rails/application/finisher.rb:244:in `block (2 levels) in <module:Finisher>'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:427:in `instance_exec'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:427:in `block in make_lambda'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:198:in `block (2 levels) in halting'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:604:in `block (2 levels) in default_terminator'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:603:in `catch'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:603:in `block in default_terminator'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:199:in `block in halting'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `block in invoke_before'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `each'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:512:in `invoke_before'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/callbacks.rb:105:in `run_callbacks'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/execution_wrapper.rb:111:in `run!'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:114:in `run!'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:53:in `block (2 levels) in reload!'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:52:in `tap'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:52:in `block in reload!'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/execution_wrapper.rb:88:in `wrap'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/reloader.rb:51:in `reload!'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/worker.rb:330:in `reload!'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/worker.rb:187:in `block (2 levels) in start'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/worker.rb:172:in `loop'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/worker.rb:172:in `block in start'
RAILS_ROOT/lib/delayed_job/logging.rb:42:in `block (3 levels) in <class:TaggedLogging>'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/tagged_logging.rb:99:in `block in tagged'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/tagged_logging.rb:37:in `tagged'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/tagged_logging.rb:99:in `tagged'
RAILS_ROOT/lib/delayed_job/logging.rb:41:in `block (2 levels) in <class:TaggedLogging>'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:79:in `block (2 levels) in add'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/plugins/clear_locks.rb:7:in `block (2 levels) in <class:ClearLocks>'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:79:in `block (2 levels) in add'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:61:in `block in initialize'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:79:in `block in add'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:79:in `block in add'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:66:in `execute'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/lifecycle.rb:40:in `run_callbacks'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/worker.rb:171:in `start'
lib/ruby/gems/2.7.0/gems/delayed_job-4.1.9/lib/delayed/tasks.rb:9:in `block (2 levels) in <main>'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
lib/ruby/gems/2.7.0/gems/sentry-ruby-core-4.7.3/lib/sentry/rake.rb:23:in `block in execute'
lib/ruby/gems/2.7.0/gems/sentry-ruby-core-4.7.3/lib/sentry/hub.rb:158:in `with_background_worker_disabled'
lib/ruby/gems/2.7.0/gems/sentry-ruby-core-4.7.3/lib/sentry/rake.rb:22:in `execute'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:83:in `block in run'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:80:in `run'
bin/rake:6:in `<main>'
Agent thread:
lib/ruby/2.7.0/monitor.rb:108:in `sleep'
lib/ruby/2.7.0/monitor.rb:108:in `wait'
lib/ruby/2.7.0/monitor.rb:108:in `wait_for_cond'
lib/ruby/2.7.0/monitor.rb:108:in `wait'
lib/ruby/2.7.0/monitor.rb:116:in `wait_while'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/share_lock.rb:220:in `wait_for'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/share_lock.rb:190:in `block in yield_shares'
lib/ruby/2.7.0/monitor.rb:202:in `synchronize'
lib/ruby/2.7.0/monitor.rb:202:in `mon_synchronize'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/share_lock.rb:189:in `yield_shares'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:205:in `block in execute'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:696:in `block (2 levels) in log'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:695:in `block in log'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_adapter.rb:687:in `log'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:204:in `execute'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/mysql/database_statements.rb:50:in `execute'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:215:in `execute_and_free'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/mysql/database_statements.rb:55:in `exec_query'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:532:in `select'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/database_statements.rb:69:in `select_all'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/query_cache.rb:103:in `select_all'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/mysql/database_statements.rb:12:in `select_all'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/calculations.rb:308:in `block in execute_simple_calculation'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation.rb:861:in `skip_query_cache_if_necessary'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/calculations.rb:308:in `execute_simple_calculation'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/calculations.rb:269:in `perform_calculation'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/calculations.rb:144:in `calculate'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/instrumentation/active_record_prepend.rb:85:in `block in calculate'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent.rb:682:in `with_database_metric_name'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/instrumentation/active_record_prepend.rb:84:in `calculate'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/calculations.rb:51:in `count'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/samplers/delayed_job_sampler.rb:56:in `count'
RAILS_ROOT/lib/delayed_job_hacks.rb:45:in `block in count'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_silence.rb:18:in `block in silence'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_thread_safe_level.rb:52:in `log_at'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_silence.rb:18:in `silence'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger.rb:64:in `block (3 levels) in broadcast'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_silence.rb:18:in `block in silence'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_thread_safe_level.rb:52:in `log_at'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger_silence.rb:18:in `silence'
lib/ruby/gems/2.7.0/gems/activesupport-6.1.4.1/lib/active_support/logger.rb:62:in `block (2 levels) in broadcast'
RAILS_ROOT/lib/delayed_job_hacks.rb:44:in `count'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/samplers/delayed_job_sampler.rb:45:in `failed_jobs'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/samplers/delayed_job_sampler.rb:67:in `block in poll'
lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:462:in `with_connection'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/samplers/delayed_job_sampler.rb:66:in `poll'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/sampler_collection.rb:41:in `block in poll_samplers'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/sampler_collection.rb:39:in `delete_if'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/sampler_collection.rb:39:in `poll_samplers'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/sampler_collection.rb:13:in `block in initialize'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_listener.rb:40:in `block in notify'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_listener.rb:38:in `each'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_listener.rb:38:in `notify'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:1143:in `transmit_data'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:619:in `block in create_and_run_event_loop'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_loop.rb:148:in `block in dispatch_event'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_loop.rb:146:in `each'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_loop.rb:146:in `dispatch_event'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_loop.rb:111:in `run_once'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/event_loop.rb:99:in `run'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:641:in `create_and_run_event_loop'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:701:in `block (2 levels) in deferred_work!'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent.rb:502:in `disable_all_tracing'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:698:in `block in deferred_work!'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:678:in `catch_errors'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:697:in `deferred_work!'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/agent.rb:724:in `block in start_worker_thread'
lib/ruby/gems/2.7.0/bundler/gems/newrelic-ruby-agent-07ee4cc80f48/lib/new_relic/agent/threading/agent_thread.rb:14:in `block in create'
Thank you, @jdelStrother for running a thorough test on the branch! We'll get back to work on a solution. Stay tuned!
@jdelStrother - There are references in the latest stacktrace to some files directly within RAILS_ROOT. If you do not feel comfortable sharing the code, could you share some context about what's happening in the count method and the TaggedLogging class?
Another new element I see is sentry-ruby-core. Does the application use any custom configuration for Sentry? Does it use the integrations for sentry-rails or sentry-delayed-job? All three of these gems may modify the same classes newrelic_rpm modifies, so it could be helpful to see the ancestors chain to peek at the order of operations.
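For anyone wanting to check that ordering, Module#ancestors shows the method-lookup chain that prepend-based patches produce. Here is a self-contained illustration with stand-in names (Job, NewRelicPatch, SentryPatch are hypothetical, not the real gems' modules):

```ruby
# Stand-ins for a patched class and two gems' patches (hypothetical names).
class Job; end

module NewRelicPatch; end
module SentryPatch; end

# Gems typically inject behavior with Module#prepend; the ancestors chain
# reveals the resulting method-lookup order (the last prepend wins).
Job.prepend(NewRelicPatch)
Job.prepend(SentryPatch)

p Job.ancestors.first(3)
# => [SentryPatch, NewRelicPatch, Job]
```

In the real app, something like `puts Delayed::Job.ancestors` in a console would show where newrelic_rpm and the Sentry gems sit relative to each other.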
Many thanks for your collaboration!
Two more questions/ideas:
There are references in the latest stacktrace to some files directly within RAILS_ROOT [...] Another new element I see is sentry-ruby-core
I've disabled some of the RAILS_ROOT code, and removed Sentry, but was still able to reproduce the deadlock:
delayed_job_hacks.rb just contains the hacks to try & speed up the deadlock:
# Log before and after each sampler query so the deadlock point is visible.
NewRelic::Agent::Samplers::DelayedJobSampler.prepend(
  Module.new {
    def count(query)
      puts "#{Time.now} Fetching count #{query.inspect}..."
      t = nil
      Rails.logger.silence do
        t = super
      end
      puts "#{Time.now} Count: #{t}"
      t
    rescue Exception => e # rubocop:disable Lint/RescueException
      puts e.inspect
      raise
    end
  }
)

# Fire the agent's periodic events 10x more often to speed up reproduction.
NewRelic::Agent::EventLoop.prepend(
  Module.new {
    def fire_every(interval, event)
      puts "Firing event #{event} every #{interval / 10} seconds."
      super(interval / 10, event)
    end
  }
)
you mention this is occurring in development mode. Does that refer to your development environment/local machine or something else? Do issues come up in other environments?
I only see it for RAILS_ENV=development, presumably because the ActiveSupport::Dependencies.interlock is only active when config.cache_classes = false.
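The shape of the hang can be sketched in plain Ruby: one thread holds lock A and wants lock B, while the other holds B and wants A. This is only a loose analogy for the pool-monitor/interlock interaction seen in the two backtraces above (the Mutex and Queue objects below are stand-ins, not the real Rails primitives), and try_lock is used so the sketch terminates instead of hanging:

```ruby
# Mutex stand-ins, not the real Rails objects.
pool_lock = Mutex.new # stands in for the connection pool's monitor
interlock = Mutex.new # stands in for the autoload interlock

ready_a = Queue.new; ready_b = Queue.new # rendezvous: both locks held
done_a  = Queue.new; done_b  = Queue.new # rendezvous: both attempts finished
outcomes = Queue.new

# "Main thread": holds the pool lock, then wants the interlock.
main = Thread.new do
  pool_lock.synchronize do
    ready_a << true
    ready_b.pop # wait until the agent thread holds the interlock
    got = interlock.try_lock # a real blocking lock would hang here forever
    interlock.unlock if got
    outcomes << (got ? :acquired : :blocked)
    done_a << true
    done_b.pop # keep holding the pool lock until both attempts are done
  end
end

# "Agent thread": holds the interlock (mid-query), then wants the pool lock.
agent = Thread.new do
  interlock.synchronize do
    ready_b << true
    ready_a.pop
    got = pool_lock.try_lock
    pool_lock.unlock if got
    outcomes << (got ? :acquired : :blocked)
    done_b << true
    done_a.pop
  end
end

[main, agent].each(&:join)
results = [outcomes.pop, outcomes.pop]
p results # => [:blocked, :blocked]
```

With blocking lock calls, both threads would wait on each other forever; the rendezvous queues make the failed acquisition deterministic and observable.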
I created a bare-bones app to try & reduce the problem further: https://github.com/jdelStrother/newrelic-dj-deadlocks
The license key in newrelic.yml needs replacing with a real key[^1]; then running rake jobs:work should hopefully hit the deadlock eventually.
The deadlock takes significantly longer to reproduce with the bugfix/delayed-job-deadlocks branch compared to 8.0.0, but I still hit it occasionally. In the test app (which speeds up sampling 100x) it took maybe half an hour to reach.
[^1]: is there a test key that can be used to sample locally but not necessarily report to NewRelic?
Hi @jdelStrother, thanks for running those tests and sharing the playground! My apologies for the delayed response. I hoped to respond with a fix. Instead, here's an update:
We've narrowed down the source of the deadlock within the gem to this LOC:
# lib/new_relic/agent/samplers/delayed_job_sampler.rb
def count(query)
  if ::ActiveRecord::VERSION::MAJOR.to_i < 4
    ::Delayed::Job.count(:conditions => query)
  else
    ::Delayed::Job.where(query).count # <= the deadlock occurs here
  end
end
We're investigating ActiveSupport::Dependencies.interlock, Rails.application.executor.wrap, and the changes made by config.cache_classes = false. It seems like the class caching setting has a lot to do with this error. We've tried wrapping DelayedJobSampler#poll with various combinations of the suggestions made in this issue, to no avail.
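For the record, one such wrap can be expressed with the same prepend pattern used elsewhere in this thread. The sketch below uses stand-in classes so it runs anywhere; in the real app the wrapper would be Rails.application.executor.wrap and the target NewRelic::Agent::Samplers::DelayedJobSampler#poll, and per the comment above this kind of wrap did not cure the deadlock:

```ruby
# Stand-ins (hypothetical): Executor mimics Rails.application.executor,
# Sampler mimics the agent's DelayedJobSampler.
class Executor
  def self.wrap
    # In Rails this would acquire and release the load interlock and
    # run executor callbacks around the block.
    yield
  end
end

class Sampler
  def poll
    :polled
  end
end

# Wrap the sampler's work inside the executor via prepend.
Sampler.prepend(
  Module.new do
    def poll
      Executor.wrap { super }
    end
  end
)

p Sampler.new.poll # => :polled
```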
Until we track down the root cause, we'd like to include the changes in the bugfix/delayed-job-deadlocks
branch in our upcoming 8.1.0 release to make the deadlock less frequent. Ultimately, we hope to find something that eliminates this issue altogether.
Thanks for your continued engagement and support!
Hello @jdelStrother, thank you for helping by providing a bare-bones app and log information for this issue. We truly appreciate it. We tried to resolve this issue for 8.1.0, but unfortunately, we were unable to do so. We will return to this issue when we can pick it up again. If it's becoming pressing, I encourage you to file an issue with New Relic Support. Thank you for your patience.
Closing this issue for now, since it only happens in development and a fix that makes the deadlock far less frequent has been shipped. Please re-open if this comes up in a non-development environment.
Hi - we’re in the midst of setting up New Relic on our environment, and one of our applications in production appears to have a very similar issue to the one described in this ticket, on agent 8.8.0. We are running DelayedJob via rake jobs:work
in a Docker container and will generally experience a deadlock within a few minutes (!) as long as New Relic is active.
Interestingly, we are also not seeing any of the successful jobs actually get submitted to the New Relic dashboard for the app. As seen above, there’s very little in the logs, but our rb_backtrace()
output for the main thread looks very similar to what’s been reported already.
@hedgeye-fslezak Thanks for letting us know that you're encountering a similar issue. For this particular issue, the deadlock was only seen in development environments, not in production, due to differences in settings between the two. Right now, I'm not sure whether it's the same issue or just one that looks similar. One way this issue could be prevented was to disable the delayed job sampler (which caused the problem) by adding the following to newrelic.yml:
# newrelic.yml
disable_delayed_job_sampler: true
If this is the same issue, setting that config should prevent the deadlock from occurring.
One thing that's interesting is that I don't believe we saw the behavior you describe about successful jobs not appearing in the UI. Are you using infinite tracing? In 8.9.0 we released a bugfix for an infinite tracing issue that would prevent data from being sent to the New Relic servers. If you aren't using infinite tracing, though, that bugfix wouldn't be relevant to your situation.
One thing to note is that data related to a transaction will not be sent until the transaction completes. I have seen jobs structured in such a way that one transaction ran indefinitely, which prevented the agent from sending the transaction data to the New Relic servers. I'm not sure whether that's relevant to your particular situation, since it isn't common, but I wanted to bring it up because I have seen it before from someone using Delayed Job.
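The point about transaction boundaries can be sketched abstractly: if the whole worker loop is one transaction, nothing becomes reportable until the loop ends, whereas per-job transactions produce completed payloads as the worker runs. The Transaction class below is a hypothetical stand-in, not the agent's API:

```ruby
# Minimal stand-in for "data is only sendable once a transaction finishes".
class Transaction
  def self.completed
    @completed ||= []
  end

  def self.record(name)
    yield
    completed << name # only finished transactions become sendable
  end
end

jobs = %w[resize_image send_email]

# Anti-pattern: one transaction wrapping the whole worker loop —
# nothing is sendable while the worker runs:
#   Transaction.record("worker") { loop { ... } }

# Per-job transactions: each job yields a completed, sendable payload.
jobs.each { |job| Transaction.record(job) { :work } }

p Transaction.completed # => ["resize_image", "send_email"]
```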
Enabling debug level logging for the agent and reviewing the logs may be able to provide additional information about what is going on behind the scenes in the agent.
# newrelic.yml
log_level: debug
One of the first things I'd check in the agent logs would be any errors or warnings, and then the lines related to communicating with the New Relic servers, to see whether there are any issues in the responses from the collector. These tend to look like this:
DEBUG : Sending TransactionEvent data to New Relic Service
DEBUG : Sampled 4 / 4 (100.0 %) requests this cycle, 59 / 59 (100.0 %) since startup
DEBUG : Sending 4 items to analytic_event_data
DEBUG : Sending request to <collector url> with POST
DEBUG : Received response, status: 202, encoding: ''
Hopefully some of this helps, please let me know if you have any questions.
Hi! Thank you for the response, and the information about that config flag. We will be testing that more this week, rather than our current heavy-handed solution of just disabling the agent entirely on our worker nodes.
On our environment (staging), we did manage to get some information out of the debug log, most notably that the deadlock appears to happen on the 60_second_harvest event:
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : EventLoop: Dispatching event '5_second_harvest' with 4 callback(s).
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending TransactionEvent data to New Relic Service
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sampled 0 / 0 (NaN %) requests this cycle, 0 / 0 (NaN %) since startup
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending CustomEvent data to New Relic Service
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending ErrorEvent data to New Relic Service
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending LogEvent data to New Relic Service
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending 7 items to log_event_data
[2022-09-19 16:30:24 -0400 b91b1743c2ff (135)] DEBUG : Sending request to <redacted> with POST
[2022-09-19 16:30:25 -0400 b91b1743c2ff (135)] DEBUG : Received response, status: 202, encoding: ''
[2022-09-19 16:30:29 -0400 b91b1743c2ff (135)] DEBUG : EventLoop: Dispatching event '60_second_harvest' with 1 callback(s).
[2022-09-19 16:30:29 -0400 b91b1743c2ff (135)] DEBUG : Sending data to New Relic Service
We did have a couple of earlier runs of 60_second_harvest succeed in this case, which seemed to report what we expected:
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : EventLoop: Dispatching event '60_second_harvest' with 1 callback(s).
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending data to New Relic Service
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending 51 items to metric_data
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending request to [redacted] with POST
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Received response, status: 202, encoding: ''
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending 7 items to log_event_data
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending request to [redacted] with POST
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Received response, status: 202, encoding: ''
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending request to [redacted] with POST
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Received response, status: 200, encoding: ''
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Received get_agent_commands = []
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : No thread profiles with data found to send.
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : EventLoop: Dispatching event 'reset_log_once_keys' with 1 callback(s).
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : EventLoop: Dispatching event '5_second_harvest' with 4 callback(s).
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending TransactionEvent data to New Relic Service
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sampled 0 / 0 (NaN %) requests this cycle, 0 / 0 (NaN %) since startup
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending CustomEvent data to New Relic Service
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending ErrorEvent data to New Relic Service
[2022-09-19 16:29:29 -0400 b91b1743c2ff (135)] DEBUG : Sending LogEvent data to New Relic Service
I'm still not exactly sure why jobs didn't report earlier, as they are working in a separate test; I suspect it was just bad luck and the deadlock happened during submission (?). Hopefully the above is of some value to the New Relic team as well. Thanks for your support!
@hedgeye-fslezak - Thanks for sharing your debug logs! We'll keep an eye out for your update on your test with the config flag toggle. 😄
Hiya, an update from our end. We managed to find our problem - it was indeed related to a stray config.cache_classes = false
caused by us running Spring in production (oops). Removing Spring and ensuring class caching
was enabled cleared the problem up and stopped the deadlocks, so that's at least a workable solution.
Thank you all for your support!
Fantastic! Thanks for the update! I'm glad you found a solution. 🎉
I'm going to close this ticket now, but please feel free to comment on this issue or create a new one if the problem returns.
Description
When running
rake jobs:work
in development mode with a newrelic-enabled app, the main thread and the newrelic sampling thread eventually deadlock each other. No more jobs get run, and the process becomes unable to exit on Ctrl-C.
Expected Behavior
No deadlocks! They're not usually expected.
Troubleshooting or NR Diag results
Here's the main thread following a deadlock:
and NewRelic's agent thread:
Steps to Reproduce
I found this helpful for narrowing down the problem:
which logs before and after the DJ sampling occurs, and makes the sampling more frequent so you don't have to wait hours for it to happen. When the deadlock occurs, you'll see something like
in the log, followed by no other output.
Your Environment
Newrelic agent 7.1.0
Ruby 2.7.3
Rails 6.1.4
macOS 11.4
Additional context
I wonder if the DJ sampling needs wrapping in Rails.application.executor, but I'm not totally sure.