rails / mission_control-jobs

Dashboard and Active Job extensions to operate and troubleshoot background jobs
MIT License
539 stars 59 forks source link

Retry from dashboard does not work #122

Closed collimarco closed 3 months ago

collimarco commented 4 months ago

We have a Job listed in "Failed jobs" with this error:

Errno::ECONNREFUSED
Failed to open TCP connection to :80 (Connection refused - connect(2) for nil port 80)

It is clearly a temporary network error.

The job has this retry policy:

retry_on StandardError, wait: 5.minutes, attempts: 3

I also see this status for the retries (in the section "Raw data"):

    "executions": 2,
    "exception_executions": {
      "[StandardError]": 2
    },

The problem is that when I click "Retry" in the dashboard it seems that the Job is not even retried and after a few seconds it's immediately sent to "Failed jobs" again.

The executions count is not incremented (does not change) and from the time it takes to execute the Job I am pretty sure that it is not even trying to execute it again.

Is this by design when you reach the max number of attempts? (In this case it would be better to clarify this behavior in the dashboard)

Or is it a bug?

rosa commented 4 months ago

Hey @collimarco, sorry for the delay! That sounds like a bug, the job should be retried. Definitely not by design. Would you be able to get some logs of the request to retry to see if that gives us a clue of what's going on?

collimarco commented 4 months ago

@rosa Yes, this is the stack trace:

Errno::ECONNREFUSED
Failed to open TCP connection to :80 (Connection refused - connect(2) for nil port 80)

/usr/local/lib/ruby/3.3.0/net/http.rb:1603:in `initialize'
/usr/local/lib/ruby/3.3.0/net/http.rb:1603:in `open'
/usr/local/lib/ruby/3.3.0/net/http.rb:1603:in `block in connect'
/usr/local/lib/ruby/3.3.0/timeout.rb:186:in `block in timeout'
/usr/local/lib/ruby/3.3.0/timeout.rb:193:in `timeout'
/usr/local/lib/ruby/3.3.0/net/http.rb:1601:in `connect'
/usr/local/lib/ruby/3.3.0/net/http.rb:1580:in `do_start'
/usr/local/lib/ruby/3.3.0/net/http.rb:1569:in `start'
/usr/local/lib/ruby/3.3.0/net/http.rb:2297:in `request'
/usr/local/bundle/gems/httparty-0.21.0/lib/httparty/request.rb:156:in `perform'
/usr/local/bundle/gems/httparty-0.21.0/lib/httparty.rb:612:in `perform_request'
/usr/local/bundle/gems/httparty-0.21.0/lib/httparty.rb:526:in `get'
/usr/local/bundle/gems/httparty-0.21.0/lib/httparty.rb:645:in `get'
/app/app/models/spam_filter.rb:122:in `block in spam_report_with_aws_rekognition'
/app/app/models/spam_filter.rb:121:in `each'
/app/app/models/spam_filter.rb:121:in `spam_report_with_aws_rekognition'
/app/app/models/spam_filter.rb:40:in `run_spam_report'
/app/app/models/spam_filter.rb:11:in `report'
/app/app/models/email.rb:85:in `update_spam_report'
/app/app/jobs/email_processing_job.rb:7:in `perform'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/execution.rb:68:in `block in _perform_job'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:121:in `block in run_callbacks'
/usr/local/bundle/gems/i18n-1.14.1/lib/i18n.rb:322:in `with_locale'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/translation.rb:9:in `block (2 levels) in <module:Translation>'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `instance_exec'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `block in run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/core_ext/time/zones.rb:65:in `use_zone'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/timezones.rb:9:in `block (2 levels) in <module:Timezones>'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `instance_exec'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `block in run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:141:in `run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/execution.rb:67:in `_perform_job'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/instrumentation.rb:32:in `_perform_job'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/execution.rb:51:in `perform_now'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/instrumentation.rb:26:in `block in perform_now'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activerecord/lib/active_record/railties/job_runtime.rb:13:in `block in instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/instrumentation.rb:40:in `block in instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/notifications.rb:206:in `block in instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/notifications.rb:206:in `instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/instrumentation.rb:39:in `instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activerecord/lib/active_record/railties/job_runtime.rb:11:in `instrument'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/instrumentation.rb:26:in `perform_now'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/logging.rb:18:in `block in perform_now'
/usr/local/bundle/gems/logstash-logger-0.26.1/lib/logstash-logger/tagged_logging.rb:5:in `block in tagged'
/usr/local/bundle/gems/logstash-logger-0.26.1/lib/logstash-logger/tagged_logging.rb:16:in `tagged'
/usr/local/bundle/gems/logstash-logger-0.26.1/lib/logstash-logger/tagged_logging.rb:5:in `tagged'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/broadcast_logger.rb:240:in `method_missing'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/logging.rb:25:in `tag_logger'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/logging.rb:18:in `perform_now'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/execution.rb:29:in `block in execute'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:121:in `block in run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/railtie.rb:67:in `block (4 levels) in <class:Railtie>'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/reloader.rb:77:in `block in wrap'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/execution_wrapper.rb:88:in `wrap'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/reloader.rb:74:in `wrap'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/railtie.rb:66:in `block (3 levels) in <class:Railtie>'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `instance_exec'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:130:in `block in run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/callbacks.rb:141:in `run_callbacks'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activejob/lib/active_job/execution.rb:27:in `execute'
/usr/local/bundle/gems/solid_queue-0.2.1/app/models/solid_queue/claimed_execution.rb:61:in `execute'
/usr/local/bundle/gems/solid_queue-0.2.1/app/models/solid_queue/claimed_execution.rb:37:in `perform'
/usr/local/bundle/gems/solid_queue-0.2.1/lib/solid_queue/pool.rb:23:in `block (2 levels) in post'
/usr/local/bundle/bundler/gems/rails-ed84c0d8b001/activesupport/lib/active_support/execution_wrapper.rb:92:in `wrap'
/usr/local/bundle/gems/solid_queue-0.2.1/lib/solid_queue/app_executor.rb:7:in `wrap_in_app_executor'
/usr/local/bundle/gems/solid_queue-0.2.1/lib/solid_queue/pool.rb:22:in `block in post'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `block in synchronize'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:48:in `synchronize'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:22:in `execute'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/ivar.rb:170:in `safe_execute'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/future.rb:55:in `block in execute'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:359:in `run_task'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:350:in `block (3 levels) in create_worker'
<internal:kernel>:187:in `loop'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `block (2 levels) in create_worker'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `catch'
/usr/local/bundle/gems/concurrent-ruby-1.2.3/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `block in create_worker'
rosa commented 4 months ago

Hey @collimarco thanks a lot for that! I meant the logs for the request to retry that job, that is, the request to:

POST /mission_control_jobs/applications/:application_id/jobs/:job_id/retry?server_id=solid_queue

or similar route in your app.

collimarco commented 4 months ago

@rosa This is the only related POST request that I can see from the logs:

[302] POST /jobs/applications/myapp/jobs/7a386611-4d35-475f-b5b6-31e90bcd9c82/retry (MissionControl::Jobs::RetriesController#create)

So the request seems successful. I also see a green banner saying that retry has been scheduled successfully when I click the "Retry" button. I also see that the badge on "In progress jobs" tab is incremented by 1... However as soon as I refresh the page, even after 1 second, the job is in "Failed jobs" again (and executions count is not changed).

I don't see any other relevant logs

rosa commented 3 months ago

@collimarco, I think this is a bug in Solid Queue 🤔 I assume you're using that and not resque, right? I'll look into it.

collimarco commented 3 months ago

@rosa Yes, I am using Solid Queue

rosa commented 3 months ago

@collimarco, thanks again! I'll ship a new version of Solid Queue later today, when I include a couple of other fixes, and then you can simply update to get this one fixed.

rosa commented 3 months ago

Ok, version 0.3.3 of Solid Queue released. Let me know if the issue still happens with that version 🙏