jondot / sneakers

A fast background processing framework for Ruby and RabbitMQ
https://github.com/jondot/sneakers
MIT License
2.24k stars 333 forks source link

Sneaker close connection with Rabbit after hours inactivity, #456

Open luismiv85 opened 3 years ago

luismiv85 commented 3 years ago

Hi community!

Some time ago I leave an issue about this problem (https://github.com/jondot/sneakers/issues/440), but I could not give a clear information or sufficient information.

I have been collect more information about this issue through rabbit and sneaker logs.

Bunny: 2.14.3 Sneaker: 2.12.0

### sneakers.log:

2020-10-06T06:38:52Z p-13846 t-gth45x9es ERROR: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN: Exception in the reader loop: Timeout::Error: IO timeout when reading 7 bytes
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN: Backtrace:
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/cruby/socket.rb:65:in `rescue in read_fully'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/cruby/socket.rb:53:in `read_fully'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/transport.rb:239:in `read_fully'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/transport.rb:261:in `read_next_frame'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/reader_loop.rb:74:in `run_once'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/reader_loop.rb:39:in `block in run_loop'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/reader_loop.rb:36:in `loop'
2020-10-06T06:38:52Z p-13846 t-gth45x9es WARN:  /home/deploy/sites/project/shared/bundle/ruby/2.5.0/gems/bunny-2.14.3/lib/bunny/reader_loop.rb:36:in `run_loop'

### rabbitMQ log:

=WARNING REPORT==== 6-Oct-2020::08:38:52 ===
closing AMQP connection <0.25062.30> (127.0.0.1:57706 -> 127.0.0.1:5672, vhost: '/', user: 'xxxx'):
client unexpectedly closed TCP connection

=ERROR REPORT==== 6-Oct-2020::08:43:02 ===
closing AMQP connection <0.18400.30> (127.0.0.1:44312 -> 127.0.0.1:5672):
missed heartbeats from client, timeout: 30s

=ERROR REPORT==== 6-Oct-2020::08:43:18 ===
closing AMQP connection <0.18356.30> (127.0.0.1:44262 -> 127.0.0.1:5672):
missed heartbeats from client, timeout: 30s

=ERROR REPORT==== 6-Oct-2020::08:43:30 ===
closing AMQP connection <0.20717.30> (127.0.0.1:47182 -> 127.0.0.1:5672):
missed heartbeats from client, timeout: 30s

=ERROR REPORT==== 6-Oct-2020::08:43:45 ===
closing AMQP connection <0.20767.30> (127.0.0.1:47550 -> 127.0.0.1:5672):
missed heartbeats from client, timeout: 30s

On the other hand, my config sneaker file (initialize/sneakers.rb):

Sneakers.configure  heartbeat: 30,
                    daemonize: true,
                    amqp: rabbitmq_credentials[:amqp_address],
                    durable: true,
                    vhost: '/',
                    exchange: 'sneakers',
                    exchange_type: :direct,
                    ack: true,
                    log: "#{Rails.root}/log/sneakers.log",
                    pid_path: "#{Rails.root}/tmp/pids/sneakers.pid",
                    timeout_job_after: 50.minutes,
                    metrics: Sneakers::Metrics::LoggingMetrics.new

And every worker override some configuration. worker_1.rb

  from_queue JOB_NAME,
             log: "log/#{JOB_NAME}.log",
             timeout_job_after: 50_000,
             threads: 2,
             workers: 5,
             prefetch: 10

Anyone can help me or have any idea about this error?

Thanks in advance!! Cheers.

vishaltps commented 4 months ago

I am facing the same issue, it works for few hours . After few hours it stopped working, i can't see anything in the log, i have to delete the existing pod so it starts again

KinWang-2013 commented 4 months ago

I am facing the same issue, it works for few hours . After few hours it stopped working, i can't see anything in the log, i have to delete the existing pod so it starts again

I am facing the same issue in production sneaker is 2.12.0, bunny is 2.22.0, ruby is 2.7.8. Cant't really say it is due to inactivity but it works for some time and after some time it stops working with queue count static or increasing and consumer consumer count 0. Any one got a solution or found the root cause?

vishaltps commented 4 months ago

@luismiv85 Have you found any solution for it? Below are versions

Ruby - 3.2.2 Bunny - 2.22.0 Sneakers - 2.12.0

KinWang-2013 commented 4 months ago

My issue is fixed, It was due to the timeout error Consumer 1 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more one job was taking a lot of time hence it did not acknowledge on time.

vishaltps commented 4 months ago

@KinWang-2013 My issue is also resolved, I was using this as config earlier ~ Sneakers.configure( { amqp: ENV.fetch('SNEAKER_URL', 'amqp://guest:guest@localhost:5672'), ~ Now i am using this

module Connection
  def self.sneakers
    @_sneakers ||= begin
      Bunny.new(
        host: ENV.fetch('RABBITMQ_HOST', 'localhost'),
        port: ENV.fetch('RABBITMQ_PORT', 5672).to_i,
        username: ENV.fetch('RABBITMQ_USER', 'guest'),
        password: ENV.fetch('RABBITMQ_PASSWORD', 'guest'),
        automatic_recovery: true,
        heartbeat: 30,
        log_level: :debug,
        log_file: 'log/rabbitmq.log'
      )
    end
  end
end

Sneakers.configure(
  {
    connection: Connection.sneakers,

I am not sure whether its a right way or not but it is working for me.