jondot / sneakers

A fast background processing framework for Ruby and RabbitMQ
https://github.com/jondot/sneakers
MIT License
2.24k stars, 333 forks

Using up disk space #481

Open KLForsythe opened 4 months ago

KLForsythe commented 4 months ago

After updating Ruby to 2.7.7 (a first step before moving to Ruby 3) and Ubuntu to 22.04, my disk keeps filling up, and I have tracked this down to sneakers. Note that the Ruby update triggers deprecation warnings in our application for syntax that is deprecated ahead of Ruby 3, which has made our log files larger than usual. I have adjusted logrotate for that, and those changes are keeping the log sizes reasonable, so our application logs are not what is filling up the disk. Whatever is using the space is released when sneakers is restarted.

Here is what I know so far:

df shows 14G used

deployer@ip-x-x-x-x:/$ sudo df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         29G   14G   16G  47% /
tmpfs            3.9G     0  3.9G   0% /dev/shm
tmpfs            1.6G  960K  1.6G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            784M  4.0K  784M   1% /run/user/1000

du shows 7.5G used

deployer@ip-x-x-x-x:/$ sudo du -sh
7.5G

After seeing https://serverfault.com/a/581521, in which Apache caused a similar problem for someone, I tried restarting the various services running on the server. Restarting sneakers freed 8.4G:

deployer@ip-x-x-x-x:/$ sudo service sneakers restart
deployer@ip-x-x-x-x:/$ sudo service sneakers status
● sneakers.service - SYSV: Starts and Stops Sneakers message processor.
     Loaded: loaded (/etc/init.d/sneakers; generated)
     Active: active (exited) since Wed 2024-02-14 14:39:23 UTC; 40s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 128372 ExecStart=/etc/init.d/sneakers start (code=exited, status=0/SUCCESS)
        CPU: 37ms

deployer@ip-x-x-x-x  systemd[1]: Starting SYSV: Starts and Stops Sneakers message processor....
deployer@ip-x-x-x-x  sneakers[128372]: Starting sneakers message processor ..
deployer@ip-x-x-x-x  su[128378]: pam_unix(su-l:session): session opened for user deployer(uid=1000) by (uid=0)
deployer@ip-x-x-x-x  systemd[1]: Started SYSV: Starts and Stops Sneakers message processor..
deployer@ip-x-x-x-x: /$ sudo df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         29G  5.6G   24G  20% /
tmpfs            3.9G     0  3.9G   0% /dev/shm
tmpfs            1.6G  960K  1.6G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            784M  4.0K  784M   1% /run/user/1000

So restarting sneakers freed 8.4 GB of disk space on my server. Before the restart, sneakers had been running for 2 days (since I did an instance refresh for our servers).
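A gap between df and du that disappears when a process restarts is the usual sign of files that were deleted (or rotated away) while the process still had them open: du no longer sees them, but the filesystem cannot release the blocks until the last handle is closed. Below is a minimal Ruby sketch for checking this on Linux, assuming /proc is mounted. `deleted_open_files` is a hypothetical helper name for illustration, not part of sneakers or bunny.

```ruby
# Lists files that a process still holds open after they were deleted.
# Assumption: Linux, where /proc/<pid>/fd contains one symlink per open
# descriptor and the link target of an unlinked file ends in " (deleted)".
# `deleted_open_files` is a hypothetical helper, not a sneakers/bunny API.
def deleted_open_files(pid)
  fd_dir = "/proc/#{pid}/fd"
  return [] unless Dir.exist?(fd_dir)

  Dir.children(fd_dir).filter_map do |fd|
    link = File.join(fd_dir, fd)
    begin
      target = File.readlink(link)
      next unless target.end_with?(' (deleted)')

      # File.stat follows the /proc symlink to the still-open inode,
      # so the size reflects space that has not been released yet.
      { fd: fd.to_i, path: target, bytes: File.stat(link).size }
    rescue SystemCallError
      nil # descriptor closed during the scan, or permission denied
    end
  end
end

# Prints the largest offenders first, followed by a total.
def report_deleted_open_files(pid)
  files = deleted_open_files(pid).sort_by { |f| -f[:bytes] }
  files.each do |f|
    puts format('%12d bytes  fd %-4d %s', f[:bytes], f[:fd], f[:path])
  end
  puts format('%12d bytes total', files.sum { |f| f[:bytes] })
end
```

Calling `report_deleted_open_files` with the PID of each running sneakers process (the supervisor and its workers) before a restart should show whether deleted-but-open files account for the missing space. Reading another user's /proc entries generally requires running the script with sudo.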

We are running sneakers 2.7.0, with bunny 2.9.2.

sneakers.rb

require 'sneakers'

connection = Bunny.new(Settings.rabbit.to_h)

Sneakers.configure(
  heartbeat: 5,
  connection: connection,
  vhost: '/',
  exchange: 'sneakers',
  exchange_type: :direct,
  arguments: { :'x-queue-type' => 'quorum' },
  daemonize: true,
  start_worker_delay: 10,
  pid_path: 'tmp/pids/sneakers.pid',
  log: 'log/sneakers.log',
  timeout_job_after: 30,
  prefetch: 10,
  threads: 10,
  env: Rails.env
)

Sneakers.logger.level = Logger::INFO 
Sneakers.error_reporters << proc { |exception, _worker, context_hash| Honeybadger.notify(exception, context_hash) }

Relevant sections of settings.yml:

rabbit:
  host: <%= ENV['RABBITMQ_HOST'] %>
  port: <%= ENV['RABBITMQ_PORT'] %>
  user: <%= ENV['RABBITMQ_USERNAME'] %>
  password: <%= ENV['RABBITMQ_PASSWORD'] %>
  automatically_recover: true
  tls: true
  tls_cert: <%= ENV['TLS_CLIENT_CERT'] %>
  tls_key: <%= ENV['TLS_CLIENT_KEY'] %>
  tls_ca_certificate: <%= ENV['TLS_CA_CERT'] %>
  verify_peer: false
  fail_if_no_peer_cert: false
  queue:
    event: <%= "#{Rails.env}_events" %>
    error: <%= "#{Rails.env}_errors" %>

I am not sure what in sneakers is using the disk space, but something in it or controlled by it (such as bunny) is holding space that is released when the sneakers service is restarted.

If anyone has any insight or suggestions, I'd greatly appreciate it.