rails / solid_queue

Database-backed Active Job backend

DB Shards support #353

Open doctomarculescu opened 2 months ago

doctomarculescu commented 2 months ago

Hi,

In https://github.com/rails/solid_queue/pull/56 it seems built-in support for database sharding was added to solid_queue.

My team was unable to get this feature working in our tests. Maybe we are not setting things up correctly. We set config.solid_queue.connects_to to point to two shards, as follows:

config.solid_queue.connects_to = {
  shards: {
    shard_one: { writing: :db5, reading: :db5_readers },
    shard_two: { writing: :db5_shard, reading: :db5_shard_readers }
  }
}

With the configuration above, the workers and the dispatchers only connected to the first shard. With some tweaking we managed to get the adapter to enqueue in both shards, but this did not work out of the box.

If sharding is supported out of the box, would you be so kind as to point us to an example of it working (configuration for the adapter, dispatcher, and worker)? Or, if sharding is not fully supported yet, is there any plan to enhance solid queue with full sharding support?
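
As a rough illustration of what enqueue-side routing across two shards can look like, here is a minimal sketch in plain Ruby. `SHARDS` and `shard_for` are hypothetical names chosen for this example and are not part of solid_queue; the shard names simply mirror the configuration above. It only shows deterministic shard selection, not the tweak we actually applied.

```ruby
require "zlib"

# Hypothetical shard names, mirroring the connects_to configuration above.
SHARDS = %i[shard_one shard_two].freeze

# Hypothetical helper: pick a shard deterministically from a job key, so the
# same key always maps to the same shard.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

# In a Rails app, the enqueue would then presumably be wrapped in something like
# ActiveRecord::Base.connected_to(shard: shard_for(job_id)) { ... } (assumption).
puts shard_for("job-1")
```

Hashing the key keeps routing stable across processes; a round-robin counter would spread load just as well if stable placement does not matter.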

Many thanks!

Regards, Andrei

rosa commented 2 months ago

Ahh, great question @doctomarculescu! When I first added that, I thought I would work on sharding right away but in the end we didn't need it, so I postponed it 😳 It doesn't work out of the box, no, I'm afraid.

We've been discussing sharding as the next top priority to support, so yes, it's definitely in the plans. I'm not sure when, because my next open-source priority is getting https://github.com/rails/mission_control-jobs to v1.0.0, but hopefully soon.

doctomarculescu commented 2 months ago

Thank you for your answer @rosa. I am glad it's in the plans as a top priority.

Sharding does indeed look like the key feature for achieving high throughput with horizontally scaled workers. Our testing suggests that vertically scaling the DB only helps if the workers are scaled vertically as well. We tested massive horizontal scaling of the workers (1k instances) combined with vertical scaling of the DB (doubling CPU, memory, and network). We hit the same throughput ceiling for both DB sizes because of the worker polling overhead: the more workers poll, the more pages have to be brought into memory to find the first non-locked row in ready executions. Sharding would divide the number of polling requests by the number of shards, so I expect a throughput speedup that scales with the number of shards.
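
To make the polling argument concrete, here is a toy model in Ruby. It is a back-of-the-envelope sketch, not a benchmark: it assumes the worst case where all pollers on a shard overlap, each claims `batch` rows, and each must first skip every row already locked by the pollers ahead of it. The method name `rows_examined` and the numbers are assumptions made for illustration.

```ruby
# Toy model (assumption): pollers arrive one after another on the same shard.
# Each poller skips the rows locked by the pollers ahead of it, then claims
# `batch` rows of its own.
def rows_examined(pollers, batch)
  (0...pollers).sum { |ahead| ahead * batch + batch }
end

workers = 1_000
[1, 2, 4].each do |shards|
  per_shard = workers / shards
  total = shards * rows_examined(per_shard, 1)
  puts "#{shards} shard(s): #{per_shard} pollers each, #{total} rows examined in total"
end
```

Under these assumptions the rows examined grow faster than linearly with the number of pollers per shard, which is why splitting the pollers across shards reduces the total work.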

AlessandroTolomio commented 3 weeks ago

Hi @doctomarculescu, I am using a setup with database shards, where some tenants use a different database. This was causing issues with putting the queues on a dedicated database, so I wrote a monkey patch that seems to be working. I'm not sure whether it will be useful to you.

module SolidQueue
  class Record < ActiveRecord::Base
    self.abstract_class = true

    if defined?(self.connect_to) && self.respond_to?(:connects_to)
      class << self
        remove_method :connects_to
      end
    end

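    # Use SKIP LOCKED when enabled, so concurrent pollers skip rows locked by
    # others instead of waiting on them.
    def self.non_blocking_lock
      if SolidQueue.use_skip_locked
        lock(Arel.sql("FOR UPDATE SKIP LOCKED"))
      else
        lock
      end
    end

    # Always report :queue as the current shard for Solid Queue records.
    def self.current_shard
      :queue
    end

    # Look up the pool for this class with role :default and the shard above.
    def self.connection
      connection_handler.retrieve_connection_pool(connection_specification_name, role: :default, shard: current_shard).connection
    end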
  end
end

thibaudgg commented 2 weeks ago

@AlessandroTolomio thanks for your monkey patch. I wonder if this smaller change would be enough:

# config/initializers/solid_queue_patch.rb
Rails.application.config.to_prepare do
  module CurrentShardPatch
    def current_shard; :queue end
  end

  SolidQueue::Record.extend(CurrentShardPatch)
end
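
The reason extending with a module can override an inherited class method can be shown without Rails. The sketch below uses stand-in classes (`BaseRecord` and `QueueRecord` are hypothetical names, not the real Active Record or Solid Queue classes) and only demonstrates Ruby's method lookup, not the behavior of the patch in a real app.

```ruby
# Stand-in for a base class that defines a class-level current_shard.
class BaseRecord
  def self.current_shard
    :default
  end
end

# Stand-in for a subclass that inherits the class method.
class QueueRecord < BaseRecord
end

module CurrentShardPatch
  def current_shard
    :queue
  end
end

# extend inserts the module ahead of the inherited singleton methods in the
# lookup chain for QueueRecord only.
QueueRecord.extend(CurrentShardPatch)

puts QueueRecord.current_shard # prints "queue"
puts BaseRecord.current_shard  # prints "default"
```

Note that this only works when the subclass does not define `current_shard` directly on itself; a method defined with `def self.current_shard` on the subclass would still take precedence over the extended module.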

It might also be worth moving this discussion to https://github.com/rails/solid_queue/issues/369.