rails / solid_queue

Database-backed Active Job backend
MIT License

Enqueued jobs do not respect queue priorities #360

Open salmonsteak1 opened 1 month ago

salmonsteak1 commented 1 month ago

Hey there, I'm testing out queue priorities in Solid Queue. My queue.yml is as follows:

default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: [urgent*, semi_urgent*, default*, low*]
      threads: 1
      processes: 1
      polling_interval: 0.1

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

I also have 3 test jobs for each queue priority level - A, B and C:

# Testing a low priority job A
module TestSqJobs
  class LowPriorityJobA < ActiveJob::Base
    self.queue_adapter = :solid_queue
    queue_as :low_A
    def priority
      0
    end

    def perform
      sleep(1)
    end
  end
end

Finally, I execute each job 5 times using the following ruby code:

job_classes = [
  TestSqJobs::SemiUrgentPriorityJobA,
  TestSqJobs::SemiUrgentPriorityJobB,
  TestSqJobs::SemiUrgentPriorityJobC,
  TestSqJobs::UrgentPriorityJobA,
  TestSqJobs::UrgentPriorityJobB,
  TestSqJobs::UrgentPriorityJobC,
  TestSqJobs::DefaultPriorityJobA,
  TestSqJobs::DefaultPriorityJobB,
  TestSqJobs::DefaultPriorityJobC,
  TestSqJobs::LowPriorityJobA,
  TestSqJobs::LowPriorityJobB,
  TestSqJobs::LowPriorityJobC
]

job_classes.each do |job_class|
  5.times do |i|
    begin
      job_class.perform_later
      puts "Enqueued #{job_class} instance #{i + 1}"
    rescue ArgumentError => e
      puts "Skipping #{job_class} due to missing arguments: #{e.message}"
    end
  end
end

puts "Successfully enqueued 5 instances of each TestSqJobs job where possible."

I've watched this in Mission Control, and it seems that the jobs in queues with the urgent prefix don't all get processed first. Here's a screenshot of my observations:

[screenshot]

We can see that jobs are being taken from the default and semi_urgent queues before the queues with the urgent prefix have been cleared out.

rosa commented 1 month ago

🤔 You're enqueuing the jobs sequentially, no? And the ones enqueued first are these:

      TestSqJobs::SemiUrgentPriorityJobA,
      TestSqJobs::SemiUrgentPriorityJobB,
      TestSqJobs::SemiUrgentPriorityJobC,

So they're going to be run before

      TestSqJobs::UrgentPriorityJobA,
      TestSqJobs::UrgentPriorityJobB,
      TestSqJobs::UrgentPriorityJobC,

because when they're enqueued, they're the only ones there.

rosa commented 1 month ago

Ah, I see what you mean. Yes, there's a bug there in the order of queues when they use prefixes. The order is the one returned from the DB from this query:

 relation.where(([ "queue_name LIKE ?" ] * prefixes.count).join(" OR "), *prefixes).distinct(:queue_name).pluck(:queue_name)

That order isn't guaranteed to match the order of the prefixes in the LIKE clause. The queue names should be reordered to match the order in the clause and the config.

This should work fine if you don't use prefixes, though (which is what I'd recommend in this case anyway, collapsing each group of low_*, urgent_*, etc. queues into a single queue). I'll fix the case with prefixes.
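
The reordering described above (making the queue names returned by the DB follow the order of the configured prefixes) could look roughly like the following. This is a minimal sketch in plain Ruby, not the actual fix in Solid Queue: `prefix_rank` and `order_by_prefixes` are hypothetical names, and the prefixes are assumed to be passed without the trailing `*`.

```ruby
# Hypothetical helpers, not part of Solid Queue.

# Position of the first configured prefix that the queue name starts with.
# Names that match no prefix are ranked after all configured prefixes.
def prefix_rank(name, prefixes)
  prefixes.index { |prefix| name.start_with?(prefix) } || prefixes.size
end

# Reorders queue names so they follow the order of the configured prefixes.
# Names sharing a prefix keep the relative order in which they were given.
def order_by_prefixes(queue_names, prefixes)
  queue_names.each_with_index
             .sort_by { |name, position| [prefix_rank(name, prefixes), position] }
             .map(&:first)
end

prefixes = %w[urgent semi_urgent default low]
from_db  = %w[default_A low_B semi_urgent_C urgent_B urgent_A] # arbitrary DB order

p order_by_prefixes(from_db, prefixes)
# => ["urgent_B", "urgent_A", "semi_urgent_C", "default_A", "low_B"]
```

Note that this only fixes the order between prefixes. The order among queues that share a prefix (urgent_B before urgent_A here) is still whatever the DB returned.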

salmonsteak1 commented 1 month ago

I see, thanks for the info @rosa! I was thinking of using wildcards together with the priority keywords so that I can still have separate queues for specific jobs while keeping the benefit of queue priorities.

Can I confirm which scheduling algorithm is used among queues that match the same wildcard? For example, say my queue.yml defines the queues as [urgent*, semi_urgent*, default*, low*], and I have urgent_A, urgent_B and urgent_C jobs enqueued. Will these jobs be selected FCFS, or at random? Thanks!

rosa commented 1 month ago

Within the same wildcard name (urgent_A, urgent_B, urgent_C), the selection would be non-deterministic. It'll depend on the order in which the DB returns the queue names from this query: SELECT queue_name FROM solid_queue_ready_executions WHERE queue_name LIKE 'urgent_%'.

If you use exact queue names, polling will be faster, and the order will be deterministic.
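
To illustrate what a deterministic order with exact queue names means, here is a small in-memory sketch. It is an assumption-laden model, not Solid Queue's implementation: `next_job` is a hypothetical helper, the `ready` hash stands in for the ready executions table, and each queue is assumed to hand out its oldest job first.

```ruby
# Hypothetical in-memory model, not Solid Queue's implementation.
# `ready` maps an exact queue name to its jobs in enqueue order (oldest first).
def next_job(queues_in_order, ready)
  queues_in_order.each do |queue_name|
    jobs = ready.fetch(queue_name, [])
    # Take the oldest job from the first configured queue that has any.
    return [queue_name, jobs.shift] unless jobs.empty?
  end
  nil # nothing ready in any configured queue
end

ready = {
  "default"     => ["d1"],
  "urgent"      => ["u1", "u2"],
  "semi_urgent" => ["s1"]
}
order = %w[urgent semi_urgent default low] # exact names, as listed in the config

picked = []
while (job = next_job(order, ready))
  picked << job.last
end

p picked
# => ["u1", "u2", "s1", "d1"]
```

Because the queue list is walked in the configured order on every pick, the result does not depend on the order in which the queues appear in `ready`.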