sul-dlss / preservation_catalog

Rails application to track, audit and replicate archival artifacts associated with SDR objects.
https://sul-dlss.github.io/preservation_catalog/

investigate worker allocation options in Sidekiq with respect to queue prioritization/reservation #1970

Closed. ndushay closed this issue 1 year ago.

ndushay commented 1 year ago

From maint/tech debt storytime discussion Wed 10/19:

As @jmartin-sul has observed, one hard thing is controlling the allocation of specific numbers of workers to specific queues. Resque-pool makes that easy, but it also makes automatic retries hard (a plugin is needed).

John has concerns about worker allocation; @justinlittman thinks an upcoming version of Sidekiq may allow for this.

ACTIONS FOR TICKET:

jmartin-sul commented 1 year ago

resque-pool allows us to allocate a specific number of workers to specific queues. As I understand it, Sidekiq gives you one pool of workers shared across all queues, with prioritization for which queue gets worked first when a worker becomes free (but no easy facility for reserving workers for specific queues the way resque-pool does). But it'd be great if that's changed in the new version of Sidekiq.
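
For concreteness, here's a minimal sketch of the kind of per-queue allocation resque-pool gives us (queue names and worker counts below are illustrative, not our actual config):

    # config/resque-pool.yml -- each key is a queue (or comma-separated list
    # of queues) worked by a dedicated set of workers; each value is the
    # number of worker processes reserved for that key.
    # names and counts here are illustrative only
    checksum_validation: 2
    validate_moab: 5
    zip_delivery: 3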

While the resque-pool approach does lead to less efficient utilization of allocated resources (resources are allocated for a peak that's much higher than the average load at any moment), it also helps minimize situations like accessionWF/preservationIngestWF backing up because no worker threads are free to pick up validate-moab jobs, since they all happen to be occupied running checksum validation on ~1 TB media objects. That's an unusual situation, but not unheard of, and it'd be nice to avoid it. Similar worries apply to the replication pipeline backing up behind other work.

We've also arrived at pretty different maximums for the number of workers we want running at a time for some I/O-intensive tasks, both because of what seem to be the usual network and storage bandwidth limits, and because we're happy with lower throughput and less resource competition for, e.g., MoabToCatalog compared to more urgent tasks like archive zip delivery or validation of versions in accessioning.

Happy to pair, or answer questions in the ticket, or on Slack.

jmartin-sul commented 1 year ago

ok, so quick update after reading the link in the description, and doing some web searching and documentation browsing.

i don't see anything at the link (or in the release notes on the 7 beta tag) indicating that worker allocation options will change in the upcoming sidekiq version. but after a bit of searching, i found:

https://github.com/mperham/sidekiq/issues/1960

> Ah, that's right. Thanks a lot.
>
> I think the Advanced Options - Queues section could be improved to make it obvious that you can start two Sidekiq processes side by side, each with specific queues, e.g.:
>
>     sidekiq -c 8 -q default
>     sidekiq -c 4 -q critical

https://github.com/mperham/sidekiq/wiki/Advanced-Options#reserved-queues

> Reserved Queues
>
> If you'd like to "reserve" a queue so it only handles certain jobs, the easiest way is to run two sidekiq processes, each handling different queues:
>
>     sidekiq -q critical # Only handles jobs on the "critical" queue
>     sidekiq -q default -q low -q critical # Handles critical jobs only after checking for other jobs

So, it's possible that Sidekiq already supports what we need. Though we might have to do some capistrano and/or puppet work to pass the configuration through? will we basically end up writing dlss-sidekiq-pool?

jmartin-sul commented 1 year ago

i think the work for this ticket is something like:

i looked briefly for something like a sidekiq-pool gem, and i found: https://github.com/vinted/sidekiq-pool

but i haven't looked closely at it, and have no idea whether it does exactly what we want, whether it'd be easier than doing it ourselves, or how well maintained it is.

jmartin-sul commented 1 year ago

in case it's useful for concrete experimentation in this ticket or a follow-on, here's an example of the switch in pre-assembly: https://github.com/sul-dlss/pre-assembly/pull/927

justinlittman commented 1 year ago

If my understanding is correct, Capsules will provide the necessary functionality: https://github.com/mperham/sidekiq/blob/main/docs/7.0-Upgrade.md#capsules
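
If so, a capsule setup might look something like this sketch, following the pattern in that upgrade doc (the capsule name, queue names, and concurrency values here are illustrative, not our actual settings):

    # config/initializers/sidekiq.rb -- sketch only; names and numbers
    # are illustrative, not our actual configuration
    Sidekiq.configure_server do |config|
      # the default capsule works the general queues
      config.queues = %w[default low]
      # a dedicated capsule reserving a fixed number of threads for one queue
      config.capsule("zip-delivery") do |cap|
        cap.concurrency = 2
        cap.queues = %w[zip_delivery]
      end
    end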

The reserved queue approach is viable as well. However, it would require a bit of puppet work, as our current puppet sidekiq configuration doesn't provide a hook for this. A more practical alternative is to switch to docker for workers, where this could be easily accommodated in the docker-compose.yml.

jmartin-sul commented 1 year ago

> If my understanding is correct, Capsules will provide the necessary functionality: https://github.com/mperham/sidekiq/blob/main/docs/7.0-Upgrade.md#capsules
>
> The reserved queue approach is viable as well. However, it would require a bit of puppet work, as our current puppet sidekiq configuration doesn't provide a hook for this. A more practical alternative is to switch to docker for workers, where this could be easily accommodated in the docker-compose.yml.

oh nice, thanks @justinlittman! i hadn't found that release notes doc; the link from the post in the description was broken, and i must've missed it when browsing the repo.

this capsules feature does seem like a very promising approach, and lower effort than either the puppet work for reserved queues or dockerization (though i definitely wouldn't be opposed to dockerization, since it'd have other benefits).

justinlittman commented 1 year ago

Upon further consideration, capsules do not solve the problem. A capsule allows controlling the threads assigned to a queue within a single worker process. However, we run multiple worker processes, so the thread count is multiplied by the number of workers (e.g., a capsule capped at 2 threads, running in 4 worker processes, still allows 8 concurrent jobs), which doesn't provide the necessary control.

An alternative approach is https://github.com/sul-dlss/operations-tasks/issues/3209, which allows a separate configuration file per worker. This should allow good-enough control using Sidekiq 6.
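
To sketch what that could look like (file names, queues, and concurrency values here are made up for illustration), each worker process would get its own YAML config and be started with sidekiq's -C flag:

    # config/sidekiq_cv.yml -- hypothetical worker dedicated to checksum validation
    :concurrency: 2
    :queues:
      - checksum_validation

    # config/sidekiq_default.yml -- hypothetical worker for everything else
    :concurrency: 10
    :queues:
      - default
      - validate_moab

    # each process started with its own config, e.g.:
    #   bundle exec sidekiq -C config/sidekiq_cv.yml
    #   bundle exec sidekiq -C config/sidekiq_default.yml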

jmartin-sul commented 1 year ago

ACTIONS FOR TICKET:

i believe these are all done: