ixti / sidekiq-throttled

Concurrency and rate-limit throttling for Sidekiq
MIT License
713 stars 77 forks source link

Allow configuring whether throttled jobs are put back on the queue immediately or scheduled for the future #150

Open lavaturtle opened 1 year ago

lavaturtle commented 1 year ago

This adds support for setting the requeue_strategy for a job, to specify what we should do with throttled jobs. The default is :enqueue, which is the current behavior: re-add it to the end of the queue. The other option is :schedule, which schedules the job for a time in the future when we think we'll have capacity to process it.

It's also possible to set the default_requeue_strategy in the configuration, to set this behavior for all jobs that do not individually specify a requeue_strategy.

This may be relevant to Issue #36.

Unrelatedly, there's a commit in here that changes the format of some require_relative lines, to comply with the new Rubocop rule Style/RedundantCurrentDirectoryInPath. I don't feel strongly about this commit; I only added it so that rubocop would pass.

ixti commented 1 year ago

That is awesome! Thank you! I will take a look this weekend.

ixti commented 1 year ago

Re: Style/RedundantCurrentDirectoryInPath

I prefer consistency. It is pretty weird to me that rubocop's default for rescue is to be explicit, while for require_relative it's implicit. Let's just disable this cop. I will find time to open a rubocop PR to make that configurable. But for now, I would like it to be simply disabled.

On a side note about this PR, I think I made a bad decision back when I started this gem on providing custom sidekiq_throttle singleton class message, probably we should utilize sidekiq_class_attribute helper instead:

def self.included(base)
  base.sidekiq_class_attribute :sidekiq_throttle_push_back # :enqueue | :schedule
end

In this case we will not need to bother about "keeping track" of job class inheritance.

lavaturtle commented 1 year ago

Re: Style/RedundantCurrentDirectoryInPath

I prefer consistency. It is pretty weird to me that rubocop's default for rescue is to be explicit, while for require_relative it's implicit. Let's just disable this cop. I will find time to open a rubocop PR to make that configurable. But for now, I would like it to be simply disabled.

Makes sense! I've updated the PR to disable the cop instead of applying it.

lavaturtle commented 1 year ago

On a side note about this PR, I think I made a bad decision back when I started this gem on providing custom sidekiq_throttle singleton class message, probably we should utilize sidekiq_class_attribute helper instead:

Ah, interesting! I didn't know about sidekiq_class_attribute. So if we do that, then a job class can specify sidekiq_throttle_push_back in its sidekiq_options, and then we can read it later with get_sidekiq_options?

Would you prefer push_back as the setting name rather than requeue_strategy?

ixti commented 1 year ago

sidekiq_class_attribute :sidekiq_throttle_push_back will define a new singleton class method, so the usage will look like:

class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttled_push_back :enqueue
end

Naturally, we should check if the value is Proc, and if so - we should call that proc to get the value, so that it will be possible to:

class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttled_push_back ->(user_id, *) { user_id.odd? ? :enqueue : :schedule }

  def perform(user_id, more, stuff)
    # ...
  end
end
ixti commented 1 year ago

I have no strong feelings about push_back vs requeue_strategy. I just don't like over-use of strategy in names (easy to confuse with Throttled::Strategy). Probably simply: sidekiq_throttled_requeue_with?

Also, just realized, that it would be nice to also allow specifying which queue it should be requeued to.

lavaturtle commented 1 year ago

Hmm, I've been working on getting the sidekiq_class_attribute approach to work, and as far as I can tell from my testing, if we do this in Sidekiq::Throttled::Job:

      def self.included(worker)
        worker.send(:extend, ClassMethods)
        worker.sidekiq_class_attribute :sidekiq_throttled_requeue_with  # :enqueue | :schedule
      end

then the job class needs to use it like this:

      class MyThrottledJob
        include Sidekiq::Job
        include Sidekiq::Throttled::Job

        self.sidekiq_throttled_requeue_with = :schedule
        sidekiq_throttle foo: :bar

i.e. it ends up as a class-level variable that needs to be assigned, rather than being something we can use like sidekiq_throttled_requeue_with :schedule.

lavaturtle commented 1 year ago

So maybe we invoke sidekiq_class_attribute for the inheritance benefits, but instead of users using sidekiq_throttled_requeue_with in the class directly, we keep the requeue_with option for sidekiq_throttle? That also avoids any weirdness around which order the two are called in.

lavaturtle commented 1 year ago

Okay! I've made a number of changes. The basic usage should now look like this:

  sidekiq_throttle threshold: {limit: 123, period: 1.hour}, requeue: {to: :other_queue, with: :schedule}

with support for Procs:


class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttle threshold: {limit: 123, period: 1.hour},
                   requeue: {to: ->(user_id, *) { user_id.odd? ? :odd_queue, :even_queue },
                             with: ->(_user_id, more, *) { more > 25 ? :schedule : :enqueue }}

  def perform(user_id, more, stuff)
    # ...
  end
end

and the default configuration looks like this:

Sidekiq::Throttled.configuration.default_requeue_options = {with: :schedule, to: :my_throttled_jobs_queue}

Both :with and :to arguments are optional. If :with is not specified, we'll use the :enqueue method. If :to is not specified, we'll use the queue the job was originally enqueued into.

danpolyuha commented 1 year ago

Hey @ixti , Thanks for your amazing work on this gem. Sorry for asking here, just don't want to create new threads, Do you have any estimate on release of 1.0.0 stable?

woodhull commented 1 year ago

Excited to land this PR both for the app I help maintain with @lavaturtle and another product I'm working on. Anything else we can do to help move this and v1 across the finish line? Happy to put some more of our engineering time in on our side if there are discrete tasks you can point us at. We're keen to invest in the gem & code quality.

ixti commented 1 year ago

@lavaturtle @woodhull thanks for the amazing PR. I will try to go through it today-tomorrow. Sorry for being unresponsive (was really overwhelmed with some work lately - but slowly getting back at dedicating time to sidekiq-throttled).

ixti commented 1 year ago

The only issue I'm scratching my head about right now is the Sidekiq-Pro support. Can't figure out how to make sure it works correctly. Thus thinking on skipping it's support for 1.0.0 release completely and circle back at that task after 1.0.0 release.

mnovelo commented 1 year ago

@ixti I'm happy to help with the Sidekiq Pro support, regardless of whether it's for 1.0.0 or later! Let me know if there's a branch you'd like me to test and/or review

woodhull commented 1 year ago

Hey.... Just looping back to this.

We're not using Sidekiq Pro so we aren't sure how to help move this forward on that front. That code is private for paid Sidekiq customers I think?

mnovelo commented 1 year ago

@woodhull yeah, the Sidekiq Pro code is private for paid customers.

@ixti I'm still happy to help with the Sidekiq Pro support, especially if it's blocking the 1.0.0 release. At the same time, I fully support releasing 1.0.0 without support for Sidekiq Pro's super_fetch. It's a configuration option that we're not using yet. We only use batching from Sidekiq Pro ourselves right now.

JamesWatling commented 1 year ago

Is this going to be released? I love this strategy, we're having a severly throttled queue, and it is degrading our other queues and jobs, would love this strategy!

ixti commented 12 months ago

@JamesWatling yes. Working to merge this in the next couple of days.

ixti commented 12 months ago

This PR will help to mitigate #86 as well

ixti commented 12 months ago

I'm gonna refactor the way scheduling strategy is stored, and will incorporate this PR as part of v2.0.0 release which I will work on next weeks. Couple of heads up on the API I have in mind:

sidekiq_throttle_push_back(
  with: :enqueue, # Possible options: :enqueue, :requeue, and :schedule
  to: nil, # Can specify queue to push back to, nil, String, or proc
  in: 0 # only applicable for :schedule
)

Under the hood, :requeue will simply call UnitOfWork#requeue and :enqueue as well as :schedule will call client_push. This way the implementation will be BasicFetch/SuperFetch-agnostic.

simonc commented 7 months ago

Hi there 👋

That's a pretty nice work going in there! 😊

I was wondering if it'd also make sense to have a :drop strategy when you want to simply prevent the same job from being pushed twice and not queue/schedule a throttled job.

Thanks! ❤️

woodhull commented 6 months ago

Just checking in on this. Has it been abandoned?

ixti commented 6 months ago

Just checking in on this. Has it been abandoned?

Nothing is abandoned. Just lacking of free time to work on this one :(( But it's on my radar to land first thing once I'll get to it.

mnovelo commented 6 months ago

@ixti, let us know if we can help to move this forward!

gstokkink commented 4 months ago

Looking forward to this PR as well, would solve the enqueued_at not being reset issue that me and a lot of other people are running into.

gstokkink commented 4 months ago

Oh and @ixti I'm happy to help with Sidekiq Pro testing if you require it 😄

anero commented 1 week ago

Hi @ixti I work with @lavaturtle (who initially submitted this PR).

We're considering working on this PR to update the code to resolve the conflicts and work with the latest version on main. But given the effort of updating the code due to major changes on main since we first submitted the PR, we wanted to confirm that this feature is still relevant for you and that you would be interested on merging. Otherwise we may look for other options to avoid maintaining a fork of this library with the feature we need.

ixti commented 1 week ago

@anero will be more than happy to merge it. Due to lack of time failed to do any refactoring, so will be happy to merge it either way you will come with it - and (probably) do follow up refactoring after it will be merged LOL

mnovelo commented 6 days ago

@lavaturtle @anero, happy to help test with Sidekiq Pro too

codecov[bot] commented 6 days ago

Codecov Report

Attention: Patch coverage is 96.05263% with 3 lines in your changes missing coverage. Please review.

Project coverage is 98.47%. Comparing base (bca3912) to head (11c6d98).

Files with missing lines Patch % Lines
lib/sidekiq/throttled/strategy.rb 91.89% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #150 +/- ## ========================================== - Coverage 98.97% 98.47% -0.51% ========================================== Files 19 19 Lines 390 458 +68 Branches 56 76 +20 ========================================== + Hits 386 451 +65 - Misses 4 7 +3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mnovelo commented 6 days ago

the Sidekiq Pro specs pass too 🎉

anero commented 6 days ago

Great, thanks for testing @mnovelo!

I was concerned that there may be a problem with how we requeue the job here https://github.com/ixti/sidekiq-throttled/pull/150/files#diff-c15f45918c6d7d825fbfa0856557360b907b492e44879ba9eb1e7f3bad9c42baR89 which is different from how it was previously done (https://github.com/ixti/sidekiq-throttled/pull/150/files#diff-ae016bb4af61725f74c58dac11f63bd38a08d6c5c5519e467902911a51077b0fL27), but I don't know enough about Sidekiq's API to say if it's an actual problem

mnovelo commented 6 days ago

oh, that is a problem actually @anero I missed that change. Sidekiq Pro's super_fetch does do important additional logic when requeuing. I can work on a PR to address this, but I would like it if we kept Throttled.requeue_throttled(work) or similar so that super_fetch can run its additional logic

anero commented 6 days ago

oh, that is a problem actually @anero I missed that change. Sidekiq Pro's super_fetch does do important additional logic when requeuing. I can work on a PR to address this, but I would like it if we kept Throttled.requeue_throttled(work) or similar so that super_fetch can run its additional logic

@mnovelo I wasn't sure if there's a way to reference the correct retriever (BasicFetch / SuperFetch) from Sidekiq::Throttled::Strategy#requeue_throttled. I guess we could perform a similar verification as it's done elsewhere to decide whether Sidekiq Pro is available or not, and requeue using the correct mechanism. Do you've any thoughts on this @ixti ?

mnovelo commented 6 days ago

@anero oh, the class of the work variable will be Sidekiq::Pro::SuperFetch::UnitOfWork, so that's one way to distinguish whether the super_fetch strategy should be used. That was already being assumed in the previous implementation because it just delegated the requeuing logic to Sidekiq::Pro::SuperFetch::UnitOfWork#requeue so something like

case work
when Sidekiq::Pro::SuperFetch::UnitOfWork
# super_fetch stuff
else
# basic_fetch stuff
end

could work. Though I haven't dug deep into this yet

ixti commented 6 days ago

I think, the most clean solution will be to patch UnitOfWork similar way we patch BasicFetch and SuperFetch. What @mnovelo proposed will work even better, though. :D

mnovelo commented 5 days ago

ok, I'll work on a PR into this one then @anero @ixti !

mnovelo commented 5 days ago

@anero @ixti here's my proposal https://github.com/controlshift/sidekiq-throttled/pull/2

If this looks good, I can write specs for it too

anero commented 3 days ago

Thanks for the PR @mnovelo, sorry I merged before seeing your comment on the specs. Want to add those separately? (I'd do it myself but I don't have Sidekiq Pro to be able to run them)

anero commented 2 days ago

@mnovelo I fixed the failing specs. Would you mind adding test coverage for the new code on Sidekiq::Throttled::Strategy to raise code coverage and get the build passing?

mnovelo commented 1 day ago

@anero Specs are ready for your review! https://github.com/controlshift/sidekiq-throttled/pull/3

mnovelo commented 14 hours ago

This is looking good! However, I want to highlight something important about the new scheduling strategy for future reference. It is not fully compatible with Sidekiq::Batch because when a job gets rescheduled using this strategy, it is assigned a new job ID. If a job within a batch gets throttled with the new scheduling strategy, the original job will be marked as complete. This could lead the batch to assume that the job has been completed as well. I don't think the new job retains the batch ID, and I didn't make any effort to ensure this or write any specs to test this. In my opinion, there isn't a strong use case for using both the new scheduling strategy and Sidekiq::Batch together. However, I'm happy to help revisit that in the future should the need arise.