mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Use a queue for offline jobs #1240

Closed crowbot closed 1 year ago

crowbot commented 10 years ago

Possible job types to include:

Possible advantages:

crowbot commented 10 years ago

Not spawning a new Rails process each time a mail comes in could reduce resource usage on smaller hosts.

garethrees commented 8 years ago

Rails 4.2 (https://github.com/mysociety/alaveteli/issues/2968) has a built-in framework for integrating background jobs.

garethrees commented 3 years ago

Rails 6.1 (#6011) has the ability to destroy associated records asynchronously which would help prevent timeouts when managing WDTK.

gbp commented 2 years ago

To help third-party installs, @garethrees wanted to look at using a Postgres-backed queue rather than Sidekiq/Redis - this seems like a good option: https://github.com/bensheldon/good_job

garethrees commented 2 years ago

https://github.com/que-rb/que was also something I'd seen that seems to be pretty well maintained.

garethrees commented 1 year ago

Our initial stab at this should be quite basic; the main intention is to get the job infrastructure in place and running, with an initial concrete implementation of ActiveJob so that we actually gain some benefit from the setup work.

I think we have 4 main tasks.

  1. Decide on a QueueAdapter
  2. Add ActiveJob configuration (ensure we handle errors, etc)
  3. Add infrastructure to run the backend processor (unix daemon?)
  4. Add a concrete implementation of ActiveJob
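
For task 2, a minimal sketch of what the shared error handling could look like, using ActiveJob's built-in `retry_on`/`discard_on` hooks. The specific policies here (5 retries, discarding deserialization failures) are illustrative assumptions, not decided behaviour:

```ruby
# Sketch only: a shared base class for our jobs.
class ApplicationJob < ActiveJob::Base
  # Retry transient failures a few times with increasing back-off.
  retry_on StandardError, wait: :exponentially_longer, attempts: 5

  # If the record a job references was destroyed before the job ran,
  # there's nothing useful left to do, so drop the job quietly.
  discard_on ActiveJob::DeserializationError
end
```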

2 and 3 don't need too much discussion at this point, so I'll focus on 1 and 4.

To reiterate: the main focus is getting the ActiveJob infrastructure in place and running so that we can more easily add jobs over time; we're not going to migrate everything to jobs in one shot.

1. Decide on a QueueAdapter

As previously mentioned, I'm keen to avoid additional infrastructure costs for re-users who don't have in-house sysadmins. Sidekiq requires Redis, which adds another thing to configure, secure, run and maintain.
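
If we did pick GoodJob, the setup is small because it stores jobs in the Postgres database we already run - a sketch, assuming GoodJob is the chosen adapter:

```ruby
# Gemfile
gem "good_job"

# config/application.rb (or per-environment config) -- tells ActiveJob
# to enqueue jobs into GoodJob's Postgres-backed tables.
config.active_job.queue_adapter = :good_job
```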

It's also worth noting that Sidekiq does not process the queue serially. Is this a problem? IDK. Here's a situation:

Would it be bad if the order of indexing ended up being?

Do either of the postgres-based backends (que or good_job) allow more flexibility here?
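
GoodJob at least appears to: it ships a concurrency extension that can cap how many instances of a job perform at once, which could be used to force indexing jobs back to serial execution. A sketch, assuming GoodJob; the job name and key are illustrative:

```ruby
# Sketch: limit hypothetical indexing jobs to one at a time so they
# process in roughly enqueue order rather than concurrently.
class IndexingJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  good_job_control_concurrency_with(
    perform_limit: 1,              # at most one performing at once
    key: -> { "xapian-indexing" }  # shared key across all instances
  )
end
```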

4. Add a concrete implementation of ActiveJob

Association Destruction

An easy-to-implement option would be to add dependent: :destroy_async for some key classes so that we get jobs flowing through the system. Are there records we destroy on a fairly regular basis that would benefit from this?
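
For illustration only - the association shown is an example, not a decision about which records to handle this way:

```ruby
class InfoRequest < ApplicationRecord
  # Destroying a long-lived request can cascade through many events;
  # :destroy_async enqueues the dependent destruction as a background
  # job instead of doing it inline in the request cycle.
  has_many :info_request_events, dependent: :destroy_async
end
```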

Xapian Indexing

The ideal improvement would be to run ActsAsXapian indexing jobs via ActiveJob, but this has the potential to be a significant effort which may exceed the time we have available. If we explore this option, we should create an outline plan of how we'll achieve it before starting programming work. That will require a little spiking, but we want to understand what we think this will take before committing to completing it.

We already have `xapian_create_job` that we can re-write to create an ActiveJob record.

I think the difficulty will be in how the record referenced by the job actually gets reindexed. All this logic is currently wrapped up in `ActsAsXapian.update_index`, which expects to iterate through the current `ActsAsXapianJob` records rather than handle a single event. We do have `xapian_index`, which might be the thing to call from the job, but we need to be careful of the interaction with full indexing (i.e. "aborting incremental index update while full index rebuild happens").
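
A rough sketch of one possible shape, assuming we call the per-record `xapian_index` method from a job rather than draining the `ActsAsXapianJob` table; whether this coexists safely with a full index rebuild is exactly the open question above:

```ruby
# Hypothetical job name; reindexes a single acts_as_xapian model
# instance instead of iterating the whole pending-jobs table.
class XapianIndexJob < ApplicationJob
  def perform(record)
    record.xapian_index
  end
end
```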

Expiring Events

This is all sounding quite complicated, so perhaps instead we move InfoRequest#reindex_request_events to be background-job based as that's something that often times out in a request cycle… something like:

```ruby
class InfoRequest
  def reindex_request_events
    InfoRequest::ReindexRequestEventsJob.perform_later(self)
  end
end

class InfoRequest::ReindexRequestEventsJob < ApplicationJob
  def perform(info_request)
    info_request.info_request_events.find_each(&:xapian_mark_needs_index)
  end
end
```

garethrees commented 1 year ago

Fixed by https://github.com/mysociety/alaveteli/pull/7575. Moving specific functionality into background jobs can be ticketed independently.