rails / solid_queue

Database-backed Active Job backend
MIT License
1.95k stars 130 forks source link

Fixes #262: Automatic worker process recycling #352

Closed hms closed 2 months ago

hms commented 2 months ago

This PR adds two new configuration parameters:

There are no specific unit requirements placed on either of these new parameters. What's important is: They use the same order of magnitude and they are comparable.

For example, if the calc_memory_usage proc returns 300Mb as 300 (as in Megabytes) then the recycle_on_oom set on the work should be 300 too.

Any worker without recycle_on_oom is not impacted in anyway. If the calc_memory_usage is nil (default), then this oom checking it off for workers under the control of this Supervisor.

The check for OOM is made after the Job has run to completion and before the SolidQueue worker does any additional processing.

The single biggest change to SolidQueue, that probably requires the most review is moving the job.unblock_next_blocked_job out of ClaimedExecution and up one level into Pool. The rational for this change is that the ensure block on the Job execution is not guarrenteed to run if the system / thread is forcibly shutdown while the job is inflight. However, the Thread.ensure does seem to get called reliably on forced shutdowns.

Give my almost assuredly incomplete understanding of the concurrency implementation despite Rosa working very hard to help me to grok it, there is some risk here that this change is wrong.

My logic for this change is as follows:

Small fix

hms commented 2 months ago

@rosa The tests that are failing here ran successfully multiple times on my system. I think they are sound, but it's a matter of the performance of the underlying testing resources and the configuration settings that impact timing assumptions. But if you / the team feel I've gone a wrong direction and need to dig deeper, just point me in a suggested direction.