Closed: SamuelWillis closed this issue 4 months ago
cc @PrinsFrank, do you happen to know what's going on here?
We would need concrete confirmation of what change causes it.
I think I see the issue here:
This is not clearly defined anywhere, but we experienced the same problem running on k8s with many queues. I'll add some documentation for this.
There was previously an issue where, if the number of queues multiplied by minProcesses was higher than maxProcesses, the autoscaler would scale a queue down to 0 when it had been silent for a while and never scale it back up. See #1289
Now, because you have 8 queues and minProcesses is set to 12, the configuration asks for 96 processes. But your maxProcesses is set to 18, so Horizon instead shares those 18 processes equally across the queues, resulting in 2 processes per queue, whereas in your previous scenario a queue would scale back to either 1 or 0 depending on the load.
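To make the arithmetic concrete, here is a rough illustration of that allocation using the numbers from your setup (this is not Horizon's actual code, just the math):

```php
// Illustration only, not Horizon's internal logic.
$queues       = 8;
$minProcesses = 12;
$maxProcesses = 18;

$requested = $queues * $minProcesses;                          // 8 * 12 = 96 processes asked for
$perQueue  = intdiv(min($requested, $maxProcesses), $queues);  // 18 shared over 8 queues => 2 each
```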
The memory footprint per process grows in proportion to how many non-deferred service providers and singletons you have, so an increase in services will increase memory consumption.
What will solve your issue here: decrease minProcesses to a sensible value. If you always want 1 process per queue, set minProcesses to 1. Or, if you want to scale more efficiently based on queue load and available resources, group queues together by job runtime and priority. That is the route we took: instead of having n queues where most queues are always empty yet still reserve a process, make sure all the high-priority jobs end up in one queue with a lot of scalability, and put the other jobs in a low-priority queue.
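As a sketch of that grouping (supervisor and queue names here are placeholders; pick limits that fit your own workload):

```php
// config/horizon.php (sketch; names and numbers are placeholders)
'environments' => [
    'production' => [
        'supervisor-high' => [
            'queue'        => ['high'],   // all high-priority jobs funnel into one queue
            'balance'      => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 16,         // generous headroom for spikes
        ],
        'supervisor-low' => [
            'queue'        => ['low'],    // everything that can wait
            'balance'      => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 4,
        ],
    ],
],
```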
If you want to scale even further: the JSON endpoints return all of this information in a parseable format. You could run one supervisor per pod, parse the output of the JSON endpoint, and spin up further pods when the number of actual processes hits the maximum.
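A rough sketch of that approach, assuming the dashboard's JSON endpoints (for example /horizon/api/masters) are reachable from wherever the scaling check runs; the exact field names can differ between Horizon versions, so inspect the raw JSON from your own install first:

```php
<?php
// Sketch only: poll one pod's Horizon JSON endpoint and report when every
// supervisor is at its ceiling, so an external controller can add a pod.
// Host, endpoint path, and field names below are assumptions to verify.

$baseUrl = 'http://horizon-pod.internal'; // placeholder address of a Horizon pod

$masters = json_decode(file_get_contents($baseUrl.'/horizon/api/masters'), true);

$actual = 0;
$max    = 0;

foreach ($masters as $master) {
    foreach ($master['supervisors'] ?? [] as $supervisor) {
        // Processes the supervisor is currently running, summed across its queues...
        $actual += array_sum($supervisor['processes'] ?? []);
        // ...and the ceiling it is allowed to scale to.
        $max += (int) ($supervisor['options']['maxProcesses'] ?? 0);
    }
}

if ($max > 0 && $actual >= $max) {
    // Saturated: signal the orchestrator (HPA/KEDA metric, Kubernetes API, etc.)
    echo "saturated: {$actual}/{$max} processes\n";
}
```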
Let me know if I can clarify anything else!
Thank you for the great explanation @PrinsFrank.
I had the understanding that minProcesses and maxProcesses were both totals that defined the lower and upper process count limits for the worker.
We've already grouped our queues by priority, which is nice, so I think all we need to do here is adjust our configuration so each worker scales by the right amount.
Thanks all for your help on this one 👍
Horizon Version
>= 5.18
Laravel Version
10.48.16
PHP Version
8.3.7
Redis Driver
Predis
Redis Version
Predis 2.2 & Redis 6.2.14
Database Driver & Version
No response
Description
Upgrading to any Horizon version starting with 5.18 causes a doubling in the number of database connections and memory usage.
Reverting Horizon back to version 5.17 brings the number of database connections and memory usage back down.
After looking through the changelog and the associated changes, the only thing I can see that might cause this is this autoscaling tweak, which could be producing odd scaling behavior.
Here is a snippet from our configuration:
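(Only a sketch of the relevant supervisor block; queue names are placeholders and the minProcesses/maxProcesses values match those discussed above, so treat this as illustrative rather than the exact file.)

```php
// config/horizon.php (sketch; queue names are placeholders)
'defaults' => [
    'supervisor-1' => [
        'connection'   => 'redis',
        'queue'        => ['queue-1', 'queue-2', 'queue-3', 'queue-4',
                           'queue-5', 'queue-6', 'queue-7', 'queue-8'],
        'balance'      => 'auto',
        'minProcesses' => 12,
        'maxProcesses' => 18,
    ],
],
```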
We are running horizon as a k8s pod with multiple replicas, for a bit more context.
Steps To Reproduce
I haven't been able to nail down concrete reproduction steps, but we have seen this consistently when attempting to upgrade from version 5.17 to any version at or above 5.18.