Open pesc opened 11 months ago
Hi, there are already related PRs and reported bugs for this:
I just went through this again. Your proposed solution is #4104, which I'm not sure will make much difference for the round-robin selection that we have now. Essentially, the plan would be to also introduce epoll/kevent-based selection, which should improve things, but it's a bit of work to do.
Hi,
I saw that the idle timeout is never reached (for example, at 5s), because the FPM handler seems to hand each incoming task "randomly" to one of the available processes. This has the following bad side effects:
What about this idea?
Idea 2 (much faster and maybe easier): If max_requests is reached, only kill the process. Don't automatically (immediately) spawn a new one; spawn one only when needed (which is what "ondemand" should do), as with a regular cold start. That way, the user can choose how "aggressively" or quickly the handler manages the running processes (by setting max_requests).
Best regards :)
Hi
Introduction
Recently, I was playing around with PHP-FPM while using the process manager ondemand. ondemand can only be used with kqueue (BSD) or epoll (Linux): https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_conf.c#L922-L923
The following is a snippet of my pool configuration:
This means that each child is killed once it has been idle for more than 30s, and this works as expected (when running 1 child).
Problem
So far, this works as it is supposed to. The problem arises when I have a burst of requests, as seen in the screenshot. In the beginning (after the burst), I have 3 running children:
After a minute, I still have 3 children, even though two processes (PIDs 39039 and 39051) did not get any new requests (see counter).
After 4 minutes, I still have 2 children, even though PID 39051 did not get any new requests (see counter).
Reason
I dug into the php-fpm code and was able to find the problematic code snippet. On these lines, php-fpm tries to find the last_idle_child. For that, it iterates over its children and, if a child is idle, it tries to find the "oldest" idle child based on the ->started time. In my case, PID 38733 is idle and is the oldest child, so it is selected as last_idle_child, even though it gets all the current requests (see counter).
https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_process_ctl.c#L361-L374
And there is the problem. Because of how epoll/kqueue works (Cloudflare Blog: Why does one NGINX worker take all the load?), it is possible that one child gets all the load/requests. The selected last_idle_child (PID 38733) is then checked to see whether it exceeds the pm.process_idle_timeout. This is not the case for PID 38733, as it is the child that handles all the requests. And that is the reason PID 39051 is not killed, even though it has not served any request in the last 3 minutes.
https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_process_ctl.c#L388-L390
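The selection logic described above can be modelled with a minimal sketch in C. This is not the php-fpm source: struct child_sketch, its fields, and both function names are hypothetical simplifications, and timestamps are plain time_t seconds rather than whatever the real code uses. It only illustrates why a single child ends up being compared against the idle timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a php-fpm child (not the real struct). */
struct child_sketch {
    int    pid;
    int    is_idle;        /* 1 if the child is not handling a request right now */
    time_t started;        /* when the process was forked */
    time_t last_activity;  /* when it last finished a request */
};

/* Pick the idle child with the earliest start time ("oldest" idle child). */
static const struct child_sketch *
oldest_idle_child(const struct child_sketch *children, size_t count)
{
    const struct child_sketch *oldest = NULL;

    assert(children != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!children[i].is_idle) {
            continue;
        }
        if (oldest == NULL || children[i].started < oldest->started) {
            oldest = &children[i];
        }
    }
    return oldest;
}

/*
 * Return the pid to kill on this maintenance run, or -1 if there is none.
 * Only the one selected child is compared against the idle timeout, so a
 * busy "oldest" child shields every other idle child from being killed.
 */
static int
pid_to_kill_oldest_only(const struct child_sketch *children, size_t count,
                        time_t now, time_t idle_timeout)
{
    const struct child_sketch *c = oldest_idle_child(children, count);

    if (c != NULL && c->last_activity < now - idle_timeout) {
        return c->pid;
    }
    return -1;
}
```

In this model, if the oldest child served a request 5s ago while two younger children have been idle for 200s, pid_to_kill_oldest_only returns -1 for a 30s timeout, which matches what the counters show.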
Expected behaviour
I would expect that every child exceeding the pm.process_idle_timeout is killed by the fpm-master, regardless of whether it has been alive for a long time or not.
Possible Solution
It is not that easy to find a solution for this problem. I came up with this idea: check whether pm.process_idle_timeout is reached for each child on every run (for ondemand) instead of picking the last_idle_child. This may be a bit more CPU intensive. Any other ideas?
PHP Version
PHP 8.2.13
Operating System
FreeBSD 12.4
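To make the idea from the "Possible Solution" section more concrete, here is a minimal sketch in C of checking every idle child against the timeout on each run. It is an illustration under assumptions, not a patch: struct idle_child_sketch, its fields, and collect_expired_children are hypothetical names, and timestamps are plain time_t seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a php-fpm child (not the real struct). */
struct idle_child_sketch {
    int    pid;
    int    is_idle;        /* 1 if the child is not handling a request right now */
    time_t last_activity;  /* when it last finished a request */
};

/*
 * Write the pid of every idle child whose last activity is older than
 * idle_timeout into out_pids (at most out_cap entries) and return how
 * many were written. This is one linear pass over all children per run.
 */
static size_t
collect_expired_children(const struct idle_child_sketch *children, size_t count,
                         time_t now, time_t idle_timeout,
                         int *out_pids, size_t out_cap)
{
    size_t n = 0;

    assert(children != NULL || count == 0);
    assert(out_pids != NULL || out_cap == 0);
    for (size_t i = 0; i < count && n < out_cap; i++) {
        if (children[i].is_idle &&
            children[i].last_activity < now - idle_timeout) {
            out_pids[n++] = children[i].pid;
        }
    }
    return n;
}
```

The extra cost is one timestamp comparison per child on every run, so it grows linearly with the number of children.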