Closed rcanavan closed 1 year ago
@rcanavan Would it be possible to share the test? This is going to be hard to understand otherwise.
I'm afraid I can't share our PHP source at this time (and it would require our php extension as well). Since the issue appeared within php-fpm itself, my naïve assumption was that the specifics there wouldn't matter. The test setup was just php-fpm with
pm = dynamic
pm.max_children = 100
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 7
and manually running a script that basically contained
curl http://localhost:8080/TESTURL1 & curl http://localhost:8080/TESTURL2 & ...
with about 40 curl processes running in parallel. With only 4 CPU cores available for the tests, that resulted in fpm children getting spawned and killed for every test run.
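For anyone trying to reproduce this, the burst described above could be scripted roughly as follows. This is only a sketch: the `run_burst` function name, the numbered `TESTURL<n>` paths, and the count argument are placeholders for illustration, not the actual test script.

```shell
#!/bin/sh
# Sketch of the parallel-request burst described above.
# run_burst and the TESTURL<n> paths are hypothetical placeholders,
# not the reporter's real script or endpoints.

run_burst() {
    base_url="$1"   # e.g. http://localhost:8080
    count="$2"      # number of parallel requests (about 40 in the report)
    i=1
    while [ "$i" -le "$count" ]; do
        # -s: no progress output, -o /dev/null: discard the response body
        curl -s -o /dev/null "$base_url/TESTURL$i" &
        i=$((i + 1))
    done
    # Block until every background curl has exited
    wait
}

# Example invocation (left commented out so sourcing the file has no side effects):
# run_burst http://localhost:8080 40
```

Starting every request in the background and then calling `wait` makes all requests hit php-fpm at nearly the same time, which is what pushes a `pm = dynamic` pool to scale up and back down on each run.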
Thanks for the pointer, as it helped me figure out the actual problem reported in https://github.com/php/php-src/issues/10461 , which is basically a duplicate of this one. See this comment: https://github.com/php/php-src/issues/10461#issuecomment-1502108569 . I will close this as a duplicate and keep only that ticket open.
@rcanavan I just came up with a potential fix in https://github.com/php/php-src/pull/11084 . It basically delays freeing the children that were killed, crashed, or scaled down. It needs more testing, so it would be great if you are able to test it.
I'm having some trouble recreating this problem locally. I have tried various cases but so far haven't managed to recreate it. When you said that the children are getting killed, is it because the extension is crashing? Is there also a lot of stderr output produced? Would you also be able to enable the debug log level in FPM and share what is produced around the time the problem is reported? Any info that helps me recreate it would be appreciated!
I will actually re-open this until the problem is fixed.
> I'm having some trouble recreating this problem locally. I have tried various cases but so far haven't managed to recreate it.
I actually tried to re-create this issue last month with a few thousand test runs, and observed no further crashes in php-fpm itself.
> When you said that the children are getting killed, is it because the extension is crashing?
No, just regular churn of children due to them reaching pm.max_requests.
> Is there also a lot of stderr output produced?
2-3 lines of output for each request processed.
> Would you also be able to enable the debug log level in FPM and share what is produced around the time the problem is reported? Any info that helps me recreate it would be appreciated!
I can try running the tests for a while during the coming week, but I can't promise anything. Do you have specific requests regarding the debug log level?
The debug log would be useful for the time around when the issue happened. But don't worry if you can't recreate it. I think it would just show that it happened when a child was killed, which we already know.
I committed the mentioned fix, as I think it should be safe. The change will be part of 8.1.20 and 8.2.7. If you still see the issue after upgrading to those versions, please comment here.
Description
While attempting to reproduce an issue in our own PHP extension, I encountered some Valgrind complaints about php-fpm itself, in fpm_event_fire(). The test involved ~40 requests started at the same time with curl. I cannot reproduce this with any regularity.
Sample trace:
fpm-valgrind.txt
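The exact Valgrind invocation is not given in this report. As a rough sketch of how php-fpm can be run under Valgrind so that child processes are traced too, something like the following might work. The `fpm_under_valgrind` function name, the config path, and the log prefix are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: not the invocation used for the attached trace.
# fpm_under_valgrind, the config path and the log prefix are hypothetical.

fpm_under_valgrind() {
    fpm_conf="$1"     # path to php-fpm.conf
    log_prefix="$2"   # prefix for the per-process Valgrind log files

    # USE_ZEND_ALLOC=0 disables PHP's own memory manager so Valgrind
    # sees the individual allocations.
    # --trace-children=yes follows processes started via exec;
    # %p in --log-file expands to the PID, giving one log per process.
    USE_ZEND_ALLOC=0 valgrind \
        --trace-children=yes \
        --log-file="$log_prefix.%p.log" \
        php-fpm --nodaemonize --fpm-config "$fpm_conf"
}

# Example invocation (left commented out so sourcing the file has no side effects):
# fpm_under_valgrind /etc/php/8.2/fpm/php-fpm.conf /tmp/fpm-valgrind
```

Running with `--nodaemonize` keeps the master in the foreground, so it stays under Valgrind rather than detaching.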
PHP Version
PHP 8.2.3
Operating System
Ubuntu 22.04 (in a docker container, running on Ubuntu 22.04)