php / php-src

The PHP Interpreter
https://www.php.net
Other
38.08k stars 7.74k forks source link

Valgrind complaints about multiple Invalid read around fpm_event_fire() #10889

Closed rcanavan closed 1 year ago

rcanavan commented 1 year ago

Description

While attempting to reproduce an issue in our own PHP extension, I've encountered some complaints in php-fpm by Valgrind in fpm_event_fire(). The test involved ~40 requests started at the same time with curl. I cannot reproduce this with any regularity.

Sample trace:

Invalid read of size 8
   at 0x74AA39: fpm_event_fire (fpm_events.c:483)
   by 0x75A6C4: fpm_event_epoll_wait (epoll.c:141)
   by 0x74A759: fpm_event_loop (fpm_events.c:427)
   by 0x743D9F: fpm_run (fpm.c:113)
   by 0x34BF4A: main (fpm_main.c:1813)
 Address 0x16c1b2c8 is 152 bytes inside a block of size 248 free'd
   at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x7446D0: fpm_child_free (fpm_children.c:62)
   by 0x7446D0: fpm_child_close (fpm_children.c:86)
   by 0x7446D0: fpm_children_bury (fpm_children.c:263)
   by 0x74A7F2: fpm_event_fire (fpm_events.c:487)
   by 0x74A7F2: fpm_event_loop (fpm_events.c:467)
   by 0x743D9F: fpm_run (fpm.c:113)
   by 0x34BF4A: main (fpm_main.c:1813)
 Block was alloc'd at
   at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x744298: fpm_child_alloc (fpm_children.c:45)
   by 0x744298: fpm_resources_prepare (fpm_children.c:310)
   by 0x744298: fpm_children_make (fpm_children.c:399)
   by 0x744710: fpm_children_bury (fpm_children.c:294)
   by 0x74A7F2: fpm_event_fire (fpm_events.c:487)
   by 0x74A7F2: fpm_event_loop (fpm_events.c:467)
   by 0x743D9F: fpm_run (fpm.c:113)
   by 0x34BF4A: main (fpm_main.c:1813)

fpm-valgrind.txt

PHP Version

PHP 8.2.3

Operating System

Ubuntu 22.04 (in a docker container, running on Ubuntu 22.04)

iluuu1994 commented 1 year ago

@rcanavan Would it be possible to share the test? This is going to be hard to understand otherwise.

rcanavan commented 1 year ago

I'm afraid I can't share our PHP source at this time (and it would require our php extension as well). Since the issue appeared within php-fpm itself, my naïve assumption was that the specifics there wouldn't matter. The test setup was just php-fpm with

pm = dynamic
pm.max_children = 100
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 7

and running a script that basically contained curl http://localhost:8080/TESTURL1 & curl http://localhost:8080/TESTURL2 & ... manually with about 40 curl processes running in parallel. With only 4 CPU cores available for the tests, that resulted in fpm-children getting spawned and killed for every test run.

bukka commented 1 year ago

Thanks for the pointer as it helped me figure out the actual problem reported in https://github.com/php/php-src/issues/10461 which is basically duplicate of this. See this comment https://github.com/php/php-src/issues/10461#issuecomment-1502108569 . I will close this as it is basically a duplicate of that ticket so will just keep open that one.

bukka commented 1 year ago

@rcanavan So I just came up with potential fix in https://github.com/php/php-src/pull/11084 . It is basically delaying freeing of the children that were killed / crashed or descaled. It needs more testing so if you are able to test it, that would be great?

I'm having a bit of issue to recreate this problem locally. I tried various cases but so far I haven't managed to recreate the problem. When you said that the children are getting killed, is it because the extension is crashing? Is there also lots of stderr output produced? Would you be also able to enable debug log level in FPM and share what is produced around the time when the problem is reported. Any info that will help me to recreate would be appreciated!

I will actually re-open this until the problem is fixed.

rcanavan commented 1 year ago

I'm having a bit of issue to recreate this problem locally. I tried various cases but so far I haven't managed to recreate the problem.

I've actually tried to re-create this issue last month, with a few thousand test runs, with no further crashes in php-fpm itself observed.

When you said that the children are getting killed, is it because the extension is crashing?

No - just regular churn of children due to them reaching pm.max_requests.

Is there also lots of stderr output produced?

2-3 lines of output for each request processed.

Would you be also able to enable debug log level in FPM and share what is produced around the time when the problem is reported. Any info that will help me to recreate would be appreciated!

I can try running the tests for a while during the coming week, but I can't promise anything. Do you have specific requests regarding debug log level?

bukka commented 1 year ago

The debug log would be useful around the time the issue was happening. But don't worry if you can't recreate it. I think it would just show that it happened when child was killed which we already know.

I committed the mentioned fix as I think it's hopefully safe. The change will be part of 8.1.20 and 8.2.7. If you see the issue still happening after using those versions, please comment here.