php / php-src

The PHP Interpreter
https://www.php.net
Other
38.08k stars 7.74k forks source link

PHP-FPM segfault due to after free usage of child->ev_std(out|err) #10461

Closed SandakovMM closed 1 year ago

SandakovMM commented 1 year ago

Description

Hello I have encountered an issue on multiple web-hosting servers after updating to PHP 8.1.14. The PHP-FPM master process is crashing periodically, approximately once a day. This issue does not occur with PHP 8.0. A GDB backtrace shows that the crash is happening in the fpm_event_epoll_wait function:

(gdb) bt
--
#0 0x0000010c00000000 in ?? ()
#1 0x0000556f30ed496e in fpm_event_epoll_wait (queue=<optimized out>, timeout=<optimized out>) at sapi/fpm/fpm/events/epoll.c:141
#2 0x0000556f30ec6201 in fpm_event_loop (err=err@entry=0) at sapi/fpm/fpm/fpm_events.c:427
#3 0x0000556f30ec0167 in fpm_run (max_requests=0x7ffccc6fa19c) at sapi/fpm/fpm/fpm.c:113
#4 0x0000556f30b8430c in main (argc=2, argv=0x7ffccc6fa6d8) at sapi/fpm/fpm/fpm_main.c:1828

Unfortunately, the steps to reproduce the issue are unclear. I can see only warnings about the child limit reached in the error.log. But the warning is common and happens from time to time even when the problem doesn't happen, so I assume they are not connected to the issue. I have reviewed the changes between PHP 8.1.13 and 8.1.14 and suspect that the issue may be caused by this commit. I have reverted this commit and the issue did not occur over the past week. However, I have been unable to determine the specific problem with these changes.

Could you please help investigate the issue?

PHP Version

PHP 8.1.14

Operating System

Ubuntu 20, Ubuntu 22, Centos 7, AlmaLinux 8

bukka commented 1 year ago

@SandakovMM This is interesting as the backtrace points to the actual even firing so it's like it crashes during the callback call which doesn't make much sense to me. It might be potentially some memory corruption. The original commit introduced a bit of casting so maybe that could be the source of the issue. To verify that I prepared a patch in https://github.com/php/php-src/pull/10510 . Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?

If it doesn't help could you share your FPM config and could you also enable a debug log level and share its content around the time of the future crash?

The commit that is causing your commit is actually a fix for https://github.com/php/php-src/issues/8517 so we don't want to just revert it. I'd prefer try to fix it though.

bukka commented 1 year ago

@Poulpatine @triplesixman As you have been testing the commit causing this crash for longer than others, I was wondering if you noticed any further crash and if you are also getting from time to time child limit reached warning (just checking if there can be some relation between the crash and warning potentially)?

SandakovMM commented 1 year ago

Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?

Sure, I will do my best to l check it in two weeks. Thank you

SandakovMM commented 1 year ago

Hey, I am sorry, but unfortunately, we have no resources to check this patch right now. We will do it as soon as possible.

bukka commented 1 year ago

@SandakovMM Hi, just wanted to check if you would be able to share your FPM config and a bit more info about your app and traffic that you are getting? I'm asking because we did not have any other report about this so there might be something specific to your usage.

SandakovMM commented 1 year ago

Hay, I should add some context. We provide our product to hosting companies. Some of them meet the problem and asked for our help. I've checked their FPM configures and could not find anything suspicious that would be complete for all of them. So, unfortunately, I have nothing to share with you. At least for now, sorry.

SandakovMM commented 1 year ago

Good news, we are ready to check if the patch helps. I will keep you updated with the results within the next 2-3 weeks.

github-actions[bot] commented 1 year ago

No feedback was provided. The issue is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so. Thank you.

bukka commented 1 year ago

@SandakovMM thanks for trying to test the patch but unfortunately it is not going to help.

I just figured out what the actual problem is. It is basically due to use of even that is part of the child in https://github.com/php/php-src/blob/c4a1100f190efce41eab728e84788d2491cc4c71/sapi/fpm/fpm/fpm_stdio.c#L332-L336.

The problem is that when child gets freed before the event, the event is also freed so it fails that check.

To fix it I think we need to separate the events from child and keep them under the worker pool potentially so they can outlive the child. I think we should unregister them as well. It is a bit tricky to come with some sensible clean up though. Need to think about it a bit more.

bukka commented 1 year ago

@SandakovMM So I just came up with a slightly different approach which is in https://github.com/php/php-src/pull/11084 . It is basically delaying freeing of the children that were killed / crashed or descaled. It needs more testing so if you are able to test it, that would be appreciated.

I'm still having a bit of issue to recreate this problem locally. I tried various cases but so far I haven't managed to recreate the race condition. Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?

SandakovMM commented 1 year ago

Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?

As far as I remember the problem was actual for bunch of various applications. So I believe some of them could write into stdout or stderr much, but not all of them. It might be a good way to repeat the race condition anyway in my point of view.

bukka commented 1 year ago

I committed the mentioned fix as I think it's hopefully safe. It no longer contains the changes that you reverted as it was not right fix. The change will be part of 8.1.20 and 8.2.7. If you see the issue still happening after using those version, please comment here and I will re-open it.

SandakovMM commented 1 year ago

Great! Thank you very much.