Closed SandakovMM closed 1 year ago
@SandakovMM This is interesting as the backtrace points to the actual even firing so it's like it crashes during the callback call which doesn't make much sense to me. It might be potentially some memory corruption. The original commit introduced a bit of casting so maybe that could be the source of the issue. To verify that I prepared a patch in https://github.com/php/php-src/pull/10510 . Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?
If it doesn't help could you share your FPM config and could you also enable a debug log level and share its content around the time of the future crash?
The commit that is causing your commit is actually a fix for https://github.com/php/php-src/issues/8517 so we don't want to just revert it. I'd prefer try to fix it though.
@Poulpatine @triplesixman As you have been testing the commit causing this crash for longer than others, I was wondering if you noticed any further crash and if you are also getting from time to time child limit reached
warning (just checking if there can be some relation between the crash and warning potentially)?
Would you be able to cherry pick its only commit, apply it to 8.1.14 and check if you see any crashes after that?
Sure, I will do my best to l check it in two weeks. Thank you
Hey, I am sorry, but unfortunately, we have no resources to check this patch right now. We will do it as soon as possible.
@SandakovMM Hi, just wanted to check if you would be able to share your FPM config and a bit more info about your app and traffic that you are getting? I'm asking because we did not have any other report about this so there might be something specific to your usage.
Hay, I should add some context. We provide our product to hosting companies. Some of them meet the problem and asked for our help. I've checked their FPM configures and could not find anything suspicious that would be complete for all of them. So, unfortunately, I have nothing to share with you. At least for now, sorry.
Good news, we are ready to check if the patch helps. I will keep you updated with the results within the next 2-3 weeks.
No feedback was provided. The issue is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so. Thank you.
@SandakovMM thanks for trying to test the patch but unfortunately it is not going to help.
I just figured out what the actual problem is. It is basically due to use of even that is part of the child in https://github.com/php/php-src/blob/c4a1100f190efce41eab728e84788d2491cc4c71/sapi/fpm/fpm/fpm_stdio.c#L332-L336.
The problem is that when child gets freed before the event, the event is also freed so it fails that check.
To fix it I think we need to separate the events from child and keep them under the worker pool potentially so they can outlive the child. I think we should unregister them as well. It is a bit tricky to come with some sensible clean up though. Need to think about it a bit more.
@SandakovMM So I just came up with a slightly different approach which is in https://github.com/php/php-src/pull/11084 . It is basically delaying freeing of the children that were killed / crashed or descaled. It needs more testing so if you are able to test it, that would be appreciated.
I'm still having a bit of issue to recreate this problem locally. I tried various cases but so far I haven't managed to recreate the race condition. Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?
Do you actually know if you see children crashing a lot in the logs and if they produce lots of output to stderr?
As far as I remember the problem was actual for bunch of various applications. So I believe some of them could write into stdout or stderr much, but not all of them. It might be a good way to repeat the race condition anyway in my point of view.
I committed the mentioned fix as I think it's hopefully safe. It no longer contains the changes that you reverted as it was not right fix. The change will be part of 8.1.20 and 8.2.7. If you see the issue still happening after using those version, please comment here and I will re-open it.
Great! Thank you very much.
Description
Hello I have encountered an issue on multiple web-hosting servers after updating to PHP 8.1.14. The PHP-FPM master process is crashing periodically, approximately once a day. This issue does not occur with PHP 8.0. A GDB backtrace shows that the crash is happening in the fpm_event_epoll_wait function:
Unfortunately, the steps to reproduce the issue are unclear. I can see only warnings about the child limit reached in the error.log. But the warning is common and happens from time to time even when the problem doesn't happen, so I assume they are not connected to the issue. I have reviewed the changes between PHP 8.1.13 and 8.1.14 and suspect that the issue may be caused by this commit. I have reverted this commit and the issue did not occur over the past week. However, I have been unable to determine the specific problem with these changes.
Could you please help investigate the issue?
PHP Version
PHP 8.1.14
Operating System
Ubuntu 20, Ubuntu 22, Centos 7, AlmaLinux 8