php / php-src

The PHP Interpreter
https://www.php.net

Random Blank white page frequently while using php8.2-fpm #14605

Open rahulthackkar opened 2 months ago

rahulthackkar commented 2 months ago

Description

My Ubuntu 22.04 server runs php8.2-fpm (8.2.19) and Apache 2.4.52. Server configuration: 4-core CPU, 16 GB RAM, no swap. I host 72 websites, each with its own pool, and every pool uses the same configuration shown below.

group = site1
listen = /run/php/site1.sock
listen.owner = site1
listen.group = site1
listen.mode = 0666

pm = ondemand
pm.max_children = 10
pm.start_servers = 0
pm.min_spare_servers = 1
pm.max_spare_servers = 5
pm.process_idle_timeout = 10;
pm.max_requests = 50

security.limit_extensions = .php
php_admin_flag[allow_url_fopen] = on
php_admin_flag[log_errors] = on
php_admin_value[short_open_tag] = on
php_value[error_log] = /home/site1/logs/fpm.error.log
php_value[upload_max_filesize]=50M
php_value[post_max_size]=60M
php_value[max_input_vars]=20000
php_value[max_execution_time]=4500
php_value[session.cookie_lifetime]=0
php_value[memory_limit]=1024M
php_value[session.gc_maxlifetime]=3600
php_value[error_reporting]=85

request_slowlog_timeout = 60
slowlog = /home/$pool/logs/fpm.error.log.slow

The following PHP modules are enabled:

[PHP Modules]
bcmath
bz2
calendar
Core
ctype
curl
date
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
igbinary
imagick
imap
intl
ionCube Loader
json
ldap
libxml
mbstring
mcrypt
mysqli
mysqlnd
openssl
pcntl
pcre
PDFlib
PDO
pdo_mysql
Phar
posix
random
readline
redis
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
ssh2
standard
sysvmsg
sysvsem
sysvshm
tidy
tokenizer
vips
xml
xmlreader
xmlrpc
xmlwriter
xsl
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache
the ionCube PHP Loader

We randomly get blank white pages with php8.2-fpm, and the sites go down. When I check service php8.2-fpm status, the service is active. Memory consumption at those times is not high either; about 80% of RAM is still available. I also tried to capture a core dump, but there is no crash at all. All I have in the logs are the entries below, with exit code 70.

/var/log/php8.2-fpm.log contains the following entries when the white page occurs:

[19-Jun-2024 10:20:43] WARNING: [pool site1] child 1895425 exited with code 70 after 1.518009 seconds from start
[19-Jun-2024 10:20:43] WARNING: [pool site2] child 1895426 exited with code 70 after 1.567505 seconds from start
[19-Jun-2024 10:20:44] WARNING: [pool site3] child 1895440 exited with code 70 after 0.020172 seconds from start
[19-Jun-2024 10:20:44] WARNING: [pool site4] child 1895419 exited with code 70 after 3.510533 seconds from start

I am not able to find the root cause of this. Can you please help? Also, please let me know what further information is required to investigate this issue.

PHP Version

PHP 8.2.19

Operating System

Ubuntu 22.04

bukka commented 1 month ago

@rahulthackkar

pm.max_requests = 50

This is way too low. You should use this only if you experience some memory leaks so either remove it or significantly increase it.

rahulthackkar commented 1 month ago

Let me try removing pm.max_requests = 50 and then observe. Just wanted to know: if we don't specify this setting, how many requests will each child serve?

rahulthackkar commented 1 month ago

Hello @bukka

For more information: I suspected opcache as the cause, so I enabled the opcache error log, and I found the entries below at exactly the same time the child processes exit unexpectedly.

/var/log/opcache-error.log

Sat Jun 29 18:49:33 2024 (1453415): Error Cannot kill process 1451702!
Sat Jun 29 18:49:33 2024 (1453463): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453445): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453495): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453499): Error Cannot kill process 1451702!
Sat Jun 29 18:49:35 2024 (1453492): Error Cannot kill process 1451702!

The process ID on the left (in parentheses) is the same as that of the child process that exited with code 70.
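
The relationship between the reporting processes and the process they fail to kill can be made explicit by grouping the log lines by target PID. This is a minimal sketch in Python, assuming the opcache error log format quoted above; the helper name is illustrative.

```python
import re
from collections import defaultdict

# "(<reporter pid>): Error Cannot kill process <target pid>!"
KILL_RE = re.compile(
    r"\((?P<reporter>\d+)\): Error Cannot kill process (?P<target>\d+)!"
)

def group_kill_failures(lines):
    """Map each target PID to the PIDs that reported failing to kill it."""
    by_target = defaultdict(list)
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            by_target[int(m["target"])].append(int(m["reporter"]))
    return dict(by_target)

# The six lines quoted in this report.
SAMPLE = """\
Sat Jun 29 18:49:33 2024 (1453415): Error Cannot kill process 1451702!
Sat Jun 29 18:49:33 2024 (1453463): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453445): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453495): Error Cannot kill process 1451702!
Sat Jun 29 18:49:34 2024 (1453499): Error Cannot kill process 1451702!
Sat Jun 29 18:49:35 2024 (1453492): Error Cannot kill process 1451702!
"""

if __name__ == "__main__":
    for target, reporters in group_kill_failures(SAMPLE.splitlines()).items():
        print(f"target {target}: {len(reporters)} failed attempts by {reporters}")
```

In the excerpt, six different processes all fail against the same target, 1451702, which suggests one long-lived process is blocking every other worker.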

I want to know the root cause of this error.

rahulthackkar commented 1 month ago

I am able to reproduce the same (or a similar) case.

Setup: a separate pool for each user (multi-user PHP-FPM), with opcache configured with:

opcache.force_restart_timeout=10

site1_document_root/longrunningscript.php

<?php sleep(1000);

site2_document_root/clearcache.php

<?php opcache_reset();

Run longrunningscript.php. Opcache debug log:

Tue Jul  2 16:04:46 2024 (3137485): Message Cached script '/home/site1/wait.php'

Run clearcache.php. Opcache debug log:

Tue Jul  2 16:04:51 2024 (3137480): Message Cached script '/home/site2/kill.php'
Tue Jul  2 16:04:51 2024 (3137480): Debug Restart Scheduled! Reason: user

After 10 seconds, run any script of site2. Opcache debug log:

Tue Jul  2 16:05:20 2024 (3137535): Warning Forced restart at 1719936320 (after 10 seconds), locked by 3137485
Tue Jul  2 16:05:20 2024 (3137535): Warning Attempting to kill locker 3137485
Tue Jul  2 16:05:20 2024 (3137535): Warning Failed to send SIGKILL to locker 3137485: Operation not permitted
Tue Jul  2 16:05:20 2024 (3137535): Error Cannot kill process 3137485!

Now all websites, each with its own PHP-FPM pool, stop with exit code 70.

It seems the opcache reset and restart cannot complete. It goes to kill_all_lockers, but processes running as other users cannot be killed, and then this crash happens.
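
The "Operation not permitted" message in the log is the EPERM error: on POSIX systems, an unprivileged process may only signal processes running under a matching user ID. Whether one pool's user is allowed to signal a given PID can be checked without sending a real signal, by using signal number 0. This is a minimal sketch in Python; the function name is illustrative, and it has to be run as the pool user in question for the result to be meaningful.

```python
import os

def probe_signal_permission(pid):
    """Report whether the current user may signal `pid`.

    Signal 0 performs the existence and permission checks only;
    nothing is delivered to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return "no such process"
    except PermissionError:
        return "not permitted"
    return "allowed"

if __name__ == "__main__":
    # A process may always signal itself.
    print(os.getpid(), probe_signal_permission(os.getpid()))
```

Run as site2's user against the PID of a site1 worker, this would be expected to return "not permitted", matching the failed SIGKILL in the log above.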

rahulthackkar commented 1 month ago

Removing pm.max_requests = 50 didn't help.

bukka commented 1 month ago

OK, I think I know the cause of this. The problem is that opcache uses a single shared memory area. This is pretty much a duplicate of https://github.com/php/php-src/issues/8072 (which has more info about this issue) and https://bugs.php.net/bug.php?id=74709 (which also contains some additional info). Unfortunately, the solution is quite complex and will take time; see https://github.com/php/php-src/issues/11723

Currently, the only way to prevent those issues is to use the same user / group for all pools.
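
Whether a set of pools already shares one user / group can be checked by reading the pool definitions. This is a minimal sketch in Python with a deliberately simplified parser; the function names are illustrative, and the sample text is hypothetical (the config excerpt in this report shows only the group line, so the [site1] headers and user lines here are assumed).

```python
def pool_identities(conf_text):
    """Return {pool_name: (user, group)} from php-fpm pool config text.

    Simplified: everything after ';' is treated as a comment, and only
    the 'user' and 'group' directives are read.
    """
    pools = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            pools[current] = {"user": None, "group": None}
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ("user", "group"):
                pools[current][key] = value
    return {name: (p["user"], p["group"]) for name, p in pools.items()}

def shares_one_identity(pools):
    """True if every pool runs under the same (user, group) pair."""
    return len(set(pools.values())) <= 1

# Hypothetical two-pool configuration for illustration only.
SAMPLE = """\
[site1]
user = site1
group = site1
listen = /run/php/site1.sock
php_value[memory_limit]=1024M

[site2]
user = site2
group = site2
listen = /run/php/site2.sock
"""

if __name__ == "__main__":
    pools = pool_identities(SAMPLE)
    print(pools)
    print("single identity:", shares_one_identity(pools))
```

On Ubuntu, the input would typically be the concatenated files under /etc/php/8.2/fpm/pool.d/. A result of False means the pools run under different users, which is the setup affected by the issue described above.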

rahulthackkar commented 1 month ago

Hello @bukka, thanks for confirming this behaviour; I had assumed the same.

However, opcache_reset() is called only in the example above. In our actual production code, we never call opcache_reset() from PHP. I just wanted to know how and when opcache triggers this reset on its own.

If we know what causes the opcache reset, we can fine-tune the pools so that opcache restarts less frequently.