php / php-src

The PHP Interpreter
https://www.php.net
Other
37.91k stars 7.72k forks source link

mod_php 8.1.26 crashes on apache graceful restart #12844

Open mariancerny opened 9 months ago

mariancerny commented 9 months ago

Description

After updating from PHP 8.1.25 to 8.1.26 our apache24 servers started to crash from time-to-time on graceful restarts (e.g. after midnight log rotation).

Here is the core dump:

(lldb) target create "httpd" --core "httpd-86152.core"
Core file '/tmp/httpd-86152.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'httpd', stop reason = signal SIGSEGV
  * frame #0: 0x0000000801673170 libphp.so`zend_optimizer_compact_literals(op_array=0x0000000000000000, ctx=<unavailable>) at compact_literals.c:520:16
    frame #1: 0x00000008015ce1c6 libphp.so`zend_extension_startup(extension=0x0000000801150510) at zend_extensions.c:196:7
    frame #2: 0x00000008015b0ba1 libphp.so`zend_llist_apply_with_del(l=0x0000000801754100, func=(libphp.so`zend_extension_startup at zend_extensions.c:193)) at zend_llist.c:171:7
    frame #3: 0x00000008015ce1a8 libphp.so`zend_startup_extensions at zend_extensions.c:217:2 [artificial]
    frame #4: 0x000000080154f8c7 libphp.so`php_module_startup(sf=<unavailable>, additional_modules=<unavailable>, num_additional_modules=1) at main.c:2263:2
    frame #5: 0x00000008016ad385 libphp.so`php_apache2_startup(sapi_module=<unavailable>) at sapi_apache2.c:381:6
    frame #6: 0x00000008016acbd6 libphp.so`php_apache_server_startup(pconf=0x0000000800907028, plog=<unavailable>, ptemp=<unavailable>, s=<unavailable>) at sapi_apache2.c:495:6
    frame #7: 0x0000000000263d6f httpd`ap_run_post_config + 79
    frame #8: 0x000000000025f4cc httpd`main + 2476
    frame #9: 0x000000000025e910 httpd`_start + 256

PHP Version

PHP 8.1.26

Operating System

FreeBSD 12.4

nielsdos commented 9 months ago

So this never happened prior to version 8.1.26?

A couple of questions:

In any case it's very strange that this line:

https://github.com/php/php-src/blob/3f57bd80f68fdff067b45f3d717dcf340b972635/Zend/zend_extensions.c#L196

would end up in zend_optimizer_compact_literals, unless part of the backtrace is missing...

mariancerny commented 9 months ago

Yes, that right, It never happened before version 8.1.26.

Sorry about the backtrace, it was wrong. It crashed at our portbuild server where I forgot to restart apache. So the crash is from old version of apache however the backtrace is done against new version of binaries.

Here is the correct backtrace:

(lldb) target create "httpd" --core "httpd-9988.core"
Core file '/tmp/httpd-9988.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'httpd', stop reason = signal SIGSEGV
  * frame #0: 0x000000080141a5d4 libzstd.so.1`___lldb_unnamed_symbol756 + 740
    frame #1: 0x00000008027ce1c6 libphp.so`zend_extension_startup(extension=0x0000000801b18510) at zend_extensions.c:196:7
    frame #2: 0x00000008027b0ba1 libphp.so`zend_llist_apply_with_del(l=0x0000000802954100, func=(libphp.so`zend_extension_startup at zend_extensions.c:193)) at zend_llist.c:171:7
    frame #3: 0x00000008027ce1a8 libphp.so`zend_startup_extensions at zend_extensions.c:217:2 [artificial]
    frame #4: 0x000000080274f8c7 libphp.so`php_module_startup(sf=<unavailable>, additional_modules=<unavailable>, num_additional_modules=1) at main.c:2263:2
    frame #5: 0x00000008028ad385 libphp.so`php_apache2_startup(sapi_module=<unavailable>) at sapi_apache2.c:381:6
    frame #6: 0x00000008028acbd6 libphp.so`php_apache_server_startup(pconf=0x0000000800907028, plog=<unavailable>, ptemp=<unavailable>, s=<unavailable>) at sapi_apache2.c:495:6
    frame #7: 0x0000000000263d6f httpd`ap_run_post_config + 79
    frame #8: 0x000000000025f4cc httpd`main + 2476
    frame #9: 0x000000000025e910 httpd`_start + 256

It reliably crashes after apachectl graceful on the portbuild server. It also crashes when rotating logs using newsyslog which is sending signal 30 (SIGUSR1) to apache httpd which I believe is also doing graceful restart.

Here is the list of extensions:

# php -m
[PHP Modules]
bcmath
bz2
calendar
Core
ctype
curl
date
dom
exif
fileinfo
filter
ftp
gd
gettext
hash
iconv
imagick
intl
json
ldap
libxml
mbstring
mysqli
mysqlnd
openssl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
Reflection
session
SimpleXML
soap
sockets
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache

I wanted find out which extension is causing the crash by disabling some of the extensions. I don't know which extension is calling directly libzstd.so. I don't build all the packages with debug symbols (only apache and php). I tried disabling fileinfo (the only one in /usr/local/lib/php/20210902 that is positive for grep zstd), didn't help. I tried disabling zlib, didn't help. I tried disabling all, no crash. I tried disable all except ext-10-opcache.ini and ext-18-session.ini. Now the crash was different:

(lldb) target create "httpd" --core "httpd-66641.core"
Core file '/tmp/httpd-66641.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'httpd', stop reason = signal SIGSEGV
  * frame #0: 0x000000080141a5d0
    frame #1: 0x0000000801ecdc62 opcache.so`___lldb_unnamed_symbol1013 + 66
    frame #2: 0x00000008024df1c6 libphp.so`zend_extension_startup(extension=0x0000000801b18010) at zend_extensions.c:196:7
    frame #3: 0x00000008024c1ba1 libphp.so`zend_llist_apply_with_del(l=0x0000000802665100, func=(libphp.so`zend_extension_startup at zend_extensions.c:193)) at zend_llist.c:171:7
    frame #4: 0x00000008024df1a8 libphp.so`zend_startup_extensions at zend_extensions.c:217:2 [artificial]
    frame #5: 0x00000008024608c7 libphp.so`php_module_startup(sf=<unavailable>, additional_modules=<unavailable>, num_additional_modules=1) at main.c:2263:2
    frame #6: 0x00000008025be385 libphp.so`php_apache2_startup(sapi_module=<unavailable>) at sapi_apache2.c:381:6
    frame #7: 0x00000008025bdbd6 libphp.so`php_apache_server_startup(pconf=0x0000000800907028, plog=<unavailable>, ptemp=<unavailable>, s=<unavailable>) at sapi_apache2.c:495:6
    frame #8: 0x0000000000263d6f httpd`ap_run_post_config + 79
    frame #9: 0x000000000025f4cc httpd`main + 2476
    frame #10: 0x000000000025e910 httpd`_start + 256

I tried to disable ext-10-opcache.ini and enable all the others, it doesn't crash.

Unanswered questions:

I will investigate further. E.g. I will try to get crash dump on the production servers.

mariancerny commented 9 months ago

The crash on production servers is different (which I thought it might be, because apachectl graceful does not reproduce the crash reliably -- it crashes only sometimes):

(lldb) target create "httpd" --core "httpd-99468.core"
Core file '/tmp/httpd-99468.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'httpd', stop reason = signal SIGSEGV
  * frame #0: 0x00000008020f5124 libphp.so`zend_get_callable_name_ex + 164
    frame #1: 0x00000008020f9f06 libphp.so`___lldb_unnamed_symbol7929 + 22
    frame #2: 0x00000008020dc5d1 libphp.so`zend_llist_apply_with_del + 81
    frame #3: 0x000000080207b107 libphp.so`php_module_startup + 2359
    frame #4: 0x00000008021da2c5 libphp.so`___lldb_unnamed_symbol9598 + 21
    frame #5: 0x00000008021d9aa6 libphp.so`___lldb_unnamed_symbol9593 + 118
    frame #6: 0x00000000002627cf httpd`ap_run_post_config + 79
    frame #7: 0x000000000025de8c httpd`main + 2476
    frame #8: 0x000000000025d2d0 httpd`_start + 256
nielsdos commented 9 months ago

It seems to call to bogus pointers. If I had to take a guess (but I haven't confirmed this) I think the only commit between 8.1.25 and 8.1.26 that could cause this is a8c6c6165b9c470577cab3db2a9ae3625cf17a84. This touches module startup code which is what I see in your backtraces.

If possible, could you create a custom PHP build with that commit reverted and check if there are issues?

I'll try to find some time soon to try and reproduce this myself.

mariancerny commented 9 months ago

TLDR: It's not related to changes in php 8.1.26. Seems to be related to changes in FreeBSD package for 8.1.26.

I have tried to (manually) revert commit a8c6c61, but it (apachectl graceful) crashed anyway. I have tried to (manually) revert all the commits from 8.1.26 (to confirm I am building it correctly), but it crashed anyway.

There have been some other changes in the FreeBSD port in 8.1.26 (https://www.freshports.org/commit.php?category=lang&port=php81&files=yes&message_id=06be0d5391f475771145a7a9d68d34b0ad5a9b85):

I have tried to change the Makefile for the FreeBSD port to build 8.1.25 instead of 8.1.26 and it crashed! (So the crash is not related to changes in php 8.1.26.) I have tried to revert changes related to "Switch to system pcre", and it didn't crash! I removed those (FreeBSD) changes:

BUILD_DEPENDS=  re2c>0:devel/re2c
CONFIGURE_ARGS+= --with-external-pcre=${LOCALBASE}

(and also used pkg-plist from the FreeBSD port for version 8.1.25, which re-adds include/php/ext/pcre/pcre2lib/ header files, otherwise some php extensions failed to build.)

So reverting --with-external-pcre fixes the crash on our portbuild server. I will try to apply this "fix" on our production servers and see if it fixes the problem there as well. I will then report this problem to FreeBSD bug tracking system.

However, I find it strange that changing the pcre library is causing this crash, because the backtrace from the core dump does not seem to be related to pcre.

nielsdos commented 9 months ago

Thanks for the investigation! That's bizarre and seems like the problem is with whatever FreeBSD is doing.

If I had to take a guess, one possibility is that a conflicting version of pcre2 is used. e.g. when one (transitive) library dependency of an extension depends on pcre2 and php also uses pcre2 for ext/pcre, this can cause problems when different versions are used because of conflicting symbols. Using the bundled pcre2 would fix this because we prefix some symbol names with the name php to avoid that exact issue.

mariancerny commented 9 months ago

The fix to FreeBSD port reverting "Switch to system pcre" seems to have fixed also the random (sporadic) crash on reload on our production servers as well. No crash so far.

I have reported the bug to FreeBSD project:

Bug 275522 - www/mod_php81: Apache crashes on reload after switch to system pcre https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275522

5u623l20 commented 9 months ago

So although mentioned in the documentation that php can use either the bundled or the external library; this can cause regression?

I am also curious to know the reason for having the configure knob --with-external-pcre

nielsdos commented 9 months ago

So although mentioned in the documentation that php can use either the bundled or the external library; this can cause regression?

To clarify: this is only a problem if mixed versions are used of the same library because the symbols create a conflict. This doesn't happen on stock PHP. This could happen for example when there are other modules in Apache or 3rd party extensions in PHP that load a different pcre version (e.g. bundled library) where the names clash.

Also I'm not sure this is actually the problem the user is experiencing, this is just based on prior experience and the traces I saw.

I will look into it. One major reason I actually moved into system pcre is that the pcre library is not at all updated in the lifetime of a major php versions and except php8.3 all are vulnerable versions which is a security risk from my perspective.

Note that we do backport critical and security fixes to the bundled pcre library. So although the version is seemingly not updated we do backport fixes.

I am also curious to know the reason for having the configure knob --with-external-pcre

I don't know the full history of this, but AFAIK this is the preferred way.

I'm also reading something in the FreeBSD issue report comments about other Apache issues. Especially this is an interesting remark: "From my experience, Apache2 recently crashes almost on any module including its own.".

5u623l20 commented 9 months ago

So although mentioned in the documentation that php can use either the bundled or the external library; this can cause regression?

To clarify: this is only a problem if mixed versions are used of the same library because the symbols create a conflict. This doesn't happen on stock PHP. This could happen for example when there are other modules in Apache or 3rd party extensions in PHP that load a different pcre version (e.g. bundled library) where the names clash.

Also I'm not sure this is actually the problem the user is experiencing, this is just based on prior experience and the traces I saw.

I will look into it. One major reason I actually moved into system pcre is that the pcre library is not at all updated in the lifetime of a major php versions and except php8.3 all are vulnerable versions which is a security risk from my perspective.

Note that we do backport critical and security fixes to the bundled pcre library. So although the version is seemingly not updated we do backport fixes.

Thanks for your detailed answers. I will look into reverting to bundled pcre for php81 but will still stay with system pcre for php82 and php83. I don't want to introduce this into php80 as we are removing this on end of the year.

I am also curious to know the reason for having the configure knob --with-external-pcre

I don't know the full history of this, but AFAIK this is the preferred way.

I'm also reading something in the FreeBSD issue report comments about other Apache issues. Especially this is an interesting remark: "From my experience, Apache2 recently crashes almost on any module including its own.".

I am not into apache and cannot comment on that but so far the issue started after we switched to ASLR on default installs. And specially with php opcache module and mod_php for apache httpd. I hope one day php will remove the entire support for mod_php as this does not make any sense after the introduction of php_fpm. One another interesting result has been building httpd with gcc instead of llvm/clang.

mariancerny commented 9 months ago

I am not into apache and cannot comment on that but so far the issue started after we switched to ASLR on default installs. And specially with php opcache module and mod_php for apache httpd. One another interesting result has been building httpd with gcc instead of llvm/clang.

Thanks to your additional info I now think this bug is "the same" bug that has been seen with apache on FreeBSD with ASLR. I am using FreeBSD 12.4 and ASLR is enabled by default only since FreeBSD 13.2. So I guess there is a pointer bug somewhere in apache or php that is causing the crash. When ASLR is not used, the pointer points to some valid memory. However when ASLR is used, the memory becomes invalid. Similar thing probably happens with external pcre (php code image is smaller in memory) and gcc (it compiles the code differently so the pointer might point to some valid memory, unless it's llvm bug, which I guess it's much less probable).

The core dumps I have seen crash when going from libphp.so to some other place, so I guess the bug is in PHP. I would suspect the list that is accessed by zend_llist_apply_with_del to be causing the problem. If the crash is reproducible, somebody with source code knowledge of php and apache should be able to track it down.

gioman commented 5 months ago

Observing the same on Ubuntu Server 22.04

apt-cache policy libapache2-mod-php8.1
libapache2-mod-php8.1:
  Installed: 8.1.2-1ubuntu2.14
  Candidate: 8.1.2-1ubuntu2.14
  Version table:
 *** 8.1.2-1ubuntu2.14 500
        500 https://mirror.hetzner.com/ubuntu/packages jammy-updates/main amd64 Packages
        500 https://mirror.hetzner.com/ubuntu/security jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     8.1.2-1ubuntu2 500
        500 https://mirror.hetzner.com/ubuntu/packages jammy/main amd64 Packages
nielsdos commented 5 months ago

@gioman do you have a reproducer? Otherwise it's near impossible to help.

gioman commented 5 months ago

do you have a reproducer? Otherwise it's near impossible to help.

@nielsdos thanks for the reply. My observation until now (a deeper analysis may be needed):

issuing graceful restart (manually or by letting logrotate do its thing) causes error.log to be flooded with

[Mon Apr 01 08:44:27.571587 2024] [core:notice] [pid 329780] AH00051: child pid 338330 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:28.573775 2024] [core:notice] [pid 329780] AH00051: child pid 338331 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:28.573822 2024] [core:notice] [pid 329780] AH00051: child pid 338335 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:28.573836 2024] [core:notice] [pid 329780] AH00051: child pid 338341 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:29.576074 2024] [core:notice] [pid 329780] AH00051: child pid 338347 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:30.578310 2024] [core:notice] [pid 329780] AH00051: child pid 338349 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:31.580621 2024] [core:notice] [pid 329780] AH00051: child pid 338352 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps
[Mon Apr 01 08:44:32.582878 2024] [core:notice] [pid 329780] AH00051: child pid 338354 exit signal Segmentation fault (11), possible coredump in /mnt/apache_core_dumps

until a systemctl restart apache2 is issued.

Once libapache2-mod-php81 is disabled the above does not happen anymore.

nielsdos commented 5 months ago

If you're able to get a backtrace from gdb for example that would be helpful, as there is still too little information do know what's going on.