php / php-src

The PHP Interpreter
https://www.php.net
Other
37.93k stars 7.72k forks source link

String corruption on deployments with php-fpm 8.1.6 #8739

Open nono-gdv opened 2 years ago

nono-gdv commented 2 years ago

Description

Hello,

We experience seemingly random in-memory corruption of code when deploying new versions of our PHP app on php-fpm 8.1.6 on Linux. Here is a sample error message:

include(/space/www/api2019/data/releases/20220607140425/vendor/la�R�F): Failed to open stream: No such file or directory

This example happens in the class loader from composer. The beginning of the file name is correct, the end is obviously not.

Another example (another release on another server):

Call to undefined function Symfony\Component\HttpFoundation\implode()

This bug is not easily reproducible, it happens in random-looking places in the code, sometimes but not always, on some servers but not others (we have 10 app servers, the bug appears on one server at a time, very rarely two, not on every release but more than half of them).

When it happens, it looks like there is only one corrupted chunk of data; depending on the affected data, it may only cause a few real errors or basically prevent the code from working on the server at all. It generates errors for a few minutes. Restarting the fpm daemon fixes it; reloading it or running opcache_reset() does not.

We will try to reproduce it on a staging server; at the moment I can only say it appears under load and not under light testing.

It may or may not be related to #8731 (same app, same servers, seemingly fixed by downgrading to 8.0.19).

I know this report is not really helpful, but I'm at a loss how to make it more useful.

Downgrading to PHP 8.0.19 seems to fix the issue.

PHP Version

PHP 8.1.6 (cli) (built: May 17 2022 16:49:19) (NTS)

Operating System

Debian 10

willemstuursma commented 2 years ago

Hello @nono-gdv,

I just wanted to drop by and add that at Mollie, we are experiencing the exact same issue. But we use apache2 with mod_php. We only have this problem during apache restarts (sudo service apache2 restart).

To me, it looks like the interned strings cache is read while it is being written to.

ErrorException
include(/data/www/mo���D): Failed to open stream: No such file or directory

or

ErrorException
During class fetch: Uncaught ErrorException: include(/data/www/mollie.com/rele��
B): Failed to open stream: No such file or directory in /data/www/mollie.com/releases/[redacted]/vendor/composer/ClassLoader.php:571

Note that the cut off is a different places.

Our error reporting uses JSON as a transport so I am unable to provide the exact bytes following the valid characters.

We've tried increasing the interned strings buffer but to no avail.

lcobucci commented 1 year ago

Adding more context to what @willemstuursma reported (I work at the same company)...

We enabled verbose opcache logging and observed a flood of Interned string buffer overflow messages around the time of deployments. Setting the interned string buffer size to 64Mb (it was 32Mb) didn't change anything.

To make things more interesting, the tools we have in place to monitor opcache show a lot of free memory on that buffer (not even 50% is used).

@cmb69 @iluuu1994 is there anything we could provide to help on finding the root cause?

iluuu1994 commented 1 year ago

@lcobucci See: https://github.com/php/php-src/issues/9259 https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.interned-strings-buffer

The amount of memory used to store interned strings, in megabytes.

Is it possible you're setting the INI value to 32M instead of 32?

Error handling / logging was improved in https://github.com/php/php-src/pull/9260 but only for PHP 8.2.

lcobucci commented 1 year ago

We're setting it to 64 atm, which is correctly picked up by Datadog tooling.

cmb69 commented 1 year ago

Do you have a sufficiently large opcache.memory_consumption?

lcobucci commented 1 year ago

I believe so (it's set to 256).

willemstuursma commented 1 year ago

Hi @cmb69 we have upgraded from PHP 8.1.9 to 8.1.10 and the problem went away.

ssigwart commented 4 months ago

Unforunately, I don't have definitive way to replicate this, but I'm seeing this issue on PHP 8.1.27.

I've noticed it twice in the past week, but seeing what happen, I think it's very likely it has happened before.

  1. In the first instance, we have a custom autoloader that adds .php to a class name. On one of many web servers, just after we deployed and called opcache_reset() to flush the cache, the file cache got corrupted and started adding .pgp instead. Calling opcache_reset() resolved the issue.
  2. In the second instance, we had a class constant that ended in "S". However, on one web server, opcache didn't include the "S" in the constant. I confirmed that the file on disk included the "S", but new ReflectionClass(MyClass::class))->getConstants() had it without the "S".

The opcache stats show memory_usage.free_memory = 88 and interned_strings_usage.free_memory = 0, so I plan to raise the memory amounts, but it seems like some type of memory corruption bug in PHP that when it is low, incorrect file contents are being cached.

ssigwart commented 4 months ago

I'm not sure if it makes a difference, but we use symlinks for our directories. On deployment, we change /var/www/oursite, which is a symlink to point to a new directory.

ssigwart commented 2 months ago

I just noticed the same issue as I mentioned above in https://github.com/php/php-src/issues/8739#issuecomment-2096736467 (bullet 2). The same class had the issue of the last letter of the constant being removed, though it was a different constant. This time, it was a constant with name ACCOUNT_ALL that opcache had as ACCOUNT_AL.

This is on PHP 8.1.29. However, I already raised memory since last time and the stats from opcache_get_status show that there was plenty of memory.

Array
(
    [opcache_enabled] => 1
    [cache_full] =>
    [restart_pending] =>
    [restart_in_progress] =>
    [memory_usage] => Array
        (
            [used_memory] => 161684560
            [free_memory] => 375186352
            [wasted_memory] => 0
            [current_wasted_percentage] => 0
        )

    [interned_strings_usage] => Array
        (
            [buffer_size] => 33554432
            [used_memory] => 25416576
            [free_memory] => 8137856
            [number_of_strings] => 228762
        )

    [opcache_statistics] => Array
        (
            [num_cached_scripts] => 3996
            [num_cached_keys] => 7673
            [max_cached_keys] => 16229
            [hits] => 1713543
            [start_time] => 1717695939
            [last_restart_time] => 1719851218
            [oom_restarts] => 0
            [hash_restarts] => 0
            [manual_restarts] => 131
            [misses] => 4627
            [blacklist_misses] => 0
            [blacklist_miss_ratio] => 0
            [opcache_hit_rate] => 99.730701851388
        )
)
nielsdos commented 2 months ago

Referring the bullet points from https://github.com/php/php-src/issues/8739#issuecomment-2096736467 As for 1., I found the same issue with opcache_reset causing all sorts of strange crashes and bugs, I would recommend avoiding that function in high traffic. Reference: https://github.com/php/php-src/issues/14471 Does bullet point 2. involve opcache_reset too?

ssigwart commented 2 months ago

Thanks, @nielsdos. Both involve using opcache_reset. Our setup is that we have two directories and a symlink to one of them. When we do a deployment, we change the symlink so changes are atomic. It's something like the following:

# Assume web root is as `/var/www/site/` and is a symlink to `/var/www/site_a`.
cd /var/www
rsync -a --delete /code/to/deploy/ site_b/
ln -s site_b ~/site_new
chown -h apache:apache ~/site_new
sudo mv -Tf ~/site_new /var/www/site

After we've done that, we run php -r 'opcache_reset();' to clear the CLI opcache and then curl -q -k https://127.0.0.1/path/to/clear/opcache.php to clear the Apache opcache.

From what I've seen, it's the Apache version of opcache that gets corrupted.

What's the alternative to opcache_reset? For Apache, I assume restarting apache, right? I'm not sure about the CLI. Unfortunately, it has been 8 years since we added the calls to opcache_reset, so I can't remember for sure, but I think we needed to clear it because the symlinks would have issues and result in slowness and/or invalid cached data after a deployment.

nielsdos commented 2 months ago

I'll try to look into the opcache_reset issue tomorrow, but can't make any promises.

What's the alternative to opcache_reset? For Apache, I assume restarting apache, right?

Yes indeed, until the opcache_reset bug you experience is fixed this would be the way to go I think.

I'm not sure about the CLI.

I don't think you can easily force a reset here.

ssigwart commented 2 months ago

Thanks, @nielsdos. Fortunately, it's not all too often that it happens.

bs-khan commented 2 months ago

@nielsdos We've been getting this on a daily basis as well.

So far I've seen this only with PHP 8.1.

This is the case witih PHP 8.1.29

[03-Jul-2024 09:25:14 UTC] PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/block-supports/shadow.phpp' (include_path='.:/opt/remi/php81/root/usr/share/pear:/opt/remi/php81/root/usr/share/php:/usr/share/pear:/usr/share/php') in /path/to/public/wp-settings.php:372

So instead of php it would just cache phpp (an extra P at the end). The only solution to resolve the issue at that time is to reset OPcache or reload the PHP-FPM.

Here is one more case where it adds a unique character:

PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/sitemaps/class-wp-sitemaps-registry.php▒dler.php' (include_path='.:/opt/remi/php81/root/usr/share/pear:/opt/remi/php81/root/usr/share/php:/usr/share/pear:/usr/share/php') in /path/to/public/wp-settings.php:335

I can confirm the issue is not related to Remi PHP as we've experienced the same with the non-remi php build.

This ones a case with PHP 8.1.17.

PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/class-wp-oembed.phpphp' (include_path='.:/usr/local/lib/php') in /path/to/public/wp-settings.php:247

It is hard to replicate the issue which prevents us from gathering more information. It just happens randomly on its own when opcache_reset is called.

nielsdos commented 2 months ago

I'm working on figuring this out, https://github.com/php/php-src/issues/14471 is definitely related as both show the same symptoms. So far I figured out that zend_accel_hash_clean and zend_accel_hash_update execute concurrently in my case which causes problems. I can tell that because when I add memset(accel_hash->hash_entries, 0xf0, sizeof(zend_accel_hash_entry )*accel_hash->max_num_entries); at the bottom of zend_accel_hash_clean I get segfaults on address 0xf0f0f0f0f0f0f110, which is 0xf0f0f0f0f0f0f0f0 + 0x20 which is an indirect member of _zend_accel_hash_entry. EDIT: or a stale reference is used there! I think that might be it. Although maybe not the only problem... :/

nielsdos commented 2 months ago

I identified two race conditions with requests firing and the restart code. I'm running a stress test now. Maybe there's more problems with more complex code, I'll have to check and potentially fix this in multiple PRs.

nielsdos commented 2 months ago

I identified a third issue but haven't fully managed to understand that one yet. It seems an op_array is already used while it's still being optimized, because the persistent script is already added to the hashtable. Posting this here for future me.

==1445800==ERROR: AddressSanitizer: heap-use-after-free on address 0x5110000ce6c0 at pc 0x5923ddea8dc1 bp 0x7ffc45e28280 sp 0x7ffc45e28270
READ of size 8 at 0x5110000ce6c0 thread T0
    #0 0x5923ddea8dc0 in execute_ex /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:57343
    #1 0x5923ddebe12f in zend_execute /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:63001
    #2 0x5923de0037dc in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1907
    #3 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
    #4 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
    #5 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
    #6 0x7f20697bac87  (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #7 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #8 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)

0x5110000ce6c0 is located 0 bytes inside of 256-byte region [0x5110000ce6c0,0x5110000ce7c0)
freed by thread T0 here:
    #0 0x7f2069cfb422 in free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x5923ddc27873 in __zend_free /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:3347
    #2 0x5923ddc24ddc in _efree /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:2786
    #3 0x5923ddb05a5c in assemble_code_blocks /run/media/niels/MoreData/php-src/Zend/Optimizer/block_pass.c:995
    #4 0x5923ddb0e802 in zend_optimize_cfg /run/media/niels/MoreData/php-src/Zend/Optimizer/block_pass.c:1764
    #5 0x5923ddc01aa7 in zend_optimize /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_optimizer.c:1072
    #6 0x5923ddc04fb0 in zend_optimize_script /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_optimizer.c:1583
    #7 0x7f2065a58a96 in cache_script_in_shared_memory /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:1565
    #8 0x7f2065a5cb93 in persistent_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:2159
    #9 0x5923de003767 in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1900
    #10 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
    #11 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
    #12 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
    #13 0x7f20697bac87  (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #14 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #15 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)

previously allocated by thread T0 here:
fscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscf    #0 0x7f2069cfb6e2 in realloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
    #1 0x5923ddc27816 in __zend_realloc /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:3338
    #2 0x5923ddc24ec1 in _erealloc /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:2797
    #3 0x5923ddfad898 in pass_two /run/media/niels/MoreData/php-src/Zend/zend_opcode.c:1060
    #4 0x5923ddf55a8e in zend_compile Zend/zend_language_scanner.l:623
    #5 0x5923ddf55e0e in compile_file Zend/zend_language_scanner.l:653
    #6 0x5923dd676d79 in phar_compile_file /run/media/niels/MoreData/php-src/ext/phar/phar.c:3346
    #7 0x7f2065a5a284 in opcache_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:1820
    #8 0x7f2065a5cac9 in persistent_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:2150
    #9 0x5923de003767 in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1900
    #10 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
    #11 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
    #12 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
    #13 0x7f20697bac87  (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #14 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
    #15 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)

fscfSUMMARY: AddressSanitizer: heap-use-after-free /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:57343 in execute_ex
fscffscffscffscffscfShadow bytes around the buggy address:
  0x5110000ce400: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x5110000ce480: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5110000ce500: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa
  0x5110000ce580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5110000ce600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x5110000ce680: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x5110000ce700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5110000ce780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x5110000ce800: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5110000ce880: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x5110000ce900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1445800==ABORTING