Open nono-gdv opened 2 years ago
Hello @nono-gdv,
I just wanted to drop by and add that at Mollie, we are experiencing the exact same issue. But we use apache2
with mod_php
. We only have this problem during apache restarts (sudo service apache2 restart
).
To me, it looks like the interned strings cache is read while it is being written to.
ErrorException
include(/data/www/mo���D): Failed to open stream: No such file or directory
or
ErrorException
During class fetch: Uncaught ErrorException: include(/data/www/mollie.com/rele��
B): Failed to open stream: No such file or directory in /data/www/mollie.com/releases/[redacted]/vendor/composer/ClassLoader.php:571
Note that the cut off is a different places.
Our error reporting uses JSON as a transport so I am unable to provide the exact bytes following the valid characters.
We've tried increasing the interned strings buffer but to no avail.
Adding more context to what @willemstuursma reported (I work at the same company)...
We enabled verbose opcache
logging and observed a flood of Interned string buffer overflow
messages around the time of deployments. Setting the interned string buffer size to 64Mb (it was 32Mb) didn't change anything.
To make things more interesting, the tools we have in place to monitor opcache show a lot of free memory on that buffer (not even 50% is used).
@cmb69 @iluuu1994 is there anything we could provide to help on finding the root cause?
@lcobucci See: https://github.com/php/php-src/issues/9259 https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.interned-strings-buffer
The amount of memory used to store interned strings, in megabytes.
Is it possible you're setting the INI value to 32M
instead of 32
?
Error handling / logging was improved in https://github.com/php/php-src/pull/9260 but only for PHP 8.2.
We're setting it to 64
atm, which is correctly picked up by Datadog tooling.
Do you have a sufficiently large opcache.memory_consumption
?
I believe so (it's set to 256
).
Hi @cmb69 we have upgraded from PHP 8.1.9 to 8.1.10 and the problem went away.
Unforunately, I don't have definitive way to replicate this, but I'm seeing this issue on PHP 8.1.27.
I've noticed it twice in the past week, but seeing what happen, I think it's very likely it has happened before.
.php
to a class name. On one of many web servers, just after we deployed and called opcache_reset()
to flush the cache, the file cache got corrupted and started adding .pgp
instead. Calling opcache_reset()
resolved the issue.new ReflectionClass(MyClass::class))->getConstants()
had it without the "S".The opcache stats show memory_usage.free_memory
= 88 and interned_strings_usage.free_memory
= 0, so I plan to raise the memory amounts, but it seems like some type of memory corruption bug in PHP that when it is low, incorrect file contents are being cached.
I'm not sure if it makes a difference, but we use symlinks for our directories. On deployment, we change /var/www/oursite
, which is a symlink to point to a new directory.
I just noticed the same issue as I mentioned above in https://github.com/php/php-src/issues/8739#issuecomment-2096736467 (bullet 2). The same class had the issue of the last letter of the constant being removed, though it was a different constant. This time, it was a constant with name ACCOUNT_ALL
that opcache had as ACCOUNT_AL
.
This is on PHP 8.1.29. However, I already raised memory since last time and the stats from opcache_get_status
show that there was plenty of memory.
Array
(
[opcache_enabled] => 1
[cache_full] =>
[restart_pending] =>
[restart_in_progress] =>
[memory_usage] => Array
(
[used_memory] => 161684560
[free_memory] => 375186352
[wasted_memory] => 0
[current_wasted_percentage] => 0
)
[interned_strings_usage] => Array
(
[buffer_size] => 33554432
[used_memory] => 25416576
[free_memory] => 8137856
[number_of_strings] => 228762
)
[opcache_statistics] => Array
(
[num_cached_scripts] => 3996
[num_cached_keys] => 7673
[max_cached_keys] => 16229
[hits] => 1713543
[start_time] => 1717695939
[last_restart_time] => 1719851218
[oom_restarts] => 0
[hash_restarts] => 0
[manual_restarts] => 131
[misses] => 4627
[blacklist_misses] => 0
[blacklist_miss_ratio] => 0
[opcache_hit_rate] => 99.730701851388
)
)
Referring the bullet points from https://github.com/php/php-src/issues/8739#issuecomment-2096736467 As for 1., I found the same issue with opcache_reset causing all sorts of strange crashes and bugs, I would recommend avoiding that function in high traffic. Reference: https://github.com/php/php-src/issues/14471 Does bullet point 2. involve opcache_reset too?
Thanks, @nielsdos. Both involve using opcache_reset
. Our setup is that we have two directories and a symlink to one of them. When we do a deployment, we change the symlink so changes are atomic. It's something like the following:
# Assume web root is as `/var/www/site/` and is a symlink to `/var/www/site_a`.
cd /var/www
rsync -a --delete /code/to/deploy/ site_b/
ln -s site_b ~/site_new
chown -h apache:apache ~/site_new
sudo mv -Tf ~/site_new /var/www/site
After we've done that, we run php -r 'opcache_reset();'
to clear the CLI opcache and then curl -q -k https://127.0.0.1/path/to/clear/opcache.php
to clear the Apache opcache.
From what I've seen, it's the Apache version of opcache that gets corrupted.
What's the alternative to opcache_reset
? For Apache, I assume restarting apache, right? I'm not sure about the CLI. Unfortunately, it has been 8 years since we added the calls to opcache_reset
, so I can't remember for sure, but I think we needed to clear it because the symlinks would have issues and result in slowness and/or invalid cached data after a deployment.
I'll try to look into the opcache_reset issue tomorrow, but can't make any promises.
What's the alternative to opcache_reset? For Apache, I assume restarting apache, right?
Yes indeed, until the opcache_reset bug you experience is fixed this would be the way to go I think.
I'm not sure about the CLI.
I don't think you can easily force a reset here.
Thanks, @nielsdos. Fortunately, it's not all too often that it happens.
@nielsdos We've been getting this on a daily basis as well.
So far I've seen this only with PHP 8.1.
This is the case witih PHP 8.1.29
[03-Jul-2024 09:25:14 UTC] PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/block-supports/shadow.phpp' (include_path='.:/opt/remi/php81/root/usr/share/pear:/opt/remi/php81/root/usr/share/php:/usr/share/pear:/usr/share/php') in /path/to/public/wp-settings.php:372
So instead of php it would just cache phpp (an extra P at the end). The only solution to resolve the issue at that time is to reset OPcache or reload the PHP-FPM.
Here is one more case where it adds a unique character:
PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/sitemaps/class-wp-sitemaps-registry.php▒dler.php' (include_path='.:/opt/remi/php81/root/usr/share/pear:/opt/remi/php81/root/usr/share/php:/usr/share/pear:/usr/share/php') in /path/to/public/wp-settings.php:335
I can confirm the issue is not related to Remi PHP as we've experienced the same with the non-remi php build.
This ones a case with PHP 8.1.17.
PHP Fatal error: Uncaught Error: Failed opening required '/path/to/public/wp-includes/class-wp-oembed.phpphp' (include_path='.:/usr/local/lib/php') in /path/to/public/wp-settings.php:247
It is hard to replicate the issue which prevents us from gathering more information. It just happens randomly on its own when opcache_reset is called.
I'm working on figuring this out, https://github.com/php/php-src/issues/14471 is definitely related as both show the same symptoms. So far I figured out that zend_accel_hash_clean
and zend_accel_hash_update
execute concurrently in my case which causes problems. I can tell that because when I add memset(accel_hash->hash_entries, 0xf0, sizeof(zend_accel_hash_entry )*accel_hash->max_num_entries);
at the bottom of zend_accel_hash_clean
I get segfaults on address 0xf0f0f0f0f0f0f110, which is 0xf0f0f0f0f0f0f0f0 + 0x20 which is an indirect
member of _zend_accel_hash_entry
.
EDIT: or a stale reference is used there! I think that might be it. Although maybe not the only problem... :/
I identified two race conditions with requests firing and the restart code. I'm running a stress test now. Maybe there's more problems with more complex code, I'll have to check and potentially fix this in multiple PRs.
I identified a third issue but haven't fully managed to understand that one yet. It seems an op_array is already used while it's still being optimized, because the persistent script is already added to the hashtable. Posting this here for future me.
==1445800==ERROR: AddressSanitizer: heap-use-after-free on address 0x5110000ce6c0 at pc 0x5923ddea8dc1 bp 0x7ffc45e28280 sp 0x7ffc45e28270
READ of size 8 at 0x5110000ce6c0 thread T0
#0 0x5923ddea8dc0 in execute_ex /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:57343
#1 0x5923ddebe12f in zend_execute /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:63001
#2 0x5923de0037dc in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1907
#3 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
#4 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
#5 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
#6 0x7f20697bac87 (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#7 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#8 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)
0x5110000ce6c0 is located 0 bytes inside of 256-byte region [0x5110000ce6c0,0x5110000ce7c0)
freed by thread T0 here:
#0 0x7f2069cfb422 in free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x5923ddc27873 in __zend_free /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:3347
#2 0x5923ddc24ddc in _efree /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:2786
#3 0x5923ddb05a5c in assemble_code_blocks /run/media/niels/MoreData/php-src/Zend/Optimizer/block_pass.c:995
#4 0x5923ddb0e802 in zend_optimize_cfg /run/media/niels/MoreData/php-src/Zend/Optimizer/block_pass.c:1764
#5 0x5923ddc01aa7 in zend_optimize /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_optimizer.c:1072
#6 0x5923ddc04fb0 in zend_optimize_script /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_optimizer.c:1583
#7 0x7f2065a58a96 in cache_script_in_shared_memory /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:1565
#8 0x7f2065a5cb93 in persistent_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:2159
#9 0x5923de003767 in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1900
#10 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
#11 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
#12 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
#13 0x7f20697bac87 (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#14 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#15 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)
previously allocated by thread T0 here:
fscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscffscf #0 0x7f2069cfb6e2 in realloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x5923ddc27816 in __zend_realloc /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:3338
#2 0x5923ddc24ec1 in _erealloc /run/media/niels/MoreData/php-src/Zend/zend_alloc.c:2797
#3 0x5923ddfad898 in pass_two /run/media/niels/MoreData/php-src/Zend/zend_opcode.c:1060
#4 0x5923ddf55a8e in zend_compile Zend/zend_language_scanner.l:623
#5 0x5923ddf55e0e in compile_file Zend/zend_language_scanner.l:653
#6 0x5923dd676d79 in phar_compile_file /run/media/niels/MoreData/php-src/ext/phar/phar.c:3346
#7 0x7f2065a5a284 in opcache_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:1820
#8 0x7f2065a5cac9 in persistent_compile_file /run/media/niels/MoreData/php-src/ext/opcache/ZendAccelerator.c:2150
#9 0x5923de003767 in zend_execute_script /run/media/niels/MoreData/php-src/Zend/zend.c:1900
#10 0x5923dda7c838 in php_execute_script_ex /run/media/niels/MoreData/php-src/main/main.c:2529
#11 0x5923dda7cc40 in php_execute_script /run/media/niels/MoreData/php-src/main/main.c:2569
#12 0x5923de02f26a in main /run/media/niels/MoreData/php-src/sapi/fpm/fpm/fpm_main.c:1934
#13 0x7f20697bac87 (/usr/lib/libc.so.6+0x25c87) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#14 0x7f20697bad4b in __libc_start_main (/usr/lib/libc.so.6+0x25d4b) (BuildId: 32a656aa5562eece8c59a585f5eacd6cf5e2307b)
#15 0x5923dd203f54 in _start (/run/media/niels/MoreData/php-src/sapi/fpm/php-fpm+0x603f54) (BuildId: 5ae13f566af8a69d9255d4bb11f46fb52e058a40)
fscfSUMMARY: AddressSanitizer: heap-use-after-free /run/media/niels/MoreData/php-src/Zend/zend_vm_execute.h:57343 in execute_ex
fscffscffscffscffscfShadow bytes around the buggy address:
0x5110000ce400: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x5110000ce480: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x5110000ce500: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa
0x5110000ce580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x5110000ce600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x5110000ce680: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
0x5110000ce700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x5110000ce780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
0x5110000ce800: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x5110000ce880: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x5110000ce900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1445800==ABORTING
Description
Hello,
We experience seemingly random in-memory corruption of code when deploying new versions of our PHP app on php-fpm 8.1.6 on Linux. Here is a sample error message:
This example happens in the class loader from composer. The beginning of the file name is correct, the end is obviously not.
Another example (another release on another server):
This bug is not easily reproducible, it happens in random-looking places in the code, sometimes but not always, on some servers but not others (we have 10 app servers, the bug appears on one server at a time, very rarely two, not on every release but more than half of them).
When it happens, it looks like there is only one corrupted chunk of data; depending on the affected data, it may only cause a few real errors or basically prevent the code from working on the server at all. It generates errors for a few minutes. Restarting the fpm daemon fixes it; reloading it or running
opcache_reset()
does not.We will try to reproduce it on a staging server; at the moment I can only say it appears under load and not under light testing.
It may or may not be related to #8731 (same app, same servers, seemingly fixed by downgrading to 8.0.19).
I know this report is not really helpful, but I'm at a loss how to make it more useful.
Downgrading to PHP 8.0.19 seems to fix the issue.
PHP Version
PHP 8.1.6 (cli) (built: May 17 2022 16:49:19) (NTS)
Operating System
Debian 10