php / php-src

The PHP Interpreter
https://www.php.net

JIT segmentation fault in PHP 8.1 #7817

Open cappadaan opened 2 years ago

cappadaan commented 2 years ago

Description

PHP 8.1.0 and 8.1.1 produce segfaults at random. Downgrading to 8.0 solves the issue.

--- core dump ---

BFD: Warning: coredump-php-fpm.30267 is truncated: expected core file size >= 5413076992, found: 35983360.
[New LWP 30267]
[New LWP 1887]
[New LWP 1886]
[New LWP 1888]
Cannot access memory at address 0x7f277dbb3128
Cannot access memory at address 0x7f277dbb3120
Failed to read a valid object file image from memory.
Core was generated by `php-fpm: pool xxxxxx '.
Program terminated with signal 11, Segmentation fault.
#0  0x000055bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137
10137       ce = CACHED_PTR(opline->op2.num);
(gdb) bt
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x7ffdc940e3a8:
(gdb) bt
#0  0x000055bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137
Cannot access memory at address 0x7ffdc940e3a8
(gdb) frame 0
#0  0x000055bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137
10137       ce = CACHED_PTR(opline->op2.num);
(gdb) info frame
Stack level 0, frame at 0x7ffdc940e3b0:
 rip = 0x55bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER (/usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137); saved rip Cannot access memory at address 0x7ffdc940e3a8

This is the only information available in the core dump.

PHP Version

PHP 8.1.0 + 8.1.1

Operating System

CentOS 7

cmb69 commented 2 years ago

I'm afraid this is not actionable without further information, since you didn't provide a reproduce script, and the stack backtrace only shows 1 frame. Since you claim that happens randomly, providing a reproduce script may not be possible, but at the very least we need more info, such as whether OPcache is enabled, whether OPcache JIT is enabled (tracing or function), and in which environment (SAPI) this happens, and also whether that happens only occasionally (and recovers afterwards or not), or happens frequently. Also, try to run with the latest PHP-8.1 development version, where some PHP 8.1.1 bugs have been fixed.

cappadaan commented 2 years ago
cmb69 commented 2 years ago

Thanks for the further info! Does it also happen when JIT is disabled (i.e. opcache.jit_buffer_size=0)? If not, what happens if you use a smaller jit_buffer_size? 1GB seems like a lot; maybe try 16M or 64M.

cappadaan commented 2 years ago

I disabled JIT for now, will update in a few days.

cmb69 commented 2 years ago

Okay, I'll change it back to feedback status (the ticket will be kept open for 2 weeks).

cappadaan commented 2 years ago

Disabling JIT seemed to solve the problem. So JIT is the cause of the segfault.

I have now re-enabled it and set opcache.jit_buffer_size to 64M; will update later.

cappadaan commented 2 years ago

Another crash with SIGSEGV after setting the JIT buffer to 64M. So JIT definitely causes this crash.

For now we'll leave it off until there is some sort of fix (or a newer PHP version).

If you need more info just let me know.

w3yyb commented 2 years ago

It's similar to this problem: https://bugs.php.net/bug.php?id=81664

cmb69 commented 2 years ago

Okay, so we now know there is a random but frequent segfault if (tracing) JIT is enabled under FPM, which happens at:

#0 0x000055bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137
10137 ce = CACHED_PTR(opline->op2.num);

I'm afraid that is insufficient information to be actionable, and generally it's hard to fix a bug which is not reproducible. :(

meinemitternacht commented 2 years ago

We are also encountering this bug (or a very similar one) at my company. We use Yocto to build target images for X86_64, and PHP 8.1 is now producing segfaults with JIT enabled. What debugging options would be helpful here?

meinemitternacht commented 2 years ago

@cmb69 After some troubleshooting, it seems that our issue was due to the first flag, CPU optimizations. When it was enabled, CPUs with the AVX instruction set, but not AVX2 or AVX-512, exhibited segfaults when running JIT. It may be worthwhile to find out why this is occurring, but it would be more appropriate for us to open a separate issue at some point in the future.

cmb69 commented 2 years ago

@meinemitternacht, so you have segfaults with opcache.jit=1254, but not with opcache.jit=0254 on these machines?
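The four digits being compared here are the CRTO fields of opcache.jit as described in the PHP manual:

- C: CPU-specific optimization.
- R: register allocation.
- T: trigger.
- O: optimization level.

The aliases `tracing` and `function` stand for 1254 and 1205. Below is a small Python sketch that decodes a setting. `decode_jit` is a hypothetical helper, and its labels paraphrase the manual rather than quote it.

```python
# Hypothetical helper: split an opcache.jit value into its four CRTO digits.
# Digit meanings paraphrase the PHP manual's description of opcache.jit.
from typing import Dict

CPU_FLAGS = {
    0: "no CPU-specific optimization",
    1: "use AVX if the CPU supports it",
}
REG_ALLOC = {
    0: "no register allocation",
    1: "block-local register allocation",
    2: "global register allocation",
}
TRIGGER = {
    0: "JIT all functions on first script load",
    1: "JIT a function on first execution",
    2: "profile first request, compile hot functions on second",
    3: "profile on the fly, compile hot functions",
    4: "currently unused",
    5: "tracing JIT",
}
OPT_LEVEL = {
    0: "no JIT",
    1: "minimal JIT (call standard VM handlers)",
    2: "inline VM handlers",
    3: "use type inference",
    4: "use call graph",
    5: "optimize whole script",
}
# Two of the named aliases accepted by opcache.jit and their numeric values.
ALIASES = {"tracing": "1254", "function": "1205"}


def decode_jit(value: str) -> Dict[str, str]:
    """Describe each CRTO digit of an opcache.jit setting."""
    digits = ALIASES.get(value, value)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected a 4-digit CRTO value, got {value!r}")
    c, r, t, o = (int(d) for d in digits)
    return {"C": CPU_FLAGS[c], "R": REG_ALLOC[r], "T": TRIGGER[t], "O": OPT_LEVEL[o]}


# 1254 and 0254 differ only in C; 1205/1235 differ from 1254/1255 in T.
for setting in ("1254", "0254", "1235", "1205"):
    print(setting, decode_jit(setting))
```

Decoding the values that come up in this thread shows two things. Switching between 1254 and 0254 toggles only AVX use. The settings 1205 and 1235 differ from 1254/1255 in the trigger digit, i.e. non-tracing vs. tracing JIT.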

meinemitternacht commented 2 years ago

@cmb69 That is correct.

(Edit: after further testing, this was determined to be incorrect.)

meinemitternacht commented 2 years ago

@cmb69 I am doing some more tests on a different machine with a similar CPU instruction set and will get back with you soon.

cmb69 commented 2 years ago

@cappadaan, would opcache.jit=0254 work for you, i.e. no more segfaults? See https://github.com/php/php-src/issues/7817#issuecomment-1001630863 for details.

meinemitternacht commented 2 years ago

@cmb69 Still testing, but it seems that the CPU optimization flag was not the culprit after all; it was the function vs. tracing JIT option. The bug does not always appear, so false negatives are rather annoying.

meinemitternacht commented 2 years ago

@cmb69 OK, here are my results:

The following is a partial backtrace output from gdb, obtained in a previous test.

#0  zend_fetch_ce_from_cache_slot (type=0x7f4f78c97170, cache_slot=0x8) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:980
#1  zend_check_type_slow (is_internal=false, is_return_type=false, cache_slot=0x8, ref=0x0, arg=0x7f4fb6e146f0, type=0x7f4f78c97170)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1043
#2  zend_check_user_type_slow (type=0x7f4f78c97170, arg=0x7f4fb6e146f0, ref=0x0, cache_slot=0x8, is_return_type=false)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1103

Debugging:

opcache.jit = 1254
opcache.jit_debug = 1048320 ; 0b11111111111100000000

ZEND_JIT_DEBUG_ASM             0
ZEND_JIT_DEBUG_SSA             0
ZEND_JIT_DEBUG_REG_ALLOC       0
ZEND_JIT_DEBUG_ASM_STUBS       0
ZEND_JIT_DEBUG_PERF            0
ZEND_JIT_DEBUG_PERF_DUMP       0
ZEND_JIT_DEBUG_OPROFILE        0
ZEND_JIT_DEBUG_VTUNE           0
ZEND_JIT_DEBUG_GDB             1
ZEND_JIT_DEBUG_SIZE            1
ZEND_JIT_DEBUG_ASM_ADDR        1
ZEND_JIT_DEBUG_TRACE_START     1
ZEND_JIT_DEBUG_TRACE_STOP      1
ZEND_JIT_DEBUG_TRACE_COMPILED  1
ZEND_JIT_DEBUG_TRACE_EXIT      1
ZEND_JIT_DEBUG_TRACE_ABORT     1
ZEND_JIT_DEBUG_TRACE_BLACKLIST 1
ZEND_JIT_DEBUG_TRACE_BYTECODE  1
ZEND_JIT_DEBUG_TRACE_TSSA      1
ZEND_JIT_DEBUG_TRACE_EXIT_INFO 1
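The binary annotation on opcache.jit_debug and the 0/1 column above can be cross-checked with a little bit arithmetic. Below is a minimal Python sketch. It confirms that 1048320 is twelve set bits (positions 8 to 19) above eight clear ones.

That matches the eight 0s and twelve 1s in the list only if the flags occupy consecutive bits in the listed order. This sketch does not verify that assumption; the authoritative bit values are the ZEND_JIT_DEBUG_* definitions in the OPcache JIT headers for the PHP version in use.

```python
# Check the binary annotation on opcache.jit_debug = 1048320 and list the
# bit positions it enables. Which ZEND_JIT_DEBUG_* flag sits at which bit is
# NOT asserted here; check the JIT headers of your PHP version for that.
from typing import Iterable, List


def set_bits(mask: int) -> List[int]:
    """Positions of all set bits in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def mask_from_bits(bits: Iterable[int]) -> int:
    """Build a mask by OR-ing 1 << position for every requested bit."""
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask


JIT_DEBUG = 1048320
print(bin(JIT_DEBUG))       # 0b11111111111100000000
print(set_bits(JIT_DEBUG))  # bit positions 8 through 19
assert mask_from_bits(range(8, 20)) == JIT_DEBUG
```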

This leads to the following debug output:

[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 start (loop) FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:130"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0004 FE_FETCH_R V6 CV3($part) 0050 ; op1(packed array) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0005 INIT_NS_FCALL_BY_NAME 1 string("FastRoute\DataGenerator\is_string")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >init is_string"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0006 SEND_VAR_EX CV3($part) 1 ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0007 V7 = DO_FCALL_BY_NAME"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >call is_string"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0008 JMPZ V7 0015 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0015 T7 = QM_ASSIGN CV3($part) ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0016 V8 = FETCH_LIST_R T7 int(0) ; op1(packed array) val(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0017 ASSIGN CV4($varName) V8 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0018 V8 = FETCH_LIST_R T7 int(1) ; op1(packed array) val(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0019 ASSIGN CV5($regexPart) V8 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0020 FREE T7 ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0021 T7 = ISSET_ISEMPTY_DIM_OBJ (isset) CV2($variables) CV4($varName) ; op1(array) op2(string) val(undef)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0022 ;JMPZ T7 0031"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0031 INIT_METHOD_CALL 1 THIS string("regexHasCapturingGroups")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >init FastRoute\DataGenerator\RegexBasedAbstract::regexHasCapturingGroups"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0032 SEND_VAR CV5($regexPart) 1 ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0033 V7 = DO_FCALL"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >enter FastRoute\DataGenerator\RegexBasedAbstract::regexHasCapturingGroups"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0001  INIT_NS_FCALL_BY_NAME 2 string("FastRoute\DataGenerator\strpos")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "      >init strpos"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0002  SEND_VAR_EX CV0($regex) 1 ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0003  SEND_VAL_EX string("(") 2"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0004  V2 = DO_FCALL_BY_NAME"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "      >call strpos"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0005  T1 = TYPE_CHECK (false) V2 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0006  ;JMPZ T1 0008"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0007  RETURN bool(false)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     <back FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0034 JMPZ V7 0044 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0044 ASSIGN_DIM CV2($variables) CV4($varName) ; op1(array) op2(string) op3(string) val(undef)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0045 ;OP_DATA CV4($varName)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0046 T8 = CONCAT string("(") CV5($regexPart) ; op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0047 T7 = FAST_CONCAT T8 string(")") ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0048 ASSIGN_OP (CONCAT) CV1($regex) T7 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0049 JMP 0004"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 stop (loop)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 already prcessed"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     TRACE 3 exit 2 FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:130"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     TRACE 3 exit 4 FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:131"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 exited on signal 11 (SIGSEGV) after 8.303799 seconds from start

So it seems that some of the regex compilation in FastRoute is causing issues with the JIT, at least on my machine(s). Do you have any pointers on debugging options or tests that would help move this issue forward?
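To share a trace like the one above without the FPM log wrapper, the messages written by one child can be pulled out of the log. Below is a minimal Python sketch. It assumes the `[date] WARNING: [pool <name>] child <pid> said into stderr: "<message>"` line format shown above. `extract_stderr` is a hypothetical helper, not part of php-fpm.

```python
# Sketch: recover one child's raw stderr (here, the JIT debug dump) from a
# php-fpm log by dropping the "[date] WARNING: [pool x] child N said into
# stderr:" wrapper. The regex assumes the line format shown in this thread.
import re
from typing import Iterable, Iterator

STDERR_LINE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] '
    r'child (?P<pid>\d+) said into stderr: "(?P<msg>.*)"$'
)


def extract_stderr(lines: Iterable[str], pid: int) -> Iterator[str]:
    """Yield the unwrapped stderr messages of child `pid`, in log order."""
    for line in lines:
        m = STDERR_LINE.match(line.rstrip("\n"))
        if m and int(m.group("pid")) == pid:
            # Greedy .* stops at the final quote, so quotes inside the
            # message (e.g. string("...")) are kept intact.
            yield m.group("msg")
```

For example, iterating over `extract_stderr(open("/var/log/php_log"), 9433)` would yield the `---- TRACE 3 ...` dump line by line, ready to paste into the issue.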

nikic commented 2 years ago

Maybe @dstogov has suggestions for debugging.

In your trace, the cache_slot=0x8 is likely what ultimately causes the crash, but it's not visible where it originates.

cappadaan commented 2 years ago

@cmb69 I also tested with this config:

opcache.jit_buffer_size=128M
opcache.jit=function

No crashes. Let me know if I can help to somehow debug this for you.

dstogov commented 2 years ago

@meinemitternacht can you provide an instruction how to reproduce the crash with FastRoute?

meinemitternacht commented 2 years ago

@dstogov Troubleshooting this issue is quite tedious.

root@v180:~# cat /var/log/php_log |grep 10642
[28-Dec-2021 11:43:51] NOTICE: [pool vtl] child 10642 started
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "---- TRACE 36 start (loop) Composer\Autoload\ClassLoader::findFileWithExtension() /api/vendor/composer/ClassLoader.php:501"
  -- snip --
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "0045 RETURN CV9($file) ; op1(string)"
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "---- TRACE 41 abort (exit from loop)"
[28-Dec-2021 11:46:17] WARNING: [pool vtl] child 10642 exited on signal 11 (SIGSEGV) after 145.578650 seconds from start
root@v180:~# cat /var/log/php_log |grep 10643 
[28-Dec-2021 11:43:51] NOTICE: [pool vtl] child 10643 started
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10643 exited on signal 11 (SIGSEGV) after 141.926396 seconds from start

It seems that JIT debug output immediately preceding a segfault has no bearing on if it actually caused the segfault or not. In this case, no debug output was gathered while the child was alive (but other FPM processes did produce output). However, with JIT tracing turned off, I never receive a segfault from PHP-FPM.
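The per-PID grepping above can be automated to check the observation that a crashing child may not have written any JIT debug output itself. Below is a minimal Python sketch. It assumes the NOTICE/WARNING line format shown in these logs. `summarize` and `silent_crashes` are hypothetical helpers.

```python
# Sketch: summarize php-fpm children from the log: how many stderr (JIT
# debug) lines each child wrote and which signal, if any, it exited on.
# The regexes assume the NOTICE/WARNING line format shown in this thread.
import re
from typing import Dict, Iterable, List

CHILD_LINE = re.compile(
    r'^\[[^\]]+\] (?:NOTICE|WARNING): \[pool [^\]]+\] child (?P<pid>\d+) (?P<rest>.*)$'
)
EXIT_SIGNAL = re.compile(r'exited on signal (?P<sig>\d+)')


def summarize(lines: Iterable[str]) -> Dict[int, dict]:
    """Map child PID -> {"stderr_lines": int, "signal": int or None}."""
    children: Dict[int, dict] = {}
    for line in lines:
        m = CHILD_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # e.g. "-- snip --" or unrelated master messages
        info = children.setdefault(int(m.group("pid")), {"stderr_lines": 0, "signal": None})
        rest = m.group("rest")
        if rest.startswith("said into stderr:"):
            info["stderr_lines"] += 1
        else:
            sig = EXIT_SIGNAL.match(rest)
            if sig:
                info["signal"] = int(sig.group("sig"))
    return children


def silent_crashes(summary: Dict[int, dict]) -> List[int]:
    """PIDs that died on a signal without writing anything to stderr."""
    return sorted(pid for pid, info in summary.items()
                  if info["signal"] is not None and info["stderr_lines"] == 0)
```

Run on the excerpt above, child 10642 shows two stderr lines and signal 11. Child 10643 also exits on signal 11 but appears in `silent_crashes`.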

This is a self-contained build environment (Yocto), and all target machines share the same architecture (X86_64). I am going to attempt to generate some useful core files, or to run php-fpm under gdb. Do you have any suggestions to make this process easier? Are there compile-time flags that I can turn on, or extra debugging options that I can enable?

How does ZEND_JIT_DEBUG_GDB affect debug output?

cmb69 commented 2 years ago

Maybe https://wiki.php.net/rfc/jit#jit_debugging helps a bit. :)

dstogov commented 2 years ago

@meinemitternacht I mean, I need to run the same PHP application, if this is possible. The smaller the app, the better.

meinemitternacht commented 2 years ago

@cmb69 @dstogov Is there an easy way to run a PHP FPM process under gdb? Currently, that is the only method I have for reproducing my problem since it does not occur with the CLI. I am still attempting to narrow the test case as well.

cmb69 commented 2 years ago

I have no idea about debugging FPM, but debugging JIT issues is generally very hard, and it likely makes more sense to look for a way to reproduce the segfault, and provide that to Dmitry. Also try with current PHP-8.1 if possible; there have been several JIT related fixes since 8.1.1.

meinemitternacht commented 2 years ago

@cmb69 I am currently attempting to attach gdb to a running FPM process, but I am encountering bugs in gdb... of all places.

Yes, I would like to provide a concise test case for him, but the bug is intermittent and I have no idea what code is actually triggering it.

dstogov commented 2 years ago

@meinemitternacht some of your logs above show a crash after compiling just 3 traces. It shouldn't be very hard for me to analyse this if I can reproduce it. I suspect the problem may be caused by some race condition. Or we may be getting crashes because of a few different problems.

meinemitternacht commented 2 years ago

@dstogov Can you provide a short explanation of how the shared memory space works with JIT? I usually see the problem occur after a fresh reboot, but it still does not happen every time. My first goal is to get the issue to happen consistently then I can narrow down the code that is causing the segfault.

Basically, my question is, what is the difference between an OS reboot and restarting php-fpm? Does the OS cache libraries, or should I watch out for shared memory between the PHP CLI and php-fpm? I am using the CLI opcache.

meinemitternacht commented 2 years ago

I can continue testing tomorrow at work.

dstogov commented 2 years ago

@meinemitternacht the fact that the problem doesn't happen consistently very probably points to some race condition. For example, one process is writing something into shared memory (and somehow leaves it in an inconsistent state) while another process reads from SHM and fails because of the inconsistency. The failure occurs only by chance, when both processes get into specific states. I think there is no difference between a fresh OS reboot and a PHP-FPM restart (both recreate SHM). PHP CLI doesn't share memory with PHP-FPM.

meinemitternacht commented 2 years ago

@dstogov I have GDB attached to a php-fpm process and reproduced the segfault.

(gdb) bt
#0  zend_check_type_slow (is_internal=false, is_return_type=false, cache_slot=0x8, ref=0x0, arg=0x7f6249014590, type=0x7f620af3d470)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1043
#1  zend_check_user_type_slow (type=0x7f620af3d470, arg=0x7f6249014590, ref=0x0, cache_slot=0x8, is_return_type=false)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1103
#2  0x00007f6249289e9a in zend_jit_verify_arg_slow (arg=0x7f6249014590, arg_info=0x7f620af3d468) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c:1467
#3  0x00007f6228cae736 in ?? ()
#4  0x00007f624907b240 in ?? ()
#5  0x00007f620af298d8 in ?? ()
#6  0x0000000000000000 in ?? ()

======

#2  0x00007f6249289e9a in zend_jit_verify_arg_slow (arg=0x7f6249014590, arg_info=0x7f620af3d468) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c:1467
1467    in /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c
(gdb) info args
arg = 0x7f6249014590
arg_info = 0x7f620af3d468

======

(gdb) print *arg
$9 = {
  value = {
    lval = 140060108612160,
    dval = 6.9198888018061968e-310,
    counted = 0x7f6249056240,
    str = 0x7f6249056240,
    arr = 0x7f6249056240,
    obj = 0x7f6249056240,
    res = 0x7f6249056240,
    ref = 0x7f6249056240,
    ast = 0x7f6249056240,
    zv = 0x7f6249056240,
    ptr = 0x7f6249056240,
    ce = 0x7f6249056240,
    func = 0x7f6249056240,
    ww = {
      w1 = 1225089600,
      w2 = 32610
    }
  },
  u1 = {
    type_info = 776,
    v = {
      type = 8 '\b',
      type_flags = 3 '\003',
      u = {
        extra = 0
      }
    }
  },
  u2 = {
    next = 0,
    cache_slot = 0,
    opline_num = 0,
    lineno = 0,
    num_args = 0,
    fe_pos = 0,
    fe_iter_idx = 0,
    property_guard = 0,
    constant_flags = 0,
    extra = 0
  }
}
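The `u1.type_info` and `ww` fields in the dump above can be checked by hand. Below is a minimal Python sketch. It uses constants assumed from PHP 8's Zend/zend_types.h: IS_OBJECT = 8, and IS_TYPE_REFCOUNTED and IS_TYPE_COLLECTABLE stored in the flags byte above the type (Z_TYPE_FLAGS_SHIFT = 8).

The sketch shows two things:

- type_info 776 describes an object that is refcounted and collectable, which fits the later `print` of `value.obj.ce`.
- w1 and w2 are the low and high halves of the same pointer, 0x7f6249056240, on a little-endian x86_64 machine.

```python
# Decode the zval fields printed by gdb. Constants assumed from PHP 8's
# Zend/zend_types.h; the flags byte sits above the type byte in type_info.
from typing import Tuple

IS_OBJECT = 8
IS_TYPE_REFCOUNTED = 1 << 0
IS_TYPE_COLLECTABLE = 1 << 1
Z_TYPE_FLAGS_SHIFT = 8


def split_type_info(type_info: int) -> Tuple[int, int]:
    """Return (type, type_flags) as gdb shows them in u1.v."""
    return type_info & 0xFF, (type_info >> Z_TYPE_FLAGS_SHIFT) & 0xFF


def join_words(w1: int, w2: int) -> int:
    """Rebuild the 64-bit value from the little-endian ww.w1 / ww.w2 halves."""
    return (w2 << 32) | w1


zv_type, zv_flags = split_type_info(776)
print(zv_type == IS_OBJECT)                                  # True
print(zv_flags == IS_TYPE_REFCOUNTED | IS_TYPE_COLLECTABLE)  # True
print(hex(join_words(1225089600, 32610)))                    # 0x7f6249056240
```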

======

(gdb) print *arg_info
$10 = {
  name = 0x7f620950bd80,
  type = {
    ptr = 0x7f620950d350,
    type_mask = 16777216
  },
  default_value = 0x6e6f6974696e00
}

======

(gdb) print *arg_info.name
$11 = {
  gc = {
    refcount = 2,
    u = {
      type_info = 342
    }
  },
  h = 17469586239392435662,
  len = 10,
  val = "d"
}

What is *arg_info.name.val ? Is that the name of a PHP variable, or something internal to the Zend engine or JIT?

meinemitternacht commented 2 years ago

After looking at a few things, it looks like it is getting hung up with the PHP-DI definitions.

(gdb) print /s (char *)(*arg).value.obj.ce.__tostring.op_array.filename.val   
$48 = 0x7f620af47980 "/api/vendor/php-di/php-di/src/Definition/FactoryDefinition.php"
(gdb) print /s (char *)(*arg_info).name.val          
$52 = 0x7f620950bd98 "definition"

meinemitternacht commented 2 years ago

@dstogov After blacklisting /api/vendor/php-di/* from opcache, I can no longer reproduce the segfaults.

I have multiple projects running on these machines, each with their own vendor directory. Is it possible that having two different versions of PHP-DI in the same FPM address space is causing the JIT to become confused?

In one project, I am using PHP-DI version 5.4.6, and in the main project I am using 6.3.4.

meinemitternacht commented 2 years ago

@dstogov If I run separate php-fpm pools, would JIT'd code be shared between them, or is each pool separate?

cmb69 commented 2 years ago

All FPM pools share the same OPcache instance.

meinemitternacht commented 2 years ago

@cmb69 Ouch. Well, I will do some more testing on Monday. It's either a problem with PHP-DI in this environment, or it's because I have two separate, but similar, versions in OPcache.

cmb69 commented 2 years ago

Ouch.

See https://bugs.php.net/bug.php?id=81704#1640205875 for details. :)

DevSysEngineer commented 2 years ago

I think I have the same issue as the reporter. We have a huge code base based on PHP-DI / FastRoute. Every time I deploy a new version of our code base, our PHP-FPM service fails and writes a lot of segfault entries to the log. When I reload the PHP-FPM service (systemctl reload php8.0-fpm), the errors are gone. Is the cache perhaps not being cleared properly? When I disable JIT, the segfaults no longer appear after deploying a new version.

I have had this issue since PHP 8.0. I also run an instance with PHP 8.1, and we see the same issues there:

May 27 12:56:38 manager1 kernel: php-fpm8.0[22682]: segfault at 55f27089 ip 0000000055f27089 sp 00007ffecd55b598 error 14 in php-fpm8.0[55f2707ab000+d1000]

I tried to write test code to find out where the issue comes from, but have not succeeded yet.
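The `error 14` in the kernel line above can be decoded a little further. Below is a minimal Python sketch. It assumes the kernel prints this field in hexadecimal (x86 Linux formats it with `%lx`) and uses the x86 page-fault error-code bits.

Read as 0x14, the fault is a user-mode instruction fetch from a page that is not present. That fits the `ip` being equal to the faulting address: the process jumped to an address with nothing mapped there. The decoding alone does not say what produced that jump. `decode_pf_error` is a hypothetical helper.

```python
# Sketch: decode the "error" field of an x86 Linux "segfault at ..." kernel
# line. Assumes the field is hexadecimal and uses the x86 page-fault
# error-code bits.
from typing import List

PF_BITS = [
    (1 << 0, "protection violation (page present)"),
    (1 << 1, "write access"),
    (1 << 2, "user-mode access"),
    (1 << 3, "reserved page-table bit set"),
    (1 << 4, "instruction fetch"),
    (1 << 5, "protection-key violation"),
]


def decode_pf_error(error_field: str) -> List[str]:
    """Describe the page-fault error code from a kernel segfault line."""
    code = int(error_field, 16)
    # Bit 0 clear means the page was not present at all.
    reasons = [] if code & 1 else ["page not present"]
    reasons += [name for bit, name in PF_BITS if code & bit]
    return reasons


print(decode_pf_error("14"))
# ['page not present', 'user-mode access', 'instruction fetch']
```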

drealecs commented 2 years ago

How are you deploying new versions of the code base? Just git checkout? One solution would be to make sure a new FPM instance is started for the new code base; that's usual with Docker. Another approach that might work as well is the directory symlink replace strategy, i.e. replacing the whole code base with the new one. The realpaths then resolve to new locations and are handled as new entries.
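The symlink-replace strategy can be sketched in a few lines of Python. The layout is a hypothetical example: one directory per release, plus a `current` symlink that the web server's document root points at. `activate_release` is not part of any PHP tooling.

The sketch creates a temporary link and renames it over the old one. On POSIX the `current` link is therefore switched in a single rename and is never missing or half-written. Files are never overwritten in place, which is what the `git checkout` approach does.

```python
# Hypothetical deploy helper: point `current_link` at a new release
# directory by atomically replacing the symlink (POSIX rename semantics).
import os


def activate_release(release_dir: str, current_link: str) -> None:
    """Switch current_link to release_dir in one rename."""
    tmp_link = current_link + ".tmp"
    # Remove a stale temporary link left by an interrupted deploy.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # rename() over an existing symlink replaces it atomically on POSIX.
    os.replace(tmp_link, current_link)
```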

DevSysEngineer commented 2 years ago

@drealecs Yes, just a simple git checkout. Only changed files are replaced.

dstogov commented 2 years ago

> What is *arg_info.name.val? Is that the name of a PHP variable, or something internal to the Zend engine or JIT?

You can print a zend_string's value with print (char*)arg_info.name.val

aidas-emersoft commented 2 years ago

We're having a very similar issue. Been running a number of sites on PHP-FPM 8.0.10 with the following JIT config:

opcache.enable=1
opcache.jit=1255
opcache.jit_buffer_size=128M

Once we switched to PHP-FPM 8.1.0 (and 8.1.1), sites started crashing randomly with segfaults. I've now disabled JIT, hoping this will fix it temporarily.

meinemitternacht commented 2 years ago

@aidas-emersoft Do your sites happen to utilize the PHP-DI project? https://php-di.org/

In our case, we see segfaults under the tracing JIT when we do not blacklist that vendor directory. When it is excluded from opcache, no segfaults occur.

Using the function JIT option (opcache.jit = 1205), no segfaults are produced either.

aidas-emersoft commented 2 years ago

@meinemitternacht - no, we don't use PHP-DI. They are full stack Symfony projects. I will try opcache.jit = 1205 setting.

No crashes have happened since completely disabling JIT 12 hours ago.

cappadaan commented 2 years ago

OPcache: Fixed bug #81679 (Tracing JIT crashes on reattaching).

This change in 8.1.2 does not fix this issue.

meinemitternacht commented 2 years ago

@cappadaan I don't think that fix was for this issue. It was for CGI on Windows, IIRC.

deluxetom commented 2 years ago

I'm experiencing the same issue with JIT enabled, PHP 8.1.2

fulltext:SIGSEGV log_string="[03-Feb-2022 00:26:30] WARNING: [pool www] child 161 exited on signal 11 (SIGSEGV) after 119.953665 seconds from start"

Everything goes back to normal with JIT disabled.

kohlerdominik commented 2 years ago

Just found the same issue after updating to PHP 8.1 with JIT enabled. So far it looks like changing from opcache.jit=1235 to opcache.jit=1205 resolves the issue.

We use a Laravel app with a lot of Symfony components, but no PHP-DI. Here are our Composer dependencies; maybe other contributors can spot similarities:

```
"php": "^8.1",
"ext-dom": "*",
"ext-fileinfo": "*",
"ext-json": "*",
"ext-mbstring": "*",
"ext-redis": "*",
"ext-simplexml": "*",
"absszero/laravel-stackdriver-error-reporting": "^1.6",
"askedio/laravel-soft-cascade": "^8.1",
"brick/money": "^0.5",
"ezyang/htmlpurifier": "^4.13",
"fideloper/proxy": "^4.4",
"fruitcake/laravel-cors": "^2.0",
"galbar/jsonpath": "^2.0",
"guzzlehttp/guzzle": "^7.4",
"inspheric/nova-indicator-field": "^1.43",
"intervention/validation": "^3.0",
"justinrainbow/json-schema": "^5.2",
"kalnoy/nestedset": "^6.0",
"kkomelin/laravel-translatable-string-exporter": "^1.12",
"laravel/framework": "^8.77",
"laravel/horizon": "^5.7",
"laravel/nova": "~3.25",
"laravel/passport": "^10.1",
"laravel/telescope": "^4.4",
"laravel/tinker": "^2.6",
"laravel/ui": "^3.4",
"league/fractal": "^0.19",
"lucid-arch/laravel-foundation": "^8.0",
"maatwebsite/excel": "^3.1",
"maennchen/zipstream-php": "^2.1",
"mikehaertl/php-tmpfile": "dev-feature/keep-file-after-unreferencing",
"mossadal/math-parser": "^1.3",
"mustache/mustache": "^2.13",
"prettus/l5-repository": "^2.7",
"s-ichikawa/laravel-sendgrid-driver": "^3.0",
"spatie/laravel-translatable": "^5.1",
"sprain/swiss-qr-bill": "v4.0",
"staudenmeir/belongs-to-through": "^2.11",
"superbalist/flysystem-google-storage": "dev-master as 7.2.3",
"superbalist/laravel-google-cloud-storage": "^2.2",
"symfony/intl": "^5.0",
"tedivm/jshrink": "1.4.0",
"veelasky/laravel-hashid": "^2.2",
"vmitchell85/nova-links": "^1.0"
```

meinemitternacht commented 2 years ago

@dstogov I am encountering a similar problem on a virtual machine I am using for development (also 8.1).

If I have a fresh instance of PHP-FPM, it works fine. However, as soon as I overwrite one of the project source files with changes during development, JIT will immediately cause a segfault. If I then restart PHP-FPM, it will begin working again.

It seems to occur regardless of what I change in the file (sometimes it is even just text within a string).

dstogov commented 2 years ago

I cannot fix this before I get a way to reproduce the crash.