Closed Appla closed 2 months ago
After more dig this part may be the root cause https://github.com/php/php-src/blob/PHP-8.3/ext/opcache/jit/zend_jit_trace.c#L8331-L8353 https://github.com/php/php-src/blob/PHP-8.3/ext/opcache/jit/zend_jit_x86.dasc#L9465:L9473
/cc @dstogov
I don't see any problems running the provided example. It records and executes just a single trace.
It seems, in your environment the trace is invalidated (for some reason) then recorded and executed again. Do you regenerate code for some PHP script or touch it (update its mtime)? Then this is the expected behavoir.
Try to run the code with -dopcache.jit_debug=0x1ff105
to collect more info...
I don't see any problems running the provided example. It records and executes just a single trace.
It seems, in your environment the trace is invalidated (for some reason) then recorded and executed again. Do you regenerate code for some PHP script or touch it (update its mtime)? Then this is the expected behavoir.
yes, the file may has changed during execution also the file mtime
Try to run the code with
-dopcache.jit_debug=0x1ff105
to collect more info...
I was set jit_debug=0x1fffff in GDB and this part are hit
It looks like this code are hit in zend_jit_trace_exit
and continue to execution original opline but not switch to interpreter and again and again
if (ZEND_JIT_TRACE_NUM >= JIT_G(max_root_traces)) {
/* skip: too many root traces */
}
the process still running with GDB attached, any other info I can provided?
yes, the file may has changed during execution also the file mtime
In case some of the files is changed on each "request", the trace is going to be invalidated and recompiled on each "request". It's better to add this file in the opcache blacklist.
I understood, the problem occurs only after the "max_root_traces" limit is exceeded. Right? The number of compiled traces is not caused by a bug. It's caused by trace invalidation. Right?
It would be great if you could construct a test case. This way I will able to fix the exact problem.
yes, the file may has changed during execution also the file mtime
In case some of the files is changed on each "request", the trace is going to be invalidated and recompiled on each "request". It's better to add this file in the opcache blacklist.
My usage scenario involves a specific PHP script managing the execution of other PHP scripts, ensuring that the updated script content is used immediately upon file modification. I am unsure whether JIT is necessary for this type of requirement.
I understood, the problem occurs only after the "max_root_traces" limit is exceeded. Right? The number of compiled traces is not caused by a bug. It's caused by trace invalidation. Right?
Please disregard my speculation, I am not familiar with the implementation of this part.
It would be great if you could construct a test case. This way I will able to fix the exact problem.
This is a script running in our test environment. I'm not sure if it can be isolated for reproduction. I'll try to see. btw: this is the only script out of many that consistently trigger this issue.
My usage scenario involves a specific PHP script managing the execution of other PHP scripts, ensuring that the updated script content is used immediately upon file modification. I am unsure whether JIT is necessary for this type of requirement.
It's better to disable not only JIT, but also opcache, because your invalidated scripts re-cached again and again. Instead of complete disabling you may blacklist specific files.
Please disregard my speculation, I am not familiar with the implementation of this part.
I just trying to interpret your reference to the code: if (ZEND_JIT_TRACE_NUM >= JIT_G(max_root_traces)) {
, but without the reproduction I may only guess.
My usage scenario involves a specific PHP script managing the execution of other PHP scripts, ensuring that the updated script content is used immediately upon file modification. I am unsure whether JIT is necessary for this type of requirement.
It's better to disable not only JIT, but also opcache, because your invalidated scripts re-cached again and again. Instead of complete disabling you may blacklist specific files.
I have a large number of scripts to execute, and script updates are relatively infrequent. Would disabling opcache have any performance or anything else implications?
Please disregard my speculation, I am not familiar with the implementation of this part.
I just trying to interpret your reference to the code:
if (ZEND_JIT_TRACE_NUM >= JIT_G(max_root_traces)) {
, but without the reproduction I may only guess.
I've tried isolating the relevant code, but the issue hasn't been reproduced in a short period. The affected script has been running the management process for over a week. Apart from reproducing the script, is there any information that can be obtained from the running process to help troubleshoot this issue? I can provide this kind information for now.
@dstogov no idea how you came to the conclusion that provided snippet works (with exact clone
of user's env
) and TBH I dont really wanna know..........
I don't see any problems running the provided example. It records and executes just a single trace.
It seems, in your environment the trace is invalidated (for some reason) then recorded and executed again. Do you regenerate code for some PHP script or touch it (update its mtime)? Then this is the expected behavoir.
Try to run the code with
-dopcache.jit_debug=0x1ff105
to collect more info...
The issue was reproduced after running with the -dopcache.jit_debug=0x1ff105 option for a few days.
after TRACE 1023 (reach max_root_traces?), the issue happened as above described
---- TRACE 1020 TSSA start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 #0.V0 [ref, rc1, rcn, any] = DO_FCALL_BY_NAME
>enter now
---- TRACE 1020 TSSA stop (link to 515)
TRACE-1020$core\Context::resetTraceCtx$119: ; (unknown)
mov $0x3fc, EG(jit_trace_num)
mov 0x40(%r14), %rdx
mov 0x8(%rdx), %rax
cmp $0x7b7a960, %rax
jnz .L2
.L1:
mov EG(vm_stack_top), %r15
mov EG(vm_stack_end), %rdx
sub %r15, %rdx
cmp $0x60, %rdx
jb jit$$trace_exit_1
add $0x60, EG(vm_stack_top)
mov $0x0, 0x28(%r15)
mov %rax, 0x18(%r15)
mov $0x0, 0x20(%r15)
mov $0x0, 0x2c(%r15)
mov $0x6f12a08, (%r14)
mov %r14, 0x30(%r15)
mov $0x0, 0x8(%r15)
lea 0x50(%r14), %rdx
mov %rdx, 0x10(%r15)
mov CG(map_ptr_base), %rdx
mov 0x237b1(%rdx), %rdx
mov %rdx, 0x40(%r15)
mov %r15, EG(current_execute_data)
mov %r15, %r14
jmp 0x13aa66f0
.L2:
mov $0x6f12958, %rdi
lea 0x8(%rdx), %rsi
mov $zend_jit_find_ns_func_helper, %rax
call *%rax
cmp $0x7b7a960, %rax
jz .L1
jmp jit$$trace_exit_0
---- TRACE 1020 exit info
exit_0: 0000/----/0
exit_1: 0000/----/0/VM
---- TRACE 1020 compiled
TRACE 1020 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
---- TRACE 1021 start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 V0 = DO_FCALL_BY_NAME
>enter now
---- TRACE 1021 stop (link to 396)
---- TRACE 1021 TSSA start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 #0.V0 [ref, rc1, rcn, any] = DO_FCALL_BY_NAME
>enter now
---- TRACE 1021 TSSA stop (link to 396)
TRACE-1021$core\Context::resetTraceCtx$119: ; (unknown)
mov $0x3fd, EG(jit_trace_num)
mov 0x40(%r14), %rdx
mov 0x8(%rdx), %rax
cmp $0x7507420, %rax
jnz .L2
.L1:
mov EG(vm_stack_top), %r15
mov EG(vm_stack_end), %rdx
sub %r15, %rdx
cmp $0x60, %rdx
jb jit$$trace_exit_1
add $0x60, EG(vm_stack_top)
mov $0x0, 0x28(%r15)
mov %rax, 0x18(%r15)
mov $0x0, 0x20(%r15)
mov $0x0, 0x2c(%r15)
mov $0x6f12a08, (%r14)
mov %r14, 0x30(%r15)
mov $0x0, 0x8(%r15)
lea 0x50(%r14), %rdx
mov %rdx, 0x10(%r15)
mov CG(map_ptr_base), %rdx
mov 0x20639(%rdx), %rdx
mov %rdx, 0x40(%r15)
mov %r15, EG(current_execute_data)
mov %r15, %r14
jmp 0x13a7eab0
.L2:
mov $0x6f12958, %rdi
lea 0x8(%rdx), %rsi
mov $zend_jit_find_ns_func_helper, %rax
call *%rax
cmp $0x7507420, %rax
jz .L1
jmp jit$$trace_exit_0
---- TRACE 1021 exit info
exit_0: 0000/----/0
exit_1: 0000/----/0/VM
---- TRACE 1021 compiled
TRACE 1021 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
---- TRACE 1022 start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 V0 = DO_FCALL_BY_NAME
>enter now
---- TRACE 1022 stop (link to 515)
---- TRACE 1022 TSSA start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 #0.V0 [ref, rc1, rcn, any] = DO_FCALL_BY_NAME
>enter now
---- TRACE 1022 TSSA stop (link to 515)
TRACE-1022$core\Context::resetTraceCtx$119: ; (unknown)
mov $0x3fe, EG(jit_trace_num)
mov 0x40(%r14), %rdx
mov 0x8(%rdx), %rax
cmp $0x7b7a960, %rax
jnz .L2
.L1:
mov EG(vm_stack_top), %r15
mov EG(vm_stack_end), %rdx
sub %r15, %rdx
cmp $0x60, %rdx
jb jit$$trace_exit_1
add $0x60, EG(vm_stack_top)
mov $0x0, 0x28(%r15)
mov %rax, 0x18(%r15)
mov $0x0, 0x20(%r15)
mov $0x0, 0x2c(%r15)
mov $0x6f12a08, (%r14)
mov %r14, 0x30(%r15)
mov $0x0, 0x8(%r15)
lea 0x50(%r14), %rdx
mov %rdx, 0x10(%r15)
mov CG(map_ptr_base), %rdx
mov 0x237b1(%rdx), %rdx
mov %rdx, 0x40(%r15)
mov %r15, EG(current_execute_data)
mov %r15, %r14
jmp TRACE-515$now$55
.L2:
mov $0x6f12958, %rdi
lea 0x8(%rdx), %rsi
mov $zend_jit_find_ns_func_helper, %rax
call *%rax
cmp $0x7b7a960, %rax
jz .L1
jmp jit$$trace_exit_0
---- TRACE 1022 exit info
exit_0: 0000/----/0
exit_1: 0000/----/0/VM
---- TRACE 1022 compiled
TRACE 1022 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
---- TRACE 1023 start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 V0 = DO_FCALL_BY_NAME
>enter now
---- TRACE 1023 stop (link to 396)
---- TRACE 1023 TSSA start (enter) core\Context::resetTraceCtx() /path/to/class/Context.php:119
0000 INIT_NS_FCALL_BY_NAME 0 string("core\\now")
>init now
0001 #0.V0 [ref, rc1, rcn, any] = DO_FCALL_BY_NAME
>enter now
---- TRACE 1023 TSSA stop (link to 396)
TRACE-1023$core\Context::resetTraceCtx$119: ; (unknown)
mov $0x3ff, EG(jit_trace_num)
mov 0x40(%r14), %rdx
mov 0x8(%rdx), %rax
cmp $0x7507420, %rax
jnz .L2
.L1:
mov EG(vm_stack_top), %r15
mov EG(vm_stack_end), %rdx
sub %r15, %rdx
cmp $0x60, %rdx
jb jit$$trace_exit_1
add $0x60, EG(vm_stack_top)
mov $0x0, 0x28(%r15)
mov %rax, 0x18(%r15)
mov $0x0, 0x20(%r15)
mov $0x0, 0x2c(%r15)
mov $0x6f12a08, (%r14)
mov %r14, 0x30(%r15)
mov $0x0, 0x8(%r15)
lea 0x50(%r14), %rdx
mov %rdx, 0x10(%r15)
mov CG(map_ptr_base), %rdx
mov 0x20639(%rdx), %rdx
mov %rdx, 0x40(%r15)
mov %r15, EG(current_execute_data)
mov %r15, %r14
jmp 0x13a7eab0
.L2:
mov $0x6f12958, %rdi
lea 0x8(%rdx), %rsi
mov $zend_jit_find_ns_func_helper, %rax
call *%rax
cmp $0x7507420, %rax
jz .L1
jmp jit$$trace_exit_0
---- TRACE 1023 exit info
exit_0: 0000/----/0
exit_1: 0000/----/0/VM
---- TRACE 1023 compiled
// after above, The debug output is basically all the same.
.........
TRACE 1023 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
TRACE 1023 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
TRACE 1023 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
TRACE 1023 exit 0 core\Context::resetTraceCtx() /path/to/class/Context.php:119
.........
Is anything else needed?
I hope https://github.com/php/php-src/pull/14558 should fix this @Appla it would be great if you could test the patch
@dstogov for me your patch works well.
I hope #14558 should fix this @Appla it would be great if you could test the patch
I will feedback later.
I also have a question: what specific conditions could cause this issue with simple code? I am sure the file and mtime not changed during execution.
@Appla without a reproduction, I can only guess about your case.
The patch fixes a case when a root trace, started from INIT_*FCALL*
instruction, is "invalidated" after "jit_max_root_trace" limit is exceeded.
Description
I noticed that a PHP script was consuming a very high amount of CPU. I used strace to investigate and found that it was repeatedly executing fcntl. I used GDB to trace the process and found that it was continuously looping through a specific code path, as detailed in the following quote.
One scenario is that our script is hosted under a PHP process, where the host process forks a child process to execute the specific code.
0x000000001300042a ? add $0x100,%rsp 0x0000000013000431 ? test %eax,%eax 0x0000000013000433 ? jne 0x13000443 0x0000000013000435 ? mov 0x1a28f28,%r14 0x000000001300043d ? mov (%r14),%r15 0x0000000013000440 ? jmpq (%r15) -> 0x0000000013000443 ? jl 0x13000330 0x0000000013000449 ? mov 0x1a28f28,%r14 0x0000000013000451 ? mov (%r14),%r15 0x0000000013000454 ? cmpb $0x0,0x1a28f56 0x000000001300045c ? jne 0x13000000 0x0000000013000462 ? mov 0x18(%r14),%rax 0x0000000013000466 ? mov 0xc0(%rax),%rax 0x000000001300046d ? mov 0xa0(%rax),%rax 0x0000000013000474 ? jmpq (%r15,%rax,1)
0x00000000130af820 ? movl $0x3ff,0x1a28f38 0x00000000130af82b ? mov 0x40(%r14),%rdx 0x00000000130af82f ? mov 0x8(%rdx),%rax 0x00000000130af833 ? cmp $0x54306e0,%rax 0x00000000130af83a ? jne 0x130af8c0 -> 0x00000000130af840 ? mov 0x1a28f08,%r15 0x00000000130af848 ? mov 0x1a28f10,%rdx 0x00000000130af850 ? sub %r15,%rdx 0x00000000130af853 ? cmp $0x60,%rdx 0x00000000130af857 ? jb 0x13000904 0x00000000130af85d ? addq $0x60,0x1a28f08 0x00000000130af866 ? movl $0x0,0x28(%r15) 0x00000000130af86e ? mov %rax,0x18(%r15)
0x00000000130af8c0 ? mov $0x4fb4318,%rdi 0x00000000130af8c7 ? lea 0x8(%rdx),%rsi 0x00000000130af8cb ? movabs $0x7f3d4aa7e610,%rax // static zend_function ZEND_FASTCALL zend_jit_find_ns_func_helper(zval func_name, void **cache_slot)
0x00000000130af8d5 ? callq *%rax // return back 0x00000000130af8d7 ? cmp $0x54306e0,%rax 0x00000000130af8de ? je 0x130af840 0x00000000130af8e4 ? jmpq 0x13000900 -> 0x00000000130af8e9 ? add %al,(%rax) 0x00000000130af8eb ? add %al,(%rax) 0x00000000130af8ed ? add %al,(%rax) 0x00000000130af8ef ? add %cl,-0x39(%rcx) 0x00000000130af8f2 ? (bad)
0x00000000130af8f3 ? (bad)
0x00000000130af8f4 ? or $0x41,%al
0x0000000013000900 ? pushq $0x0 0x0000000013000902 ? jmp 0x1300097e ->
0x000000001300097e ? addq $0x0,(%rsp) 0x0000000013000983 ? jmpq 0x13000340 -> 0x0000000013000988 ? add %al,(%rax) 0x000000001300098a ? add %al,(%rax) 0x000000001300098c ? add %al,(%rax)
0x0000000013000340 ? sub $0xf8,%rsp 0x0000000013000347 ? mov %r15,0x78(%rsp) 0x000000001300034c ? mov %r11,0x58(%rsp) 0x0000000013000351 ? mov %r10,0x50(%rsp) 0x0000000013000356 ? mov %r9,0x48(%rsp) 0x000000001300035b ? mov %r8,0x40(%rsp) 0x0000000013000360 ? mov %rdi,0x38(%rsp) 0x0000000013000365 ? mov %rsi,0x30(%rsp) 0x000000001300036a ? mov %rdx,0x10(%rsp) 0x000000001300036f ? mov %rcx,0x8(%rsp) 0x0000000013000374 ? mov %rax,(%rsp) 0x0000000013000378 ? mov 0xf8(%rsp),%rdi 0x0000000013000380 ? mov %rsp,%rsi 0x0000000013000383 ? movsd %xmm15,0xf8(%rsp) 0x000000001300038d ? movsd %xmm14,0xf0(%rsp) 0x0000000013000397 ? movsd %xmm13,0xe8(%rsp) 0x00000000130003a1 ? movsd %xmm12,0xe0(%rsp) 0x00000000130003ab ? movsd %xmm11,0xd8(%rsp) 0x00000000130003ab ? movsd %xmm11,0xd8(%rsp) 0x00000000130003b5 ? movsd %xmm10,0xd0(%rsp) 0x00000000130003bf ? movsd %xmm9,0xc8(%rsp) 0x00000000130003c9 ? movsd %xmm8,0xc0(%rsp) 0x00000000130003d3 ? movsd %xmm7,0xb8(%rsp) 0x00000000130003dc ? movsd %xmm6,0xb0(%rsp) 0x00000000130003e5 ? movsd %xmm5,0xa8(%rsp) 0x00000000130003ee ? movsd %xmm4,0xa0(%rsp) 0x00000000130003f7 ? movsd %xmm3,0x98(%rsp) 0x0000000013000400 ? movsd %xmm2,0x90(%rsp) 0x0000000013000409 ? movsd %xmm1,0x88(%rsp) 0x0000000013000412 ? movsd %xmm0,0x80(%rsp) 0x000000001300041b ? mov %r15,(%r14) 0x000000001300041e ? movabs $0x7f3d4ab05090,%rax // int ZEND_FASTCALL zend_jit_trace_exit(uint32_t exit_num, zend_jit_registers_buf regs) 0x0000000013000428 ? callq %rax 0x000000001300042a ? add $0x100,%rsp // loop again 0x0000000013000431 ? test %eax,%eax