php / php-src

The PHP Interpreter
https://www.php.net
Other
37.93k stars 7.72k forks source link

JIT cause 100% cpu usage #11736

Open andrii29 opened 1 year ago

andrii29 commented 1 year ago

Description

The following code:

    [opcache]
    opcache.enable=1
    opcache.enable_cli=1
    opcache.memory_consumption=512
    opcache.jit_buffer_size=256M
    opcache.interned_strings_buffer=16
    opcache.max_accelerated_files=200000
    opcache.max_wasted_percentage=5
    opcache.use_cwd=1
    opcache.validate_timestamps=0
    opcache.revalidate_freq=600
    opcache.revalidate_path=0
    opcache.save_comments=1
    opcache.enable_file_override=0
    opcache.optimization_level=0x7FFFBFFF
    opcache.dups_fix=0
    opcache.max_file_size=0
    opcache.consistency_checks=0
    opcache.force_restart_timeout=180
    opcache.file_cache_only=0
    opcache.file_cache_consistency_checks=1
    opcache.huge_code_pages=1
    opcache.preload=/path/to/config/preload.php
    opcache.preload_user=user

Resulted in this output:

[18-Jul-2023 03:40:09] NOTICE: [pool www] child 179 started
Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /path/to/file.php on line 33
[18-Jul-2023 03:40:13] WARNING: [pool www] child 47 exited with code 124 after 381.082060 seconds from start
[18-Jul-2023 03:40:13] NOTICE: [pool www] child 182 started
Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /path/to/file.php on line 33
[18-Jul-2023 03:40:25] WARNING: [pool www] child 42 exited with code 124 after 393.292804 seconds from start

But I expected this output instead:

no errors in log

Hello, after enabling JIT, our application start perform better (~10% faster). But, some php-fpm pods (~1 in 20) after start use all available cpu and timeout the requests. After pod restart, everything back to normal. Without JIT everything is ok and cpu utilization is at normal value.

Regards

opcache.blacklist_filename => no value => no value
opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => On => On
opcache.enable_file_override => Off => Off
opcache.error_log => no value => no value
opcache.file_cache => no value => no value
opcache.file_cache_consistency_checks => On => On
opcache.file_cache_only => Off => Off
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.huge_code_pages => On => On
opcache.interned_strings_buffer => 16 => 16
opcache.jit => tracing => tracing
opcache.jit_bisect_limit => 0 => 0
opcache.jit_blacklist_root_trace => 16 => 16
opcache.jit_blacklist_side_trace => 8 => 8
opcache.jit_buffer_size => 256M => 256M
opcache.jit_debug => 0 => 0
opcache.jit_hot_func => 127 => 127
opcache.jit_hot_loop => 64 => 64
opcache.jit_hot_return => 8 => 8
opcache.jit_hot_side_exit => 8 => 8
opcache.jit_max_exit_counters => 8192 => 8192
opcache.jit_max_loop_unrolls => 8 => 8
opcache.jit_max_polymorphic_calls => 2 => 2
opcache.jit_max_recursive_calls => 2 => 2
opcache.jit_max_recursive_returns => 2 => 2
opcache.jit_max_root_traces => 1024 => 1024
opcache.jit_max_side_traces => 128 => 128
opcache.jit_prof_threshold => 0.005 => 0.005
opcache.lockfile_path => /tmp => /tmp
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 200000 => 200000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 512 => 512
opcache.opt_debug_level => 0 => 0
opcache.optimization_level => 0x7FFFBFFF => 0x7FFFBFFF
opcache.preferred_memory_model => no value => no value
opcache.preload => /path/to/config/preload.php => //path/to/config/preload.php
opcache.preload_user => user => user
opcache.protect_memory => Off => Off
opcache.record_warnings => Off => Off
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 600 => 600
opcache.revalidate_path => Off => Off
opcache.save_comments => On => On
opcache.use_cwd => On => On
opcache.validate_permission => Off => Off
opcache.validate_root => Off => Off
opcache.validate_timestamps => Off => Off

PHP Version

PHP 8.1.20

Operating System

Debian GNU/Linux 10 (buster) docker container on Amazon Linux 2

nielsdos commented 1 year ago

I think this will be pretty much impossible to debug and fix without having a reproducer or at least more info about where it goes wrong.

maaarghk commented 1 year ago

looks like #11609 - @iluuu1994 do you know offhand if it'd be relevant to php 8.1.20 as indicated here?

iluuu1994 commented 1 year ago

@maaarghk I don't think this is related, as this reverting the tracing counter handlers was only added in 8.2.

andrii29 commented 1 year ago

what tool should i use to get relevant trace?

some trace logs from stuck workers

[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="5\1\0\0", iov_len=4}, {iov_base="\265\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="2\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`2\0\225\10\0\6\21?; \1)\10\301\360~\31A\220\302\36'uh\5\t\t"..., iov_len=284}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 309
[pid 12215] recvfrom(7, 0x5580d061efd0, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\312\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\272\2\210\17\265\2\0\0\334\7\0\0\335\7\0\0\320\2\0\0\3(\265/\375`\320\1=\r\0\366"..., 454, 0, NULL, NULL) = 454
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="j\1\0\0", iov_len=4}, {iov_base="\267\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="\207\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`\207\0=\n\0\26TIB \2156\10\263p\325!\337X\27z\201\332E\321\326\354"..., iov_len=337}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 362
[pid 12215] recvfrom(7, 0x5580d06b8150, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\274\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\273\2\210\17\267\2\0\0\334\7\0\0\335\7\0\0\23\2\0\0\3(\265/\375`\23\1\315\f\0\226"..., 440, 0, NULL, NULL) = 440
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
[pid 12215] getpid()                    = 141
nielsdos commented 1 year ago

Do you have a script that reliably reproduces the problem, or is it unrelated to a script? The syscall trace you show indicates it seems stuck in an infinite EAGAIN loop, but that doesn't include the source information...

andrii29 commented 1 year ago

we use symfony 5.4 + overblog/GraphQLBundle I enabled slow log for 15 seconds and got next results

[18-Jul-2023 16:32:08] NOTICE: child 37 stopped for tracing
[18-Jul-2023 16:32:08] NOTICE: about to trace 37
[18-Jul-2023 16:32:08]  [pool www] pid 37
script_filename = /app/public/index.php
[0x00007f5f86a16460] getFirstPriority() /app/src/GraphQLBundle/CustomResolver.php:38
[0x00007f5f86a163a0] valueFromObjectOrArray() /app/src/GraphQLBundle/CustomResolver.php:19
[0x00007f5f86a162e0] __invoke() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:623
[0x00007f5f86a161f0] resolveFieldValueOrError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:550
[0x00007f5f86a160c0] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15fa0] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15ef0] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15e10] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15d30] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a15c50] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15b20] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15a00] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15950] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15870] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15790] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a156b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:887
[0x00007f5f86a15570] completeListValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:761
[0x00007f5f86a15490] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a153b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15280] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[18-Jul-2023 16:32:08] NOTICE: finished trace of 37

logs from workers without slow log

Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /app/src/Document/AbstractDocument.php on line 33
[18-Jul-2023 03:40:25] WARNING: [pool www] child 42 exited with code 124 after 393.292804 seconds from start

in normal state requests are finished in less than second

andrii29 commented 11 months ago

same behaviour on php 8.2

smepti commented 2 months ago

same behaviour on php 8.1.19, symfony 6.3.4

Same, word for word "after enabling JIT, our application start perform better (~10% faster). But, some php-fpm pods (~1 in 20) after start use all available cpu and timeout the requests. After pod restart, everything back to normal. Without JIT everything is ok and cpu utilization is at normal value."

we using k8s and after deploy ~1 in 20 pods use 100% cpu

Снимок экрана 2024-06-19 в 01 57 37

@andrii29 may be you resolve issue?

smepti commented 2 months ago

here is our opcache config

opcache.validate_timestamps=0
opcache.interned_strings_buffer=50
opcache.memory_consumption=300
opcache.max_accelerated_files=32531
opcache.jit_buffer_size=128M
opcache.preload=/app/symfony-app/config/preload.php
opcache.preload_user=www-data
andrii29 commented 2 months ago

@smepti nope( we upgraded php to 8.3, but the issue is still present