andrii29 opened 1 year ago
I think this will be pretty much impossible to debug and fix without having a reproducer or at least more info about where it goes wrong.
Looks like #11609. @iluuu1994, do you know offhand whether it'd be relevant to PHP 8.1.20 as indicated here?
@maaarghk I don't think this is related, as the revert of the tracing counter handlers was only added in 8.2.
What tool should I use to get a relevant trace?
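The syscall output below is in strace format; a minimal sketch of attaching strace to an already-running worker, assuming you have identified the stuck worker's PID (15366 here is just an example taken from the logs):

```shell
# Attach to a running php-fpm worker process (replace 15366 with the stuck PID).
# -f follows child processes, -tt adds microsecond timestamps,
# -o writes the trace to a file instead of stderr.
strace -f -tt -p 15366 -o /tmp/worker-trace.log
```

Attaching requires root or ptrace permission over the target process; detach with Ctrl-C, which leaves the worker running.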
Some strace logs from stuck workers:
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, NULL, 8) = 0
[pid 15366] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f95ff2bc970}, 8) = 0
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="5\1\0\0", iov_len=4}, {iov_base="\265\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="2\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`2\0\225\10\0\6\21?; \1)\10\301\360~\31A\220\302\36'uh\5\t\t"..., iov_len=284}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 309
[pid 12215] recvfrom(7, 0x5580d061efd0, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\312\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\272\2\210\17\265\2\0\0\334\7\0\0\335\7\0\0\320\2\0\0\3(\265/\375`\320\1=\r\0\366"..., 454, 0, NULL, NULL) = 454
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
[pid 12215] sendmsg(7, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="j\1\0\0", iov_len=4}, {iov_base="\267\2\0\0", iov_len=4}, {iov_base="\0\0\0\0", iov_len=4}, {iov_base="\334\7\0\0", iov_len=4}, {iov_base="\335\7\0\0", iov_len=4}, {iov_base="\207\1\0\0", iov_len=4}, {iov_base="\3", iov_len=1}, {iov_base="(\265/\375`\207\0=\n\0\26TIB \2156\10\263p\325!\337X\27z\201\332E\321\326\354"..., iov_len=337}], msg_iovlen=8, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 362
[pid 12215] recvfrom(7, 0x5580d06b8150, 4, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 12215] poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 299999) = 1 ([{fd=7, revents=POLLIN}])
[pid 12215] recvfrom(7, "\274\1\0\0", 4, 0, NULL, NULL) = 4
[pid 12215] recvfrom(7, "\273\2\210\17\267\2\0\0\334\7\0\0\335\7\0\0\23\2\0\0\3(\265/\375`\23\1\315\f\0\226"..., 440, 0, NULL, NULL) = 440
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
[pid 12215] getpid() = 141
Do you have a script that reliably reproduces the problem, or is it unrelated to any particular script? The syscall trace you posted suggests it's stuck in an infinite EAGAIN loop, but that doesn't include the source information...
We use Symfony 5.4 + overblog/GraphQLBundle. I enabled the slow log with a 15-second threshold and got the following results:
[18-Jul-2023 16:32:08] NOTICE: child 37 stopped for tracing
[18-Jul-2023 16:32:08] NOTICE: about to trace 37
[18-Jul-2023 16:32:08] [pool www] pid 37
script_filename = /app/public/index.php
[0x00007f5f86a16460] getFirstPriority() /app/src/GraphQLBundle/CustomResolver.php:38
[0x00007f5f86a163a0] valueFromObjectOrArray() /app/src/GraphQLBundle/CustomResolver.php:19
[0x00007f5f86a162e0] __invoke() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:623
[0x00007f5f86a161f0] resolveFieldValueOrError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:550
[0x00007f5f86a160c0] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15fa0] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15ef0] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15e10] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15d30] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a15c50] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15b20] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[0x00007f5f86a15a00] executeFields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1145
[0x00007f5f86a15950] collectAndExecuteSubfields() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1106
[0x00007f5f86a15870] completeObjectValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:793
[0x00007f5f86a15790] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a156b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:887
[0x00007f5f86a15570] completeListValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:761
[0x00007f5f86a15490] completeValue() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:654
[0x00007f5f86a153b0] completeValueCatchingError() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:557
[0x00007f5f86a15280] resolveField() /app/vendor/webonyx/graphql-php/src/Executor/ReferenceExecutor.php:1195
[18-Jul-2023 16:32:08] NOTICE: finished trace of 37
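For context, the PHP backtrace above comes from php-fpm's slowlog feature; a minimal pool-config sketch for enabling it (the log path is illustrative):

```ini
; In the pool config (e.g. www.conf):
; dump a PHP backtrace for any request running longer than 15 seconds
request_slowlog_timeout = 15s
slowlog = /var/log/php-fpm/www-slow.log
```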
Logs from workers without the slow log:
Fatal error: Maximum execution time of 99+2 seconds exceeded (terminated) in /app/src/Document/AbstractDocument.php on line 33
[18-Jul-2023 03:40:25] WARNING: [pool www] child 42 exited with code 124 after 393.292804 seconds from start
In the normal state, requests finish in less than a second.
Same behaviour on PHP 8.2.
Same behaviour on PHP 8.1.19 with Symfony 6.3.4.
Same, word for word: "after enabling JIT, our application started performing better (~10% faster). But some php-fpm pods (~1 in 20) use all available CPU right after startup and time out requests. After a pod restart, everything goes back to normal. Without JIT, everything is fine and CPU utilization is at a normal value."
We are using k8s, and after a deploy ~1 in 20 pods use 100% CPU.
@andrii29 did you manage to resolve the issue?
Here is our opcache config:
opcache.validate_timestamps=0
opcache.interned_strings_buffer=50
opcache.memory_consumption=300
opcache.max_accelerated_files=32531
opcache.jit_buffer_size=128M
opcache.preload=/app/symfony-app/config/preload.php
opcache.preload_user=www-data
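Since the problem only appears with JIT enabled, one way to confirm the correlation on an affected deployment is to turn the JIT off while keeping opcache itself active; a sketch (either line alone is sufficient):

```ini
; disable the JIT but keep opcache enabled
opcache.jit = off
; or, equivalently, allocate no JIT buffer at all
opcache.jit_buffer_size = 0
```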
@smepti nope :( We upgraded PHP to 8.3, but the issue is still present.
Description
Hello, after enabling JIT, our application started performing better (~10% faster). But some php-fpm pods (~1 in 20) use all available CPU right after startup and time out requests. After a pod restart, everything goes back to normal. Without JIT, everything is fine and CPU utilization is at a normal value.
Regards
PHP Version
PHP 8.1.20
Operating System
Debian GNU/Linux 10 (buster) docker container on Amazon Linux 2