redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Reactor stalls in large-batch 22.3.4 #7892

Closed by travisdowns 3 months ago

travisdowns commented 1 year ago

This issue was split out of https://github.com/redpanda-data/redpanda/issues/7853 to track the reactor stalls specifically.

Version & Environment

Redpanda version (use rpk version): v22.3.4 (rev 5be3e8e)
Cluster Info: 10-node bare-metal
Host:

Operating System (e.g. from /etc/os-release):

NAME="Rocky Linux"
VERSION="9.0 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.0 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"

What went wrong?

After upgrading from 22.2.4 to 22.3.4, I am seeing many reactor stalls and timeouts (1+ stall per second per node). This occurs with a single producer sending ~200k records/s (~50 MB/s): a custom Java producer using the official Kafka producer libraries.
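For a sense of scale, the quoted producer rate implies a fairly small average record. The sketch below is only a back-of-the-envelope check, and it assumes the throughput figure means megabytes (10^6 bytes) per second, which the report does not state explicitly.

```python
# Rough average record size implied by the reported producer load.
# ASSUMPTION: the quoted throughput is 50 megabytes (50 * 10**6 bytes) per second.
records_per_second = 200_000
bytes_per_second = 50 * 10**6

avg_record_bytes = bytes_per_second / records_per_second
print(f"average record size: {avg_record_bytes:.0f} bytes")
```

Under that assumption each record averages about 250 bytes, so the load is dominated by record count rather than by payload size.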

What should have happened instead?

Stalls and timeouts shouldn't happen. This is concerning, and I don't want to use this cluster with all these errors.

How to reproduce the issue?

  1. Redpanda 22.3.4 on a 10-node cluster, using TLS everywhere and mtls_identity. RHEL 9. Single producer at 200k rec/s (50 MB/s).

Additional information

Here is a log snippet:

Dec 20 01:50:19 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527631 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527631 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard 111] rpc - transport.cc:172 - Request timeout to {host: NODE3, port: 33145}, correlation id: 11527631 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard  51] rpc - transport.cc:172 - Request timeout to {host: NODE9, port: 33145}, correlation id: 11527632 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527632 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527632 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527632 (1 in flight)
Dec 20 01:50:19 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527632 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527633 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527633 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527634 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  72] rpc - transport.cc:172 - Request timeout to {host: NODE6, port: 33145}, correlation id: 11527634 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527634 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527635 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  72] rpc - transport.cc:172 - Request timeout to {host: NODE6, port: 33145}, correlation id: 11527635 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527635 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527636 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527636 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527637 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  72] rpc - transport.cc:172 - Request timeout to {host: NODE6, port: 33145}, correlation id: 11527637 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527637 (1 in flight)
Dec 20 01:50:20 NODE0 redpanda[716268]: [shard  36] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 3446 (15 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527638 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527638 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  72] rpc - transport.cc:172 - Request timeout to {host: NODE6, port: 33145}, correlation id: 11527638 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527638 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  51] rpc - transport.cc:172 - Request timeout to {host: NODE9, port: 33145}, correlation id: 11527639 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527639 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527639 (1 in flight)
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527639 (1 in flight)
Dec 20 01:50:21 NODE0 rpk[716268]: Reactor stalled for 108 ms on shard 18. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: Reactor stalled for 295 ms on shard 37. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x11780c 0x5141a24 0x50de50f 0x5181e3c 0x50df211 0x52c9783 0x4e98a21 0x4e9225d 0x4e8fbe2 0x471f1d7 0x4722e6d 0x471c934 0x218ee04 0x21761dd 0x2175647 0x21770c1 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669af47 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9691c557 0xffffffff9692a2f0 0xffffffff96835080 0xffffffff968350eb 0xffffffff9690dc83 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9691c557 0xffffffff9692a2f0 0xffffffff96835080 0xffffffff968350eb 0xffffffff9690dc83 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9691c557 0xffffffff9692a2f0 0xffffffff96835080 0xffffffff968350eb 0xffffffff9690dc83 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527639 (1 in flight)
Dec 20 01:50:21 NODE0 rpk[716268]: Reactor stalled for 105 ms on shard 18. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: Reactor stalled for 294 ms on shard 37. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x11780c 0x5141a24 0x50de50f 0x5181e3c 0x50df211 0x52c9783 0x4e98a21 0x4e9225d 0x4e8fbe2 0x471f1d7 0x4722e6d 0x471c934 0x218ee04 0x21761dd 0x2175647 0x21770c1 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669af47 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a12eb6 0xffffffffc0a136a6 0xffffffff96857337 0xffffffff968be8ae 0xffffffff968bebd5 0xffffffff96854f60 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a12eb6 0xffffffffc0a136a6 0xffffffff96857337 0xffffffff968be8ae 0xffffffff968bebd5 0xffffffff96854f60 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9691c557 0xffffffff9692a2f0 0xffffffff96835080 0xffffffff968350eb 0xffffffff9690dc83 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: Reactor stalled for 186 ms on shard 99. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x50572c8 0x505895a 0x505ea8c 0x506c6e5 0x52c8fe9 0x5140804 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a12eb6 0xffffffffc0a136a6 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff9692a2f0 0xffffffff9692c358 0xffffffff968fa1eb 0xffffffff968fa3d5 0xffffffff968fa57c 0xffffffff968fa723 0xffffffff968fab75 0xffffffff96858ee8 0xffffffff968590d6 0xffffffff96859d3f 0xffffffffc0a18631 0xffffffff9685a36a 0xffffffff9685a83f 0xffffffff96e000ca 0xffffffff960f3f8d 0xffffffff96a7cca8 0xffffffff96c00c1e
Dec 20 01:50:21 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a12eb6 0xffffffffc0a136a6 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff9692a2f0 0xffffffff9692c358 0xffffffff968fa1eb 0xffffffff968fa3d5 0xffffffff968fa57c 0xffffffff968fa723 0xffffffff968fab75 0xffffffff96858ee8 0xffffffff968590d6 0xffffffff96859353 0xffffffff96859726 0xffffffff96859911 0xffffffffc0a16720 0xffffffffc0a17ddf 0xffffffffc0a185b0 0xffffffff9685a36a 0xffffffff9685a83f 0xffffffff96e000ca 0xffffffff960f3f8d 0xffffffff96a7cce3 0xffffffff96c00c1e 0xffffffff961851fb 0xffffffff96185379 0xffffffff96a7f385 0xffffffff96c00c1e
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  31] rpc - transport.cc:172 - Request timeout to {host: NODE2, port: 33145}, correlation id: 11527640 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527640 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527640 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527640 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527641 (1 in flight)
Dec 20 01:50:22 NODE0 rpk[716268]: Reactor stalled for 88 ms on shard 0. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x510c309 0x502cea1 0x502afbf 0x1a699be 0x5420d39 0x2d58f 0x2d648 0x1a647a4
Dec 20 01:50:22 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8d0c4 0xffffffff963ce574 0xffffffff96373a6c 0xffffffff96373f0f 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:22 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156034 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:22 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527641 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527641 (1 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1845 (15 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1846 (14 in flight)
Dec 20 01:50:22 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1847 (13 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527642 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  31] rpc - transport.cc:172 - Request timeout to {host: NODE2, port: 33145}, correlation id: 11527642 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527642 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 111] rpc - transport.cc:172 - Request timeout to {host: NODE3, port: 33145}, correlation id: 11527642 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527642 (1 in flight)
Dec 20 01:50:23 NODE0 rpk[716268]: Reactor stalled for 88 ms on shard 18. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156034 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1848 (13 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 rpk[716268]: Reactor stalled for 106 ms on shard 103. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a12eb6 0xffffffffc0a136a6 0xffffffff96857337 0xffffffff968be8ae 0xffffffff968bebd5 0xffffffff96854f60 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1849 (12 in flight)
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  72] rpc - transport.cc:172 - Request timeout to {host: NODE6, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  51] rpc - transport.cc:172 - Request timeout to {host: NODE9, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 rpk[716268]: Reactor stalled for 120 ms on shard 37. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x11780c 0x5141a24 0x50de50f 0x5181e3c 0x50df211 0x52c9783 0x4e98a21 0x4e9225d 0x4e8fbe2 0x471f1d7 0x4722e6d 0x471c934 0x218ee04 0x21761dd 0x2175647 0x21770c1 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669af47 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1850 (13 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  31] rpc - transport.cc:172 - Request timeout to {host: NODE2, port: 33145}, correlation id: 11527643 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527644 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  31] rpc - transport.cc:172 - Request timeout to {host: NODE2, port: 33145}, correlation id: 11527644 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527644 (1 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1851 (15 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1852 (14 in flight)
Dec 20 01:50:23 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527645 (1 in flight)
Dec 20 01:50:24 NODE0 rpk[716268]: Reactor stalled for 165 ms on shard 103. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96156032 0xffffffff96a8d065 0xffffffff96834f8e 0xffffffff9690f638 0xffffffff96947f3c 0xffffffff9682c9d2 0xffffffff9637112a 0xffffffff963737b3 0xffffffff96373e37 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: Reactor stalled for 165 ms on shard 37. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x11780c 0x5141a24 0x50de50f 0x5181e3c 0x50df211 0x52c9783 0x4e98a21 0x4e9225d 0x4e8fbe2 0x471f1d7 0x4722e6d 0x471c934 0x218ee04 0x21761dd 0x2175647 0x21770c1 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a124fd 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 rpk[716268]: kernel callstack: 0xffffffffffffff80 0xffffffff96a8cc2c 0xffffffff9669b068 0xffffffff9669b158 0xffffffff9669b852 0xffffffff96698aa3 0xffffffff9669951d 0xffffffff966997f5 0xffffffffc0a1261a 0xffffffffc0a127b4 0xffffffffc0a137bd 0xffffffff96857337 0xffffffff968be8ae 0xffffffff96854f4a 0xffffffff96857876 0xffffffffc1cc7ed0 0xffffffff96857337 0xffffffff968579a6 0xffffffff968fde49 0xffffffff96900172 0xffffffff96921597 0xffffffff96922bca 0xffffffff96923142 0xffffffff9690d8a9 0xffffffff9690dc78 0xffffffff9682cc37 0xffffffff9682cf9c 0xffffffff9682f9e1 0xffffffff9682fa99 0xffffffff96a7bb1b 0xffffffff96c0007c
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  51] rpc - transport.cc:172 - Request timeout to {host: NODE9, port: 33145}, correlation id: 11527645 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  31] rpc - transport.cc:172 - Request timeout to {host: NODE2, port: 33145}, correlation id: 11527645 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527645 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1853 (16 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527646 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard 107] rpc - transport.cc:172 - Request timeout to {host: NODE7, port: 33145}, correlation id: 11527646 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard 103] rpc - transport.cc:172 - Request timeout to {host: NODE4, port: 33145}, correlation id: 11527646 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  76] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 1854 (16 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  20] rpc - transport.cc:172 - Request timeout to {host: NODE8, port: 33145}, correlation id: 11527647 (1 in flight)
Dec 20 01:50:24 NODE0 redpanda[716268]: [shard  37] rpc - transport.cc:172 - Request timeout to {host: NODE5, port: 33145}, correlation id: 11527647 (1 in flight)
Dec 20 01:50:24 NODE0 rpk[716268]: Rate-limit: suppressed 17 backtraces on shard 74

JIRA Link: CORE-1117

travisdowns commented 1 year ago

There are 11 stalls in this snippet.

6 of them have the following userspace backtrace:

 (inlined by) seastar::reactor::block_notifier(int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1329
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x42abf: ?? at ??:0
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x105eab: ?? at ??:0
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x514081a: seastar::file_desc::read(void*, unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/posix.hh:224
 (inlined by) seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:289
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x5140bc4: decltype(static_cast<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(fp)()) std::__1::__invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3640
 (inlined by) std::__1::invoke_result<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>::type std::__1::invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:93
 (inlined by) auto seastar::internal::future_invoke<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::internal::monostate>(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::internal::monostate&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1223
 (inlined by) seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const::'lambda'()::operator()() const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1596
 (inlined by) void seastar::futurize<seastar::future<seastar::temporary_buffer<char> > >::satisfy_with_result_of<seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const::'lambda'()>(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:2134
 (inlined by) seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&)::operator()(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&) const at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:1589
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > seastar::future<void>::then_impl_nrvo<seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34, seastar::future<seastar::temporary_buffer<char> > >(seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::temporary_buffer<char> >&&, seastar::reactor::do_read_some(seastar::pollable_fd_state&, seastar::internal::buffer_allocator*)::$_34&, seastar::future_state<seastar::internal::monostate>&&), void>::run_and_dispose() at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/future.hh:781
{22.3.4-1/opt/redpanda/libexec/redpanda} 0x510b25f: seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2330

In userspace this is in the do_read_some call, but ultimately the stack is in the kernel inside read(2). We have already confirmed the FD is readable, so this should be doing only on-CPU work, not blocking. Decoding the kernel trace would be important here, but this can only be done on the machine which produced it (due to dependence on the exact kernel image and also KASLR). @brianzer0 is that something you could help with?
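One possible way to decode the kernel callstack lines on the affected host is to look each address up in /proc/kallsyms, which already lists post-KASLR runtime addresses. The following is only a rough sketch under some assumptions: it has to run as root (otherwise the kernel typically reports all addresses as zero), on the same machine and the same boot that produced the log, and with the same modules still loaded. The script name `decode_kstack.py` and the function names are made up for illustration and are not part of Redpanda or Seastar.

```python
import bisect
import re
import sys


def load_symbols(kallsyms_text):
    """Parse /proc/kallsyms-style text into parallel sorted lists (addrs, names)."""
    entries = []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        name = parts[2]
        if len(parts) > 3:
            # Module symbols carry a trailing "[module]" column.
            name = f"{name} {parts[3]}"
        entries.append((int(parts[0], 16), name))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def resolve(addrs, names, addr):
    """Map an address to 'symbol+0xoffset' using the nearest symbol at or below it."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return f"{addr:#x} (unknown)"
    return f"{names[i]}+{addr - addrs[i]:#x}"


def decode_callstack(line, addrs, names):
    """Decode every hex address that follows 'kernel callstack:' in a log line."""
    tail = line.split("kernel callstack:", 1)[-1]
    return [resolve(addrs, names, int(h, 16)) for h in re.findall(r"0x[0-9a-fA-F]+", tail)]


def main(log_path):
    with open("/proc/kallsyms") as f:
        addrs, names = load_symbols(f.read())
    if not addrs or addrs[-1] == 0:
        sys.exit("kallsyms addresses are hidden; re-run as root")
    with open(log_path) as log:
        for line in log:
            if "kernel callstack:" in line:
                for frame in decode_callstack(line, addrs, names):
                    print("  " + frame)
                print()


# Hypothetical usage: sudo python3 decode_kstack.py stalls.log
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A frame that resolves to the last symbol with a very large offset (as the leading 0xffffffffffffff80 entry likely will) should be treated as unresolved. The lookup gives function names and offsets only; mapping to source lines would additionally need the matching kernel debuginfo.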

This is not the first time I've seen stalls in do_read_some.

brianzer0 commented 1 year ago

I seem to be having issues decoding the traces. Can someone walk me through how to do this?

Also here is a more recent trace:

Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: Reactor stalled for 52 ms on shard 81. Backtrace: 0x50eceaf 0x50ecbc2 0x42abf 0x105eab 0x514081a 0x5140bc4 0x510b25f 0x510ef37 0x5152275 0x50acb4f 0x91016 0x1166cf
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098eeb6 0xffffffffc098f6a6 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d1c7cb 0xffffffff82d2a2f0 0xffffffff82c35080 0xffffffff82c350eb 0xffffffff82d0dc83 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e61a 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098e4fd 0xffffffffc098e7b4 0xffffffffc098f7bd 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d22bca 0xffffffff82d23142 0xffffffff82d0d8a9 0xffffffff82d0dc78 0xffffffff82c2cc37 0xffffffff82c2cf9c 0xffffffff82c2f9e1 0xffffffff82c2fa99 0xffffffff82e7bb1b 0xffffffff8300007c
Jan  5 21:46:31 cdn-maple-as-132 rpk[850584]: kernel callstack: 0xffffffffffffff80 0xffffffff82e8cc2c 0xffffffff82a9b068 0xffffffff82a9b158 0xffffffff82a9b852 0xffffffff82a98aa3 0xffffffff82a9951d 0xffffffff82a997f5 0xffffffffc098eeb6 0xffffffffc098f6a6 0xffffffff82c57337 0xffffffff82cbe8ae 0xffffffff82c54f4a 0xffffffff82c57876 0xffffffffc19d3ed0 0xffffffff82c57337 0xffffffff82c579a6 0xffffffff82cfde49 0xffffffff82d00172 0xffffffff82d21597 0xffffffff82d2a2f0 0xffffffff82c35080 0xffffffff82c350eb 0xffffffff82d0f65d 0xffffffff82d47f3c 0xffffffff82c2c9d2 0xffffffff8277112a 0xffffffff827737b3 0xffffffff82773e37 0xffffffff82e7bb1b 0xffffffff8300007c

dotnwat commented 3 months ago

22.x is EOL