redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.62k stars 585 forks source link

Slow start-up time investigation (vm + non-local storage) #17188

Open dotnwat opened 7 months ago

dotnwat commented 7 months ago

Notes from investigating a 20-30 second start-up delay in a virtual machine environment using fast, but non-local storage.

Summary

Open question:

Start-up traces

DEBUG 2024-03-14 21:58:23,007 seastar - smp::count: 16
DEBUG 2024-03-14 21:58:23,007 seastar - latency_goal: 0.00075
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU0 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU1 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU2 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU3 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU4 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU5 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU6 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU7 to NUMA0
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU8 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU9 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU10 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU11 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU12 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU13 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU14 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Assign CPU15 to NUMA1
DEBUG 2024-03-14 21:58:23,013 seastar - Auto-configure 2 IO groups
INFO  2024-03-14 21:58:23,027 seastar - Reactor backend: linux-aio
DEBUG 2024-03-14 21:58:23,033 [shard  0:main] seastar - generate_config dev_id: 0
INFO  2024-03-14 21:58:23,033 [shard  0:main] seastar - Created fair group io-queue-0 for 8 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 1572864
INFO  2024-03-14 21:58:23,033 [shard  0:main] seastar - IO queue uses 0.75ms latency goal for device 0
INFO  2024-03-14 21:58:23,033 [shard  0:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
DEBUG 2024-03-14 21:58:23,033 [shard  0:main] seastar - allocate 0 IO group with 8 queues, dev 0
INFO  2024-03-14 21:58:23,033 [shard  0:main] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
DEBUG 2024-03-14 21:58:23,033 [shard  0:main] seastar - attached 0 queue to 0 IO group, dev 0

13 second delay, 15 in a second examined trace.

Some interesting observations here

DEBUG 2024-03-14 21:58:38,835 [shard 13:main] seastar - generate_config dev_id: 0
INFO  2024-03-14 21:58:38,835 [shard 13:main] seastar - Created fair group io-queue-0 for 8 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 1572864
INFO  2024-03-14 21:58:38,835 [shard 13:main] seastar - IO queue uses 0.75ms latency goal for device 0
INFO  2024-03-14 21:58:38,835 [shard 13:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
DEBUG 2024-03-14 21:58:38,835 [shard 13:main] seastar - allocate 1 IO group with 8 queues, dev 0
DEBUG 2024-03-14 21:58:38,835 [shard 13:main] seastar - attached 13 queue to 1 IO group, dev 0
DEBUG 2024-03-14 21:58:38,840 [shard  8:main] seastar - attached 8 queue to 1 IO group, dev 0
DEBUG 2024-03-14 21:58:38,842 [shard 12:main] seastar - attached 12 queue to 1 IO group, dev 0
DEBUG 2024-03-14 21:58:38,845 [shard  9:main] seastar - attached 9 queue to 1 IO group, dev 0
DEBUG 2024-03-14 21:58:38,848 [shard 11:main] seastar - attached 11 queue to 1 IO group, dev 0
... clipped ...
INFO  2024-03-14 21:58:38,876 [shard  0:main] main - application.cc:750 - audit_log_client.scram_username:  - Username to use for SCRAM authentication mechanisms
INFO  2024-03-14 21:58:38,878 [shard  0:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  3:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 10:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  4:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 15:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  5:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  2:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  7:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 14:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  1:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard  6:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 12:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 13:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,886 [shard  8:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,887 [shard  9:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,878 [shard 11:main] seastar - Enabling heap profiler - using 3000037 bytes sampling rate
INFO  2024-03-14 21:58:38,889 [shard  0:main] main - application.cc:496 - Setting abort_on_allocation_failure (abort on OOM): true
Reactor stalled for 33 ms on shard 9. Backtrace: 0x90460bf 0x904781d 0x42abf 0x857ea5a 0x84cd061 0x84cdd86 0x84cc913 0x2d08fd3 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7
Reactor stalled for 34 ms on shard 6. Backtrace: 0x90460bf 0x904781d 0x42abf 0x857ea5a 0x84cd061 0x84cdd86 0x84cc913 0x2d08fd3 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
Reactor stalled for 33 ms on shard 12. Backtrace: 0x90460bf 0x904781d 0x42abf 0x857ea5a 0x84cd061 0x84cdd86 0x84cc913 0x2d08fd3 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7
Reactor stalled for 36 ms on shard 11. Backtrace: 0x90460bf 0x904781d 0x42abf 0x857ea8f 0x84cd061 0x84cdd86 0x84cc913 0x2d08fd3 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7
INFO  2024-03-14 21:58:38,982 [shard  0:main] syschecks - Writing pid file "/var/lib/redpanda/data/pid.lock"
TRACE 2024-03-14 21:58:39,167 [shard  0:main] syschecks - NOTIFY_SOCKET unset. ignoring STATUS=checking environment (CPU, Mem)
INFO  2024-03-14 21:58:39,168 [shard  0:main] storage - directories.h:33 - Checking `/var/lib/redpanda/data` for supported filesystems
INFO  2024-03-14 21:58:39,168 [shard  0:main] syschecks - Detected file system type is xfs
INFO  2024-03-14 21:58:39,168 [shard  0:main] cloud_storage - cache_service.cc:1420 - Creating cache directory "/var/lib/redpanda/data/cloud_storage_cache"
Reactor stalled for 39 ms on shard 10. Backtrace: 0x90460bf 0x904781d 0x42abf 0x8f9db07 0x8fa1ccb 0x8f92fde 0x2ff25c1 0x2ff2dec 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
Reactor stalled for 45 ms on shard 8. Backtrace: 0x90460bf 0x904781d 0x42abf 0x8f9db07 0x8fa1ccb 0x8f92fde 0x2ff25c1 0x2ff2dec 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
Reactor stalled for 32 ms on shard 2. Backtrace: 0x90460bf 0x904781d 0x42abf 0x8f9db07 0x8fa1ccb 0x8f92fde 0x2ff25c1 0x2ff2dec 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf

11 second delay, 9 in a second examined trace.

Reactor stalled for 54 ms on shard 8. Backtrace: 0x90460bf 0x904781d 0x42abf 0x8f9db07 0x8fa1ccb 0x8f92fde 0x2ff25c1 0x2ff2dec 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7
TRACE 2024-03-14 21:58:49,173 [shard  0:main] syschecks - NOTIFY_SOCKET unset. ignoring STATUS=Constructing storage services
TRACE 2024-03-14 21:58:49,174 [shard  0:main] syschecks - NOTIFY_SOCKET unset. ignoring STATUS=Creating feature table
DEBUG 2024-03-14 21:58:49,175 [shard  0:main] dns_resolver - Query name ubuntu-vm-05 (ANY)
TRACE 2024-03-14 21:58:49,175 [shard  0:main] dns_resolver - Created udp socket 1
TRACE 2024-03-14 21:58:49,176 [shard  0:main] dns_resolver - Connect 1(2)->127.0.0.53:53

JIRA Link: CORE-1890

dotnwat commented 7 months ago

Reactor stall decoding

Sample 1

tl;dr ZSTD_initDCtx_internal probably touching a bunch of memory for some zstd scratch space and stalling out in page faults. kernel be faultin

{<kernel>} 0xffffffff9cb03977: handle_pte_fault+0x217
{<kernel>} 0xffffffff9cb05905: __handle_mm_fault+0x405
{<kernel>} 0xffffffff9cb05cc8: handle_mm_fault+0xd8
[nwatkins@fedora pstg]$ ~/src/seastar/scripts/seastar-addr2line -v -e amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda --kallsyms kallsysms/ubuntu-vm-05.pod-22.poc.local/proc/kallsyms
amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /opt/redpanda/lib/ld.so, for GNU/Linux 3.2.0, BuildID[sha1]=a373d6880ff95d17856b3a690f5a235aa77b9229, stripped

Reactor stalled for 33 ms on shard 9. Backtrace: 0x90460bf 0x904781d 0x42abf 0x857ea5a 0x84cd061 0x84cdd86 0x84cc913 0x2d08fd3 0x906a8ef 0x906e061 0x90c8a95 0x8ff76ff 0x91016 0x1166cf
kernel callstack: 0xffffffffffffff80 0xffffffff9c99194c 0xffffffff9c991bdd 0xffffffff9c8a3569 0xffffffff9c8a371e 0xffffffff9cb16530 0xffffffff9cafdcd9 0xffffffff9cb01dce 0xffffffff9cb03977 0xffffffff9cb05905 0xffffffff9cb05cc8 0xffffffff9c89f919 0xffffffff9d5bd877 0xffffffff9d600bb7

Reactor stalled for 33 ms on shard 9. Backtrace:
[Backtrace #0]
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x90460bf: void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:838
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:857
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x904781d: seastar::internal::cpu_stall_detector::generate_trace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1539
 (inlined by) seastar::internal::cpu_stall_detector::maybe_report() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1376
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1396
 (inlined by) seastar::reactor::block_notifier(int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1597
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x42abf: ?? ??:0
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x857ea5a: ZSTD_initDCtx_internal at /v/build/v_deps_build/zstd-prefix/src/zstd/lib/decompress/zstd_decompress.c:282
 (inlined by) ZSTD_initStaticDCtx at /v/build/v_deps_build/zstd-prefix/src/zstd/lib/decompress/zstd_decompress.c:285
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x84cd061: static_dctx at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/compression/async_stream_zstd.cc:120
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x84cdd86: async_stream_zstd at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/compression/async_stream_zstd.cc:150
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x84cc913: std::__1::__unique_if<compression::async_stream_zstd>::__unique_single std::__1::make_unique[abi:v160004]<compression::async_stream_zstd, unsigned long&, int const&>(unsigned long&, int const&) at /vectorized/llvm/bin/../include/c++/v1/__memory/unique_ptr.h:686
 (inlined by) compression::initialize_async_stream_zstd(unsigned long) at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/compression/async_stream_zstd.cc:60
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x2d08fd3: operator() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/redpanda/application.cc:526
 (inlined by) seastar::future<void> seastar::futurize<void>::invoke<application::initialize(std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<scheduling_groups>)::$_6&>(application::initialize(std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<scheduling_groups>)::$_6&) at /vectorized/include/seastar/core/future.hh:2003
 (inlined by) seastar::smp_message_queue::async_work_item<application::initialize(std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<YAML::Node>, std::__1::optional<scheduling_groups>)::$_6>::run_and_dispose() at /vectorized/include/seastar/core/smp.hh:263
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x906a8ef: seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2750
 (inlined by) seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3213
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x906e061: seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3397
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x90c8a95: operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4624
 (inlined by) decltype (((std::declval<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>)())()) std::__1::__invoke[abi:v160004]<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:487
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()[abi:v160004]() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
 (inlined by) std::__1::__function::__func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x8ff76ff: std::__1::__function::__value_func<void ()>::operator()[abi:v160004]() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:90
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x91016: ?? ??:0
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x1166cf: ?? ??:0

kernel callstack:
[Backtrace #1]
{<kernel>} 0xffffffffffffff80: 0xffffffffffffff80 (1061210960 bytes after last symbol)
{<kernel>} 0xffffffff9c99194c: smp_call_function_many_cond+0x13c
{<kernel>} 0xffffffff9c991bdd: on_each_cpu_cond_mask+0x1d
{<kernel>} 0xffffffff9c8a3569: native_flush_tlb_multi+0x59
{<kernel>} 0xffffffff9c8a371e: flush_tlb_mm_range+0xee
{<kernel>} 0xffffffff9cb16530: ptep_clear_flush+0x60
{<kernel>} 0xffffffff9cafdcd9: wp_page_copy+0x269
{<kernel>} 0xffffffff9cb01dce: do_wp_page+0xce
{<kernel>} 0xffffffff9cb03977: handle_pte_fault+0x217
{<kernel>} 0xffffffff9cb05905: __handle_mm_fault+0x405
{<kernel>} 0xffffffff9cb05cc8: handle_mm_fault+0xd8
{<kernel>} 0xffffffff9c89f919: do_user_addr_fault+0x1c9
{<kernel>} 0xffffffff9d5bd877: exc_page_fault+0x77
{<kernel>} 0xffffffff9d600bb7: asm_exc_page_fault+0x27

Sample 2

Looks like faulting in chunk cache allocations

{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x8f9db07: seastar::memory::cpu_pages::free_span_no_merge(unsigned int, unsigned int) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:709
 (inlined by) seastar::memory::cpu_pages::allocate_large_and_trim(unsigned int, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:816
 (inlined by) seastar::memory::cpu_pages::allocate_large(unsigned int, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:912
 (inlined by) seastar::memory::small_pool::add_more_objects() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1454
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x8fa1ccb: seastar::memory::small_pool::allocate() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1423
 (inlined by) void* seastar::memory::allocate_from_small_pool<(seastar::memory::alignment_t)0>(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1595
 (inlined by) seastar::memory::allocate_aligned(unsigned long, unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1687
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x8f92fde: seastar::internal::allocate_aligned_buffer_impl(unsigned long, unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:2254
 (inlined by) seastar::internal::allocate_aligned_buffer_impl(unsigned long, unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:142
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x2ff25c1: std::__1::unique_ptr<char [], seastar::free_deleter> seastar::allocate_aligned_buffer<char>(unsigned long, unsigned long) at /vectorized/include/seastar/core/aligned_buffer.hh:45
 (inlined by) segment_appender_chunk at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/storage/segment_appender_chunk.h:34
 (inlined by) lw_shared_ptr_no_esft<const unsigned long &, const detail::base_named_type<unsigned long, storage::alignment_type, std::__1::integral_constant<bool, true> > &> at /vectorized/include/seastar/core/shared_ptr.hh:178
 (inlined by) seastar::lw_shared_ptr<storage::segment_appender_chunk> seastar::lw_shared_ptr<storage::segment_appender_chunk>::make<unsigned long const&, detail::base_named_type<unsigned long, storage::alignment_type, std::__1::integral_constant<bool, true> > const&>(unsigned long const&, detail::base_named_type<unsigned long, storage::alignment_type, std::__1::integral_constant<bool, true> > const&) at /vectorized/include/seastar/core/shared_ptr.hh:282
 (inlined by) seastar::lw_shared_ptr<storage::segment_appender_chunk> seastar::make_lw_shared<storage::segment_appender_chunk, unsigned long const&, detail::base_named_type<unsigned long, storage::alignment_type, std::__1::integral_constant<bool, true> > const&>(unsigned long const&, detail::base_named_type<unsigned long, storage::alignment_type, std::__1::integral_constant<bool, true> > const&) at /vectorized/include/seastar/core/shared_ptr.hh:443
 (inlined by) operator() at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-059c3321c337b010b-1/redpanda/redpanda/src/v/storage/chunk_cache.h:51
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x2ff2dec: seastar::future<void> seastar::futurize<void>::invoke<storage::internal::chunk_cache::start()::{lambda(unsigned long)#1}&, unsigned long const&>(storage::internal::chunk_cache::start()::{lambda(unsigned long)#1}&, unsigned long const&) at /vectorized/include/seastar/core/future.hh:2003
 (inlined by) auto seastar::futurize_invoke<storage::internal::chunk_cache::start()::{lambda(unsigned long)#1}&, unsigned long const&>(storage::internal::chunk_cache::start()::{lambda(unsigned long)#1}&, unsigned long const&) at /vectorized/include/seastar/core/future.hh:2037
 (inlined by) seastar::internal::do_for_each_state<boost::iterators::counting_iterator<unsigned long, boost::use_default, boost::use_default>, storage::internal::chunk_cache::start()::{lambda(unsigned long)#1}>::run_and_dispose() at /vectorized/include/seastar/core/loop.hh:403
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x906a8ef: seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2750
 (inlined by) seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3213
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x906e061: seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3397
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x90c8a95: operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4624
 (inlined by) decltype (((std::declval<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>)())()) std::__1::__invoke[abi:v160004]<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:487
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()[abi:v160004]() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
 (inlined by) std::__1::__function::__func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x8ff76ff: std::__1::__function::__value_func<void ()>::operator()[abi:v160004]() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:90
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x91016: ?? ??:0
{amd64/23.3.7-1-amd64/opt/redpanda/libexec/redpanda} 0x1166cf: ?? ??:0

kernel callstack:
[Backtrace #1]
{<kernel>} 0xffffffffffffff80: 0xffffffffffffff80 (1061210960 bytes after last symbol)
{<kernel>} 0xffffffff9c99194c: smp_call_function_many_cond+0x13c
{<kernel>} 0xffffffff9c991bdd: on_each_cpu_cond_mask+0x1d
{<kernel>} 0xffffffff9c8a3569: native_flush_tlb_multi+0x59
{<kernel>} 0xffffffff9c8a371e: flush_tlb_mm_range+0xee
{<kernel>} 0xffffffff9cb16530: ptep_clear_flush+0x60
{<kernel>} 0xffffffff9cafdcd9: wp_page_copy+0x269
{<kernel>} 0xffffffff9cb01dce: do_wp_page+0xce
{<kernel>} 0xffffffff9cb03977: handle_pte_fault+0x217
{<kernel>} 0xffffffff9cb05905: __handle_mm_fault+0x405
{<kernel>} 0xffffffff9cb05cc8: handle_mm_fault+0xd8
{<kernel>} 0xffffffff9c89f919: do_user_addr_fault+0x1c9
{<kernel>} 0xffffffff9d5bd877: exc_page_fault+0x77
{<kernel>} 0xffffffff9d600bb7: asm_exc_page_fault+0x27

Other samples

Checked around 10 other random samples and they all seemed to be chunk cache allocations

travisdowns commented 7 months ago

Yes, I've seen something like this before. The chunk cache gets implicated because it's one of the main place we touch a substantial amount of memory at startup of a fresh cluster. We could just allocate the chunks in a more lazy fashion (but still allocate the memory) I suppose.

How do you know it's 200 faults? 50ms is extremely slow for a fault, these should often be around 1 to 10 microseconds maybe. Is it because there were 200 reactor stalls?

I think this occurs here either because something about the VM makes the machinery of handling allocation (e.g., page table manipulation way slower) or because we aren't getting huge pages. Normally we get 2MB huge pages which dramatically reduces the fault rate since we fault every 2MB not every 4K. There are counters for that should we access this system again.

dotnwat commented 7 months ago

How do you know it's 200 faults? 50ms is extremely slow for a fault, these should often be around 1 to 10 microseconds maybe. Is it because there were 200 reactor stalls?

Sorry, being sloppy in my writing. Right, not 200 faults. I meant, all the reactor stalls reported were around 50ms. My understanding is that not all reactor stalls are printed--that seastar throttles their printing--so that would be roughly 200 50ms reactor stalls. I don't think there is enough information there to know if that is what's going on.

One thing to note is that the initial chunk cache hydration happens in a seastar::do_for_each loop so I'm a little surprised that it would be causing the reactor stall since after each chunk allocation there is a yield opportunity for the reactor. Perhaps it lots of faults as a result of memory starting to fragment and ranges splitting? bit mysterious