redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com
9.14k stars 558 forks source link

CI Failure (oversized allocation of 458752 in io_sink::submit) in `PartitionBalancerScaleTest.test_node_operations_at_scale` #18480

Open vbotbuildovich opened 1 month ago

vbotbuildovich commented 1 month ago

https://buildkite.com/redpanda/vtools/builds/13710

Module: rptest.scale_tests.partition_balancer_scale_test
Class: PartitionBalancerScaleTest
Method: test_node_operations_at_scale
Arguments: {
    "type": "many_partitions"
}
test_id:    PartitionBalancerScaleTest.test_node_operations_at_scale
status:     FAIL
run time:   942.224 seconds

<BadLogLines nodes=ip-172-31-15-130(4) example="WARN  2024-05-14 13:09:23,777 [shard 3:main] seastar_memory - oversized allocation: 458752 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x98db003 0x9526f0d 0x95337c8 0x972d0f8 0x9735f62 0x972a8fe 0x96a3af6 0x96172c8 0x96707b0 0x9585fcf /opt/redpanda/lib/libc.so.6+0x91016 /opt/redpanda/lib/libc.so.6+0x1166cf">
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 188, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1478, in raise_on_bad_logs
    lsearcher.search_logs(_searchable_nodes)
  File "/home/ubuntu/redpanda/tests/rptest/services/utils.py", line 197, in search_logs
    raise BadLogLines(bad_loglines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-15-130(4) example="WARN  2024-05-14 13:09:23,777 [shard 3:main] seastar_memory - oversized allocation: 458752 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x98db003 0x9526f0d 0x95337c8 0x972d0f8 0x9735f62 0x972a8fe 0x96a3af6 0x96172c8 0x96707b0 0x9585fcf /opt/redpanda/lib/libc.so.6+0x91016 /opt/redpanda/lib/libc.so.6+0x1166cf">

JIRA Link: CORE-2956

WillemKauf commented 1 month ago

Multiple oversized allocations in nightly build, all with same backtrace, decoded backtrace is:

[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:68
 (inlined by) seastar::current_backtrace_tasklocal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:97
 (inlined by) seastar::current_tasktrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:148
 (inlined by) seastar::current_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:181
seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:849
 (inlined by) seastar::memory::cpu_pages::check_large_allocation(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:912
 (inlined by) seastar::memory::cpu_pages::allocate_large(unsigned int, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:919
 (inlined by) seastar::memory::allocate_large(unsigned long, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1542
 (inlined by) seastar::memory::allocate_slowpath(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1688
seastar::memory::allocate(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1707
 (inlined by) operator new(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:2429
void* std::__1::__libcpp_operator_new[abi:v160004]<unsigned long>(unsigned long) at /vectorized/llvm/bin/../include/c++/v1/new:266
 (inlined by) std::__1::__libcpp_allocate[abi:v160004](unsigned long, unsigned long) at /vectorized/llvm/bin/../include/c++/v1/new:292
 (inlined by) std::__1::allocator<seastar::internal::pending_io_request>::allocate[abi:v160004](unsigned long) at /vectorized/llvm/bin/../include/c++/v1/__memory/allocator.h:115
 (inlined by) seastar::circular_buffer<seastar::internal::pending_io_request, std::__1::allocator<seastar::internal::pending_io_request> >::expand(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/circular_buffer.hh:329
 (inlined by) seastar::circular_buffer<seastar::internal::pending_io_request, std::__1::allocator<seastar::internal::pending_io_request> >::expand() at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/circular_buffer.hh:323
 (inlined by) seastar::circular_buffer<seastar::internal::pending_io_request, std::__1::allocator<seastar::internal::pending_io_request> >::maybe_expand(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/circular_buffer.hh:359
 (inlined by) void seastar::circular_buffer<seastar::internal::pending_io_request, std::__1::allocator<seastar::internal::pending_io_request> >::emplace_back<seastar::internal::io_request, seastar::io_completion*&>(seastar::internal::io_request&&, seastar::io_completion*&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/core/circular_buffer.hh:419
 (inlined by) seastar::internal::io_sink::submit(seastar::io_completion*, seastar::internal::io_request) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:416
 (inlined by) seastar::io_queue::submit_request(seastar::io_desc_read_write*, seastar::internal::io_request) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:1048
 (inlined by) seastar::queued_io_request::dispatch() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:308
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:1039
 (inlined by) decltype (((std::declval<seastar::io_queue::poll_io_queue()::$_0&>)())((std::declval<seastar::fair_queue_entry&>)())) std::__1::__invoke[abi:v160004]<seastar::io_queue::poll_io_queue()::$_0&, seastar::fair_queue_entry&>(seastar::io_queue::poll_io_queue()::$_0&, seastar::fair_queue_entry&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::io_queue::poll_io_queue()::$_0&, seastar::fair_queue_entry&>(seastar::io_queue::poll_io_queue()::$_0&, seastar::fair_queue_entry&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:487
 (inlined by) std::__1::__function::__alloc_func<seastar::io_queue::poll_io_queue()::$_0, std::__1::allocator<seastar::io_queue::poll_io_queue()::$_0>, void (seastar::fair_queue_entry&)>::operator()[abi:v160004](seastar::fair_queue_entry&) at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
 (inlined by) std::__1::__function::__func<seastar::io_queue::poll_io_queue()::$_0, std::__1::allocator<seastar::io_queue::poll_io_queue()::$_0>, void (seastar::fair_queue_entry&)>::operator()(seastar::fair_queue_entry&) at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
std::__1::__function::__value_func<void (seastar::fair_queue_entry&)>::operator()[abi:v160004](seastar::fair_queue_entry&) const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
 (inlined by) std::__1::function<void (seastar::fair_queue_entry&)>::operator()(seastar::fair_queue_entry&) const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
 (inlined by) seastar::fair_queue::dispatch_requests(std::__1::function<void (seastar::fair_queue_entry&)>) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/fair_queue.cc:383
seastar::io_queue::poll_io_queue() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/io_queue.cc:1038
seastar::reactor::flush_pending_aio() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:1897
 (inlined by) seastar::reactor::io_queue_submission_pollfn::poll() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2963
seastar::reactor::poll_once() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3537
 (inlined by) operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3430
 (inlined by) seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:3458
operator() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:4696
 (inlined by) decltype (((std::declval<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>)())()) std::__1::__invoke[abi:v160004]<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:394
 (inlined by) void std::__1::__invoke_void_return_wrapper<void, true>::__call<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:487
 (inlined by) std::__1::__function::__alloc_func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()[abi:v160004]() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:185
 (inlined by) std::__1::__function::__func<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2, std::__1::allocator<seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_2>, void ()>::operator()() at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:356
std::__1::__function::__value_func<void ()>::operator()[abi:v160004]() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:510
 (inlined by) std::__1::function<void ()>::operator()() const at /vectorized/llvm/bin/../include/c++/v1/__functional/function.h:1156
 (inlined by) seastar::posix_thread::start_routine(void*) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/posix.cc:90
opt/redpanda/lib/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6e7b96dfb83f0bdcb6a410469b82f86415e5ada3, for GNU/Linux 3.2.0, stripped

pthread_condattr_setpshared at ??:?
__xmknodat at ??:?
WillemKauf commented 1 month ago

This is an internal seastar oversized allocation, in the io_sink::_pending_io object (which is a ss::circular_buffer<pending_io_request>).

travisdowns commented 1 week ago

seastar::internal::pending_io_request is 56 bytes, so this means that the iosink queue of resizing to 458752 can hold 8192 of them, i.e., there are > 4096 ios in flight in the iosink (somewhere between 4096 and 8192 because I assume it resizes via doubling).

This probably indicates that there are no either no io-properties deployed for this test, or the io is small enough that this many ios are still allowed into the sink. Maybe this could be a chunked_fifo instead of a circular_buffer.

StephanDollberg commented 1 week ago

@travisdowns did you see my comment on slack the other day about this: https://redpandadata.slack.com/archives/C05FCNFRG1E/p1718702628858319?thread_ts=1718686896.486769&cid=C05FCNFRG1E

travisdowns commented 1 week ago

@StephanDollberg I didn't but nice to see our analyses line up. I'll take a shot at swapping in chunked_fifo.