scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0
13.08k stars 1.24k forks source link

Reactor stalls 34 ms during perf-throughput mixed test #14642

Closed aleksbykov closed 1 year ago

aleksbykov commented 1 year ago

Version: 5.4.0~dev

Performance throughput test for master shows throughput degradation on 10% for last several runs. During last runs reactors stalls up to 34ms were registered on 2nd node.

Processed 2 stalls lasting a total of 77 milliseconds.
min=38 avg=38.5 median=38 p95=39 p99=None p999=None max=39

This graph is printed in bottom-up order, where callees are printed first.
Use --direction=top-down to print callers first.

[level#index/out_of pct%] below denotes:
  level  - nesting level in the graph
  index  - index of node among to its siblings
  out_of - number of siblings
  pct    - percentage of total stall time of this call relative to its siblings

Command line options: --address-threshold=0x100000000 --executable=/data/scylla/libexec/scylla --direction=bottom-up --branch-threshold=0.05 --file=/data/db-node-2

++[0#1/1 100%] addr=0x56a67c3 total=77 count=2 avg=38:
|              seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}> at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
|              (inlined by) seastar::backtrace_buffer::append_backtrace_oneline at ./build/release/seastar/./seastar/src/core/reactor.cc:825
|              (inlined by) seastar::print_with_backtrace at ./build/release/seastar/./seastar/src/core/reactor.cc:844
++           - addr=0x56a5b70:
|              seastar::internal::cpu_stall_detector::generate_trace at ./build/release/seastar/./seastar/src/core/reactor.cc:1414
++           - addr=0x56a6f9f:
|              seastar::internal::cpu_stall_detector::maybe_report at ./build/release/seastar/./seastar/src/core/reactor.cc:1151
|              (inlined by) seastar::internal::cpu_stall_detector::on_signal at ./build/release/seastar/./seastar/src/core/reactor.cc:1171
|              (inlined by) seastar::reactor::block_notifier at ./build/release/seastar/./seastar/src/core/reactor.cc:1451
++           - addr=0x3cb1f: ?? ??:0
  ++[1#1/2 51%] addr=0x1e7c546 total=39 count=1 avg=39:
  |             bplus::tree<long, intrusive_array<cache_entry>, dht::raw_token_less_comparator, 16ul, (bplus::key_search)0, (bplus::with_debug)0>::iterator::emplace_before<bplus::tree<long, intrusive_array<cache_entry>, dht::raw_token_less_comparator, 16ul, (bplus::key_search)0, (bplus::with_debug)0>::iterator::emplace_before<cache_entry::evictable_tag, seastar::lw_shared_ptr<schema const>&, dht::decorated_key, partition_entry>(long, dht::raw_token_less_comparator, cache_entry::evictable_tag&&, seastar::lw_shared_ptr<schema const>&, dht::decorated_key&&, partition_entry&&)::{lambda(bplus::data<long, intrusive_array<cache_entry>, dht::raw_token_less_comparator, 16ul, (bplus::key_search)0, (bplus::with_debug)0>*)#1}, cache_entry::evictable_tag, seastar::lw_shared_ptr<schema const>&, dht::decorated_key, partition_entry> at ././utils/bptree.hh:?
  |             (inlined by) bplus::tree<long, intrusive_array<cache_entry>, dht::raw_token_less_comparator, 16ul, (bplus::key_search)0, (bplus::with_debug)0>::iterator::emplace_before<cache_entry::evictable_tag, seastar::lw_shared_ptr<schema const>&, dht::decorated_key, partition_entry> at ././utils/bptree.hh:751
  |             (inlined by) double_decker<long, cache_entry, dht::raw_token_less_comparator, dht::ring_position_comparator, 16, (bplus::key_search)0, (bplus::with_debug)0>::emplace_before<cache_entry::evictable_tag, seastar::lw_shared_ptr<schema const>&, dht::decorated_key, partition_entry> at ././utils/double-decker.hh:210
  ++          - addr=0x1e7a2dd:
  |             row_cache::update(row_cache::external_updater, replica::memtable&)::$_21::operator() at ./row_cache.cc:1040
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}::operator() at ./row_cache.cc:980
  |             (inlined by) logalloc::allocating_section::with_reclaiming_disabled<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&> at ././utils/logalloc.hh:499
  |             (inlined by) logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}>(logalloc::region&, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&&)::{lambda()#1}::operator() at ././utils/logalloc.hh:521
  |             (inlined by) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}>(logalloc::region&, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&&)::{lambda()#1}> at ././utils/logalloc.hh:470
  |             (inlined by) logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}> at ././utils/logalloc.hh:520
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator() at ./row_cache.cc:975
  |             (inlined by) with_allocator<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}> at ././utils/allocation_strategy.hh:313
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator() at ./row_cache.cc:965
  |             (inlined by) std::__invoke_impl<void, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
  |             (inlined by) std::__invoke<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
  |             (inlined by) std::__apply_impl<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}, std::tuple<>> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1852
  |             (inlined by) std::apply<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}, std::tuple<> > at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1863
  |             (inlined by) seastar::futurize<void>::apply<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at ././seastar/include/seastar/core/future.hh:1970
  |             (inlined by) seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#1}::operator() at ././seastar/include/seastar/core/thread.hh:262
  |             (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#1}>::call at ././seastar/include/seastar/util/noncopyable_function.hh:129
  ++          - addr=0x5965286:
  |             seastar::noncopyable_function<void ()>::operator() at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:215
  |             (inlined by) seastar::thread_context::main at ./build/release/seastar/./seastar/src/core/thread.cc:311
  ++[1#2/2 49%] addr=0x15ae24 total=38 count=1 avg=38: ?? ??:0
  ++          - addr=0x2006f51:
  |             compare_unsigned<managed_bytes_basic_view<(mutable_view)0>, managed_bytes_basic_view<(mutable_view)0> > at ././utils/fragment_range.hh:280
  |             (inlined by) legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator() at ././compound_compat.hh:150
  |             (inlined by) partition_key_view::legacy_tri_compare at ./keys.cc:26
  ++          - addr=0x3ac4917:
  |             partition_key::legacy_tri_compare at ././keys.hh:747
  |             (inlined by) dht::ring_position_tri_compare at ./dht/i_partitioner.cc:324
  ++          - addr=0x1e79ad0:
  |             dht::ring_position_comparator::operator()<cache_entry, dht::decorated_key> at ././dht/i_partitioner.hh:574
  |             (inlined by) intrusive_array<cache_entry>::lower_bound<dht::decorated_key, dht::ring_position_comparator> at ././utils/intrusive-array.hh:261
  |             (inlined by) double_decker<long, cache_entry, dht::raw_token_less_comparator, dht::ring_position_comparator, 16, (bplus::key_search)0, (bplus::with_debug)0>::lower_bound<dht::decorated_key> at ././utils/double-decker.hh:273
  |             (inlined by) double_decker<long, cache_entry, dht::raw_token_less_comparator, dht::ring_position_comparator, 16, (bplus::key_search)0, (bplus::with_debug)0>::lower_bound<dht::decorated_key> at ././utils/double-decker.hh:291
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}::operator() at ./row_cache.cc:979
  |             (inlined by) logalloc::allocating_section::with_reclaiming_disabled<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&> at ././utils/logalloc.hh:499
  |             (inlined by) logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}>(logalloc::region&, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&&)::{lambda()#1}::operator() at ././utils/logalloc.hh:521
  |             (inlined by) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}>(logalloc::region&, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}&&)::{lambda()#1}> at ././utils/logalloc.hh:470
  |             (inlined by) logalloc::allocating_section::operator()<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator()() const::{lambda()#1}> at ././utils/logalloc.hh:520
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}::operator() at ./row_cache.cc:975
  |             (inlined by) with_allocator<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator()()::{lambda()#3}> at ././utils/allocation_strategy.hh:313
  |             (inlined by) row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}::operator() at ./row_cache.cc:965
  |             (inlined by) std::__invoke_impl<void, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
  |             (inlined by) std::__invoke<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
  |             (inlined by) std::__apply_impl<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}, std::tuple<>> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1852
  |             (inlined by) std::apply<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}, std::tuple<> > at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1863
  |             (inlined by) seastar::futurize<void>::apply<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}> at ././seastar/include/seastar/core/future.hh:1970
  |             (inlined by) seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#1}::operator() at ././seastar/include/seastar/core/thread.hh:262
  |             (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&)::$_21>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&)::$_21)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#1}>::call at ././seastar/include/seastar/util/noncopyable_function.hh:129
  ++          - addr=0x5965286:
  |             seastar::noncopyable_function<void ()>::operator() at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:215
  |             (inlined by) seastar::thread_context::main at ./build/release/seastar/./seastar/src/core/thread.cc:311
  -> continued at addr=0x5965286 above
abykov@ububuntu ~/tmp/perf-throughput-mixed/mixed/master $ 

DB logs: https://cloudius-jenkins-test.s3.amazonaws.com/a4ef6573-9210-4176-974c-7fa13d4c0b73/20230709_052919/db-cluster-a4ef6573.tar.gz

Monitoring: http://44.204.125.133:3000/d/advanced-master/advanced?orgId=1&from=1688876106352&to=1688879512229

avikivity commented 1 year ago

There are just 2 stalls, and they look like flukes. The performance degradation is likely unrelated.

avikivity commented 1 year ago

Nothing in the metrics indicates a problem.

aleksbykov commented 1 year ago

@avikivity Should i close the issue?

avikivity commented 1 year ago

Is there a regression or not?

aleksbykov commented 1 year ago

The total max throghput is less by 11%, the only problem is reactor stall but you said it is flukes and metrics shows nothing. Here is results:

op_rate

Current test result Version compared to Best Diff best Commit, Date Last Diff last Commit, Date
89336.0 5.3.0~dev 100631.0 -11% #941407b905a62023-02-18 91607.0 -2% #3b424e391b6c2023-05-21
89336.0 5.4.0~dev 99022.0 -9% #b7022cd74e7a2023-06-13 78563.0 +13% #f6f974cdeb112023-06-29


aleksbykov commented 1 year ago

It looks like regression, but may it is fluky results related to aws

aleksbykov commented 1 year ago

@avikivity could you also take a look on that stalls, the happened for same build but during latency mixed test. decoded-reactor-stalls.tar.gz

avikivity commented 1 year ago

@avikivity could you also take a look on that stalls, the happened for same build but during latency mixed test. decoded-reactor-stalls.tar.gz

These look like #8828

aleksbykov commented 1 year ago

8828

Thank you