scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

Closed tgrabiec closed 1 year ago

tgrabiec commented 7 years ago

After a schema change, entries in memtables and cache still use the old schema. They're upgraded lazily (but atomically) on access. Schema is currently tracked per partition, so each upgrade covers a whole partition in one step. This may badly impact latency if partitions are large.

avikivity commented 4 years ago

@tgrabiec can you sketch a solution here? since @xemul is doing some cache work, we may trick him into fixing it

slivne commented 4 years ago

@tgrabiec ping

avikivity commented 4 years ago

Adding keywords for issue search: upgrade upgrade_entry

avikivity commented 4 years ago

6425 outlines a simpler fix that has a narrower scope (this issue should still be fixed)

tgrabiec commented 4 years ago

Currently, each partition entry has a chain of incremental versions (mutations). New versions are created when we try to add mutation on top of a snapshot. Snapshots are pointers to versions. When a snapshot is released, we try to merge versions on the spot. If merging is preempted, it will move to background. See mutation_cleaner::merge_and_destroy(partition_snapshot&). Version merging happens in partition_snapshot::merge_partition_versions().

We could reuse this mechanism for upgrading schema. We'd maintain an invariant that each partition version conforms to a single schema version. We will store schema_ptr per partition version rather than per partition entry. Version merging would check schemas and change behavior when they don't match. If schemas match we use the old behavior of merging newer to older. When schemas are different, we assume that the newer version has a newer schema and merge older to newer, upgrading the schema on-the-fly.

Snapshot readers (partition_snapshot_row_cursor) would have to be adjusted to take into account that versions may have different schemas and upgrade to snapshot's schema on-the-fly.

Applying a mutation to the data store (memtable or cache) would work like this. When the incoming mutation has the same schema as the current version, we apply the same way as today. If schemas are different, we create a snapshot and put the incoming mutation into a new version. Then we drop the snapshot. This will trigger version merging. It will complete on-the-spot for small partitions or will defer to mutation_cleaner.

When a read sees that the latest version is not at the data source's current schema, it should force the upgrade by creating an empty version conforming to the data source's current schema and let the same version merging mechanism do the upgrade. This way readers will not keep upgrading on-the-fly indefinitely.
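A minimal sketch of the merge-direction rule described above, using toy types only. Nothing here is ScyllaDB code: `toy_schema`, `toy_version`, `upgrade_row` and `merge_versions` are hypothetical names, a "schema" is just a version number plus a column set, and the sketch ignores preemption and deferral to mutation_cleaner.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy stand-ins (assumptions for illustration, not ScyllaDB types).
struct toy_schema {
    int version;
    std::set<std::string> columns;
};
using toy_row = std::map<std::string, std::string>;  // column -> value
struct toy_version {
    const toy_schema* schema;       // one schema per partition version
    std::map<int, toy_row> rows;    // clustering key -> row
};

// "Upgrade" a row by dropping cells whose column the target schema lacks.
toy_row upgrade_row(const toy_row& row, const toy_schema& target) {
    toy_row out;
    for (const auto& [col, val] : row) {
        if (target.columns.count(col)) {
            out.emplace(col, val);
        }
    }
    return out;
}

// Merge two adjacent versions and return the survivor.
toy_version merge_versions(toy_version older, toy_version newer) {
    if (older.schema == newer.schema) {
        // Same schema: merge newer into older; newer cells overwrite.
        for (auto& [key, row] : newer.rows) {
            for (auto& [col, val] : row) {
                older.rows[key][col] = std::move(val);
            }
        }
        return older;
    }
    // Different schemas: assume the newer version has the newer schema and
    // merge older into newer, upgrading each old row on the way.
    assert(newer.schema->version > older.schema->version);
    for (const auto& [key, row] : older.rows) {
        toy_row upgraded = upgrade_row(row, *newer.schema);
        toy_row& target = newer.rows[key];
        for (auto& [col, val] : upgraded) {
            target.emplace(col, std::move(val));  // keeps an existing newer cell
        }
    }
    return newer;
}
```

In both branches the newer cell wins; only the surviving container, and therefore the surviving schema, differs.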

tgrabiec commented 4 years ago

The above solution means that we will pay the cost of moving row entries between row trees during schema upgrade. Maybe that's acceptable. If we wanted to avoid that, we could improve the solution to allow a single partition version to be divided into two regions corresponding to the old and the new schema. We'd have:

class partition_version {
   position_in_partition _schema_pos;
   schema_ptr _schema; // Applies for entries with position() < _schema_pos or when !_old_schema
   schema_ptr _old_schema; // Applies for entries with position() >= _schema_pos
};

partition_snapshot_row_cursor would have to be told to recognize this.

Then version merging would first upgrade the old version to the new version, and then merge the two versions like it does today.
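A rough sketch of how the two-region idea could be queried and advanced, assuming an `int` stands in for position_in_partition. The names `toy_partition_version`, `schema_at` and `advance_upgrade` are hypothetical and are not part of the proposal; only the rule for choosing a schema is taken from the comments in the class above.

```cpp
#include <cassert>
#include <memory>

struct toy_schema {
    int version;
};
using toy_schema_ptr = std::shared_ptr<const toy_schema>;

struct toy_partition_version {
    int schema_pos = 0;         // stand-in for position_in_partition
    toy_schema_ptr schema;      // applies below schema_pos, or everywhere when !old_schema
    toy_schema_ptr old_schema;  // applies at or above schema_pos; null once fully upgraded

    // Which schema describes the entry stored at position pos?
    const toy_schema& schema_at(int pos) const {
        if (!old_schema || pos < schema_pos) {
            return *schema;
        }
        return *old_schema;
    }

    // Record that the caller has upgraded all entries below new_pos.
    // When the end of the partition is reached, the old schema is released.
    void advance_upgrade(int new_pos, bool reached_end) {
        assert(new_pos >= schema_pos);
        schema_pos = new_pos;
        if (reached_end) {
            old_schema = nullptr;
        }
    }
};
```

An incremental upgrader would convert a bounded batch of entries, call `advance_upgrade`, and yield, so no single step is proportional to the partition size.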

kostja commented 4 years ago

Why not simply trigger a dump after a schema change?

avikivity commented 4 years ago

A dump of what?

kostja commented 4 years ago

Of the memtable with old schema_ptr.

avikivity commented 4 years ago

It's not enough, we have the cache too. And if we do the cache correctly the memtable update is the same (they share data structures).

kostja commented 4 years ago

It's hard to imagine a cache entry update is latency sensitive. After all, it can not be slower than reloading the same entry from disk.

avikivity commented 4 years ago

The problem is that partitions are updated atomically. Imagine a 1GB partition in cache, 1M entries X 1k. If each entry takes a microsecond, then the partition will take 1s to update.

Of course, partitions can be larger than a gigabyte, entries can be smaller than 1k, and it can take more than 1us to update them.
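The estimate above as a small worked calculation. The function name `stall_microseconds` is made up for illustration, and the model is deliberately crude: a fixed cost per entry, nothing else.

```cpp
#include <cassert>
#include <cstdint>

// Time spent upgrading one partition atomically, in microseconds,
// assuming every entry costs the same fixed amount to upgrade.
constexpr std::uint64_t stall_microseconds(std::uint64_t partition_bytes,
                                           std::uint64_t entry_bytes,
                                           std::uint64_t per_entry_us) {
    return (partition_bytes / entry_bytes) * per_entry_us;
}

// 1 GB partition, 1 kB entries, 1 us per entry: 1M entries, 1 s in total.
static_assert(stall_microseconds(1000000000ULL, 1000, 1) == 1000000ULL,
              "1M entries at 1 us each take one second");
```

Because the upgrade is atomic, that whole second is a single uninterrupted stall on the shard.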

avikivity commented 4 years ago

@tgrabiec I think splitting a partition_version into two during upgrade (if I understood your proposal correctly) is good, it reuses the existing merge support and doesn't add more responsibility to partition_version.

kostja commented 4 years ago

Just for the record, the same strategy can be used for large partition - expel from the cache after a schema change.

avikivity commented 4 years ago

Expelling a large partition from a cache also takes a large amount of time (and then we lose performance due to cache misses for a while).

avikivity commented 4 years ago

Here's an alternative solution. I think it has problems around maintaining continuity so I'm not advocating it, just noting it for reference.

  1. move cache_entry::_schema to row_cache (making the problem much worse - instead of a partition having to upgrade atomically, the entire cache has to be upgraded atomically)
  2. instantiate one row_cache per schema version. We can call this schema_version_row_cache, and row_cache will be a container of schema_version_row_cache responsible for managing it.
  3. Newer schema_version_row_cache instances will point to the next older schema_version_row_cache as its data source.
  4. Older schema_version_row_cache instances remove on read (since the new ones will populate on read)
  5. Also have a background worker to move (or maybe rely on aging out)
  6. When an older schema_version_row_cache becomes empty, row_cache drops it.

This has the advantage of saving a schema_ptr per partition, which can be important (several percent) for small partitions. But it is quite complicated.
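For reference, a toy model of steps 2 to 4 and 6 above. All names (`toy_schema_version_cache`, `toy_row_cache`, `read`, `populate`) are hypothetical, partitions are plain strings, and the sketch ignores continuity, the background mover from step 5, and the actual schema upgrade, so it only shows how entries would migrate between per-schema caches.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <string>
#include <utility>

// One cache per schema version (step 2).
struct toy_schema_version_cache {
    int schema_version;
    std::map<int, std::string> partitions;  // partition key -> data
};

class toy_row_cache {
    std::deque<toy_schema_version_cache> _caches;  // front is the newest schema

    // Step 6: drop older caches once they become empty.
    void drop_empty_old_caches() {
        for (auto it = _caches.begin() + 1; it != _caches.end();) {
            it = it->partitions.empty() ? _caches.erase(it) : it + 1;
        }
    }

public:
    void add_schema_version(int version) {
        _caches.push_front(toy_schema_version_cache{version, {}});
    }

    void populate(int key, std::string data) {
        assert(!_caches.empty());
        _caches.front().partitions[key] = std::move(data);
    }

    std::size_t cache_count() const { return _caches.size(); }

    // Steps 3 and 4: look in the newest cache first, then fall back to older
    // ones. A hit in an older cache is removed there and re-inserted into the
    // newest cache (a real implementation would upgrade the data here).
    std::optional<std::string> read(int key) {
        for (std::size_t i = 0; i < _caches.size(); ++i) {
            auto& cache = _caches[i];
            auto it = cache.partitions.find(key);
            if (it == cache.partitions.end()) {
                continue;
            }
            std::string data = it->second;
            if (i != 0) {
                cache.partitions.erase(it);
                _caches.front().partitions[key] = data;
                drop_empty_old_caches();
            }
            return data;
        }
        return std::nullopt;  // would fall through to the underlying data source
    }
};
```

Each read moves at most one partition, so the per-read cost stays bounded by the partition being read rather than by the whole cache.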

slivne commented 2 years ago

A different option: when we need to upgrade a large partition, let's drop it. It will be "cheaper" than upgrading all of it. Maybe this can be an intermediate solution until we have fixed the update to work in portions.

slivne commented 2 years ago

This was suggested above, yet maybe we should do it on an as-needed basis (when we attempt to access the partition) rather than deciding to drop all the large partitions on a schema change.

avikivity commented 2 years ago

I thought a little about it, and I like reusing partition versions for it. It offloads some of the complexity to an existing mechanism. Still, it is quite complicated.

michoecho commented 2 years ago

@tgrabiec I'm working on this right now and I'm taking the partition_version reuse route. This may incur some conflicts with your mutation_partition_v2. Do you have any estimate about when it's coming to master?

tgrabiec commented 2 years ago

I think we should wait with this until mutation_partition_v2 is done. It's "almost done", should be soon.

michoecho commented 2 years ago

Progress update: I have a working[^1] patch at https://github.com/michoecho/scylla/commits/gentle_schema_upgrade, but it's not prepared for review yet. It also has multiple local improvement opportunities (e.g. naming, using a move instead of a copy, etc.), and it needs more tests, but it's generally shaped up.

I'll probably wait with finalizing it until mutation_partition_v2 is done, as @tgrabiec said, because some conflicts are bound to occur, e.g. in the version merging algorithm.

[^1]: Doing the desired thing and passing existing tests; I don't claim it to be correct yet.

mykaul commented 1 year ago

Somewhat optimistically setting it to 5.3 (assuming mutation_partition_v2 will get into either 5.2 or 5.3).

avikivity commented 1 year ago

Party!!!!!!!eleven

michoecho commented 1 year ago

I just wanted to point out that the solution presented by https://github.com/scylladb/scylladb/pull/13761 relies on the assumption that reads with on-the-fly upgrades aren't that much more expensive than regular reads.

Until the incremental upgrade is done, reads will carry the additional cost of on-the-fly upgrades. If it turns out that this cost is too high, the cluster can become overloaded after the schema change, and the cure will become worse than the disease.

AFAIK the implementation of upgrades isn't very efficient. I strongly think we should do some performance testing of schema changes under load to rule out the possibility that after the change upgrades are indeed low-latency, but have an unacceptable throughput cost in exchange.

We should do that before we branch the next release.

juliayakovlev commented 9 months ago

Reproduced with 2023.1.5

2024-02-14 07:48:43.459 <2024-02-14 07:48:14.446>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=dcd8f0b9-8709-4141-9fa6-4a097b295e94: type=REACTOR_STALLED regex=Reactor stalled line_number=53017 node=longevity-large-partitions-200k-pks-db-node-682943fe-0-4
2024-02-14T07:48:14.446+00:00 longevity-large-partitions-200k-pks-db-node-682943fe-0-4     !INFO | scylla[9927]: Reactor stalled for 878 ms on shard 4. Backtrace: 0x54674f3 0x5466950 0x5467cc0 0x3cb5f 0x7fd00c0ecf9a 0x1bdb768 0x1e65e48 0x1e32288 0x1e46a4a 0x1e46bc6 0x1e46bc6 0x1e46bc6 0x1e46bc6 0x1de6e82 0x1ca30ae 0x1d30ec5 0x1d28801 0x1d30a86 0x5477dd4 0x5479057 0x5499ec1 0x544a33a 0x8b19c 0x10cc5f
?? ??:0
?? ??:0

void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
?? ??:0
managed_bytes at ././utils/managed_bytes.hh:223
 (inlined by) atomic_cell_or_collection::copy(abstract_type const&) const at ./atomic_cell.cc:128
operator() at ./mutation_partition.cc:1584
 (inlined by) std::__exception_ptr::exception_ptr compact_radix_tree::tree<cell_and_hash, unsigned int>::copy_slots<compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const::{lambda(unsigned int)#1}, row::row(schema const&, column_kind, row const&)::$_14&, compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, 
(compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > >(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, cell_and_hash const*, unsigned int, unsigned int, compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, 
(compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const::{lambda(unsigned int)#1}&&, row::row(schema const&, column_kind, row const&)::$_14&) at ././utils/compact-radix-tree.hh:1397
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:1284
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, 
(compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >(compact_radix_tree::variadic_union<compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, 
unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:820
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:828
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head::clone<row::row(schema const&, column_kind, row const&)::$_14&>(row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:486
void compact_radix_tree::tree<cell_and_hash, unsigned int>::clone_from<row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int> const&, row::row(schema const&, column_kind, row const&)::$_14&) at ././utils/compact-radix-tree.hh:1853
 (inlined by) row at ./mutation_partition.cc:1587
 (inlined by) deletable_row at ././mutation_partition.hh:822
rows_entry at ././mutation_partition.hh:946
 (inlined by) rows_entry* allocation_strategy::construct<rows_entry, schema const&, rows_entry const&>(schema const&, rows_entry const&) at ././utils/allocation_strategy.hh:155
 (inlined by) operator() at ./mutation_partition.cc:152
 (inlined by) intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2084
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
void intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone_from<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}>(intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0> const&, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&&) at ././utils/intrusive_btree.hh:638
 (inlined by) mutation_partition at ./mutation_partition.cc:154
partition_entry::squashed(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>) at ./partition_version.cc:501
 (inlined by) partition_entry::upgrade(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>, mutation_cleaner&, cache_tracker*) at ./partition_version.cc:517
operator() at ./row_cache.cc:1324
 (inlined by) decltype(auto) with_allocator<row_cache::upgrade_entry(cache_entry&)::$_28>(allocation_strategy&, row_cache::upgrade_entry(cache_entry&)::$_28&&) at ././utils/allocation_strategy.hh:313
 (inlined by) row_cache::upgrade_entry(cache_entry&) at ./row_cache.cc:1323
 (inlined by) scanning_and_populating_reader::read_from_entry(cache_entry&) at ./row_cache.cc:569
 (inlined by) operator() at ./row_cache.cc:597
decltype(auto) logalloc::allocating_section::with_reclaiming_disabled<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&) at ././utils/logalloc.hh:499
 (inlined by) operator() at ././utils/logalloc.hh:521
 (inlined by) decltype(auto) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}>(logalloc::region, logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:470
 (inlined by) decltype(auto) logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&) at ././utils/logalloc.hh:520
 (inlined by) scanning_and_populating_reader::do_read_from_primary() at ./row_cache.cc:580
 (inlined by) scanning_and_populating_reader::read_from_primary() at ./row_cache.cc:626
operator() at ./row_cache.cc:641
 (inlined by) seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > std::__invoke_impl<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::__invoke_other, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::__invoke_result<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >::type std::__invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::__invoke_result&&, (scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) std::invoke_result<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >::type std::invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::invoke_result&&, (scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/functional:110
 (inlined by) auto seastar::internal::future_invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2>&&) at ././seastar/include/seastar/core/future.hh:1223
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1594
 (inlined by) void seastar::futurize<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >::satisfy_with_result_of<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}::operator()(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >) const::{lambda()#1}>(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}::operator()(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, 
seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >) const::{lambda()#1}) at ././seastar/include/seastar/core/future.hh:2132
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1587
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future>({lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}, seastar::optimized_optional<flat_mutation_reader_v2> >::run_and_dispose() at ././seastar/include/seastar/core/future.hh:781
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4331
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

Packages

Scylla version: 2023.1.5-20240213.08fd6aec7a43 with build-id 448979e99e198eeab4a3b0e1b929397d337d2724

Kernel Version: 5.15.0-1051-gcp

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/1433372650157216341 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test

Test id: 682943fe-290b-40b7-b835-3003ed6c3c85

Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 682943fe-290b-40b7-b835-3003ed6c3c85`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=682943fe-290b-40b7-b835-3003ed6c3c85)
- Show all stored logs command: `$ hydra investigate show-logs 682943fe-290b-40b7-b835-3003ed6c3c85`

## Logs:

- **db-cluster-682943fe.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/db-cluster-682943fe.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/db-cluster-682943fe.tar.gz)
- **sct-runner-events-682943fe.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-runner-events-682943fe.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-runner-events-682943fe.tar.gz)
- **sct-682943fe.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-682943fe.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-682943fe.log.tar.gz)
- **loader-set-682943fe.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/loader-set-682943fe.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/loader-set-682943fe.tar.gz)
- **monitor-set-682943fe.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/monitor-set-682943fe.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/monitor-set-682943fe.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-large-partition-200k-pks-4days-gce-test/10/)

[Argus](https://argus.scylladb.com/test/66ff9e4a-0655-4bba-89f4-e4eb2d78691d/runs?additionalRuns[]=682943fe-290b-40b7-b835-3003ed6c3c85)

juliayakovlev commented 8 months ago

Reproduced with 2023.1.6

Packages

Scylla version: 2023.1.6-20240306.ee8c8089d9c4 with build-id ba16490fa8be728988abec09fdc65e8f55710317

Kernel Version: 5.15.0-1053-gcp

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/1393419829280999616 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test

Test id: 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2

Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2)
- Show all stored logs command: `$ hydra investigate show-logs 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2`

## Logs:

- **db-cluster-9e6b76ad.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/db-cluster-9e6b76ad.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/db-cluster-9e6b76ad.tar.gz)
- **sct-runner-events-9e6b76ad.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-runner-events-9e6b76ad.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-runner-events-9e6b76ad.tar.gz)
- **sct-9e6b76ad.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-9e6b76ad.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-9e6b76ad.log.tar.gz)
- **loader-set-9e6b76ad.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/loader-set-9e6b76ad.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/loader-set-9e6b76ad.tar.gz)
- **monitor-set-9e6b76ad.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/monitor-set-9e6b76ad.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/monitor-set-9e6b76ad.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-large-partition-200k-pks-4days-gce-test/11/)

[Argus](https://argus.scylladb.com/test/66ff9e4a-0655-4bba-89f4-e4eb2d78691d/runs?additionalRuns[]=9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2)

soyacz commented 6 months ago

reproduced in 2023.1.8

Packages

Scylla version: 2023.1.8-20240502.c7683a2891c6 with build-id d7cbb560ad3a581b6eccbe170de0ca61fb618a19

Kernel Version: 5.15.0-1058-gcp

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/254008101309101179 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test

Test id: 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea

Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=1e8e61cb-f19b-4ead-a0f8-0651cfa25bea)
- Show all stored logs command: `$ hydra investigate show-logs 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea`

## Logs:

- **db-cluster-1e8e61cb.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/db-cluster-1e8e61cb.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/db-cluster-1e8e61cb.tar.gz)
- **sct-runner-events-1e8e61cb.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-runner-events-1e8e61cb.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-runner-events-1e8e61cb.tar.gz)
- **sct-1e8e61cb.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-1e8e61cb.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-1e8e61cb.log.tar.gz)
- **loader-set-1e8e61cb.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/loader-set-1e8e61cb.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/loader-set-1e8e61cb.tar.gz)
- **monitor-set-1e8e61cb.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/monitor-set-1e8e61cb.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/monitor-set-1e8e61cb.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-large-partition-200k-pks-4days-gce-test/13/)

[Argus](https://argus.scylladb.com/test/66ff9e4a-0655-4bba-89f4-e4eb2d78691d/runs?additionalRuns[]=1e8e61cb-f19b-4ead-a0f8-0651cfa25bea)

juliayakovlev commented 5 months ago

reproduced with 2023.1.9

Jun 11 14:31:35.318237 longevity-large-partitions-200k-pks-db-node-6666cbdc-0-2 scylla[9840]: Reactor stalled for 813 ms on shard 5. Backtrace: 0x54d82a3 0x54d7700 0x54d8a70 0x3cb5f 0x1e434f4 0x1e57d2a 0x1e57ea6 0x1e57ea6 0x1e57ea6 0x1e57ea6 0x1df90f2 0x1cb339e 0x1d411b5 0x1d38af1 0x1d3848b 0x1d37fc7 0x1d35c2b 0x1f273f2 0x1f282b8 0x1f34d8a 0x1f344c1 0x1f332f6 0x1f5928d 0x1f7292f 0x1f73921 0x54e8b84 0x54e9e07 0x550aef1 0x54bb04a 0x8b19c 0x10cc5f

void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
deletable_row at ././mutation_partition.hh:820
rows_entry at ././mutation_partition.hh:946
 (inlined by) rows_entry* allocation_strategy::construct<rows_entry, schema const&, rows_entry const&>(schema const&, rows_entry const&) at ././utils/allocation_strategy.hh:155
 (inlined by) operator() at ./mutation_partition.cc:152
 (inlined by) intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2084
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095

[Backtrace #1]
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
void intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone_from<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}>(intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0> const&, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&&) at ././utils/intrusive_btree.hh:638
 (inlined by) mutation_partition at ./mutation_partition.cc:154
partition_entry::squashed(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>) at ./partition_version.cc:501
 (inlined by) partition_entry::upgrade(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>, mutation_cleaner&, cache_tracker*) at ./partition_version.cc:517
operator() at ./row_cache.cc:1324
 (inlined by) decltype(auto) with_allocator<row_cache::upgrade_entry(cache_entry&)::$_28>(allocation_strategy&, row_cache::upgrade_entry(cache_entry&)::$_28&&) at ././utils/allocation_strategy.hh:313
 (inlined by) row_cache::upgrade_entry(cache_entry&) at ./row_cache.cc:1323
 (inlined by) scanning_and_populating_reader::read_from_entry(cache_entry&) at ./row_cache.cc:569
 (inlined by) operator() at ./row_cache.cc:597
decltype(auto) logalloc::allocating_section::with_reclaiming_disabled<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&) at ././utils/logalloc.hh:499
 (inlined by) operator() at ././utils/logalloc.hh:521
 (inlined by) decltype(auto) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}>(logalloc::region, logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:470
 (inlined by) decltype(auto) logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&) at ././utils/logalloc.hh:520
 (inlined by) scanning_and_populating_reader::do_read_from_primary() at ./row_cache.cc:580
 (inlined by) scanning_and_populating_reader::read_from_primary() at ./row_cache.cc:626
operator() at ./row_cache.cc:649
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<scanning_and_populating_reader::read_next_partition()::{lambda()#1}>(scanning_and_populating_reader::read_next_partition()::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<scanning_and_populating_reader::read_next_partition()::{lambda()#1}>(scanning_and_populating_reader::read_next_partition()::{lambda()#1}&&, seastar::internal::monostate) at ././seastar/include/seastar/core/future.hh:1991
 (inlined by) seastar::future<void> seastar::future<void>::then_impl<scanning_and_populating_reader::read_next_partition()::{lambda()#1}, seastar::future<void> >(scanning_and_populating_reader::read_next_partition()::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1613
seastar::internal::future_result<scanning_and_populating_reader::read_next_partition()::{lambda()#1}, void>::future_type seastar::internal::call_then_impl<seastar::future<void> >::run<scanning_and_populating_reader::read_next_partition()::{lambda()#1}>(seastar::future<void>&, seastar::internal::future_result&&) at ././seastar/include/seastar/core/future.hh:1246
 (inlined by) seastar::future<void> seastar::future<void>::then<scanning_and_populating_reader::read_next_partition()::{lambda()#1}, seastar::future<void> >(scanning_and_populating_reader::read_next_partition()::{lambda()#1}&&) at ././seastar/include/seastar/core/future.hh:1532
 (inlined by) scanning_and_populating_reader::read_next_partition() at ./row_cache.cc:647
 (inlined by) operator() at ./row_cache.cc:673
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<scanning_and_populating_reader::fill_buffer()::{lambda()#2}&>(scanning_and_populating_reader::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<scanning_and_populating_reader::fill_buffer()::{lambda()#2}&>(scanning_and_populating_reader::fill_buffer()::{lambda()#2}&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::future<void> seastar::do_until<scanning_and_populating_reader::fill_buffer()::{lambda()#2}, scanning_and_populating_reader::fill_buffer()::{lambda()#1}>(scanning_and_populating_reader::fill_buffer()::{lambda()#1}, scanning_and_populating_reader::fill_buffer()::{lambda()#2}) at ././seastar/include/seastar/core/loop.hh:343
scanning_and_populating_reader::fill_buffer() at ./row_cache.cc:671
flat_mutation_reader_v2::impl::operator()() at ././readers/flat_mutation_reader_v2.hh:194
 (inlined by) flat_mutation_reader_v2::operator()() at ././readers/flat_mutation_reader_v2.hh:414
 (inlined by) mutation_reader_merger::prepare_one(mutation_reader_merger::reader_and_last_fragment_kind, seastar::bool_class<mutation_reader_merger::reader_galloping_tag>) at ./readers/combined.cc:459
 (inlined by) operator() at ./readers/combined.cc:450
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<mutation_reader_merger::prepare_next()::$_2, mutation_reader_merger::reader_and_last_fragment_kind&>(mutation_reader_merger::prepare_next()::$_2&&, mutation_reader_merger::reader_and_last_fragment_kind&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<mutation_reader_merger::prepare_next()::$_2, mutation_reader_merger::reader_and_last_fragment_kind&>(mutation_reader_merger::prepare_next()::$_2&&, mutation_reader_merger::reader_and_last_fragment_kind&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::future<void> seastar::parallel_for_each<mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::prepare_next()::$_2>(mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::reader_and_last_fragment_kind*, mutation_reader_merger::prepare_next()::$_2&&) at ././seastar/include/seastar/core/loop.hh:569
 (inlined by) seastar::future<void> seastar::internal::parallel_for_each_impl<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2>(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&) at ././seastar/include/seastar/core/loop.hh:622
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2>(seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2>(seastar::future<void> (*&)(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&), utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) seastar::future<void> seastar::parallel_for_each<utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2>(utils::small_vector<mutation_reader_merger::reader_and_last_fragment_kind, 4ul>&, mutation_reader_merger::prepare_next()::$_2&&) at ././seastar/include/seastar/core/loop.hh:637
 (inlined by) mutation_reader_merger::prepare_next() at ./readers/combined.cc:449
mutation_reader_merger::maybe_produce_batch() at ./readers/combined.cc:586
operator() at ./readers/combined.cc:549
 (inlined by) seastar::future<std::optional<boost::iterator_range<mutation_fragment_and_stream_id*> > > seastar::futurize<seastar::future<std::optional<boost::iterator_range<mutation_fragment_and_stream_id*> > > >::invoke<mutation_reader_merger::operator()()::$_4&>(mutation_reader_merger::operator()()::$_4&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) seastar::repeat_until_value_type_helper<seastar::futurize<std::invoke_result<mutation_reader_merger::operator()()::$_4>::type>::type>::future_type seastar::repeat_until_value<mutation_reader_merger::operator()()::$_4>(mutation_reader_merger::operator()()::$_4) at ././seastar/include/seastar/core/loop.hh:238
 (inlined by) mutation_reader_merger::operator()() at ./readers/combined.cc:549
 (inlined by) operator() at ./readers/combined.cc:172
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&>(mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) seastar::future<void> seastar::repeat<mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}>(mutation_fragment_merger<mutation_reader_merger>::operator()()::{lambda()#1}&&) at ././seastar/include/seastar/core/loop.hh:120
mutation_fragment_merger<mutation_reader_merger>::operator()() at ./readers/combined.cc:171
 (inlined by) operator() at ./readers/combined.cc:700
 (inlined by) seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::invoke<merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&>(merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) seastar::future<void> seastar::repeat<merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}>(merging_reader<mutation_reader_merger>::fill_buffer()::{lambda()#1}&&) at ././seastar/include/seastar/core/loop.hh:120
merging_reader<mutation_reader_merger>::fill_buffer() at ./readers/combined.cc:699
flat_mutation_reader_v2::fill_buffer() at ./readers/flat_mutation_reader_v2.hh:509
 (inlined by) evictable_reader_v2::fill_buffer() at ./readers/multishard.cc:595
operator() at ./readers/multishard.cc:855
seastar::future<(anonymous namespace)::remote_fill_buffer_result_v2> std::__invoke_impl<seastar::future<(anonymous namespace)::remote_fill_buffer_result_v2>, (anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&>(std::__invoke_other, (anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::__invoke_result<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&>::type std::__invoke<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&>(std::__invoke_result&&, ((anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) std::invoke_result<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&>::type std::invoke<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&>(std::invoke_result&&, ((anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/functional:110
 (inlined by) decltype(auto) seastar::coroutine::lambda<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}>::operator()<>() const at ././seastar/include/seastar/core/coroutine.hh:259
 (inlined by) seastar::future<(anonymous namespace)::remote_fill_buffer_result_v2> seastar::futurize<seastar::future<(anonymous namespace)::remote_fill_buffer_result_v2> >::invoke<seastar::coroutine::lambda<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}>&>(seastar::coroutine::lambda<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}>&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) seastar::smp_message_queue::async_work_item<seastar::coroutine::lambda<(anonymous namespace)::shard_reader_v2::do_fill_buffer()::$_8::operator()() const::{lambda()#2}> >::run_and_dispose() at ././seastar/include/seastar/core/smp.hh:243
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4336
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290

[Backtrace #2]
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

Packages

Scylla version: 2023.1.9-20240609.4a93c32572b9 with build-id 4aea949e59e20db48d10bc09ed9af75aa98dfd9a

Kernel Version: 5.15.0-1062-gcp

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/303472689904370913 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test

Test id: 6666cbdc-1694-45e7-b158-32b19e8cfb17

Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 6666cbdc-1694-45e7-b158-32b19e8cfb17`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=6666cbdc-1694-45e7-b158-32b19e8cfb17)
- Show all stored logs command: `$ hydra investigate show-logs 6666cbdc-1694-45e7-b158-32b19e8cfb17`

## Logs:

- **db-cluster-6666cbdc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/db-cluster-6666cbdc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/db-cluster-6666cbdc.tar.gz)
- **sct-runner-events-6666cbdc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/sct-runner-events-6666cbdc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/sct-runner-events-6666cbdc.tar.gz)
- **sct-6666cbdc.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/sct-6666cbdc.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/sct-6666cbdc.log.tar.gz)
- **loader-set-6666cbdc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/loader-set-6666cbdc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/loader-set-6666cbdc.tar.gz)
- **monitor-set-6666cbdc.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/monitor-set-6666cbdc.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/6666cbdc-1694-45e7-b158-32b19e8cfb17/20240612_000914/monitor-set-6666cbdc.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-large-partition-200k-pks-4days-gce-test/15/)

[Argus](https://argus.scylladb.com/test/66ff9e4a-0655-4bba-89f4-e4eb2d78691d/runs?additionalRuns[]=6666cbdc-1694-45e7-b158-32b19e8cfb17)

michoecho commented 5 months ago

@juliayakovlev It's going to keep reproducing in 2023.1 forever. Is there a point in adding new reports here? Is this for some internal QA bookkeeping?

If nothing else, it makes the thread harder to read in the future.

juliayakovlev commented 5 months ago

> @juliayakovlev It's going to keep reproducing in 2023.1 forever. Is there a point in adding new reports here? Is this for some internal QA bookkeeping?
>
> If nothing else, it makes the thread harder to read in the future.

@michoecho Actually, we (QA) report the issues we encounter in the release. The issue is closed, but we continue to hit it. What does it mean: will it never be merged to 2023.1?

michoecho commented 5 months ago

> What does it mean: will it never be merged to 2023.1?

Yes. This is a performance improvement (not a bugfix) with a very invasive implementation. It was merged in 2024.1 and it won't be backported.