Open · kbr- opened 2 years ago
I do not understand your scenario. As far as I see
reply = co_await _snapshot_application_done[from].get_future();
can only throw during abort and only after io_fiber and applier_fiber are stopped, so I do not see how any of what you are saying can happen.
You're right, I also don't see any other way it could throw at the moment. I got it by experimenting a bit with the order of abort
calls in order to fix a deadlock during abort in my test.
So it cannot happen right now, but it feels to me that's only because we got lucky.
If that's the intended design, then it would be great to have a comment in server_impl::apply_snapshot saying that this future
can only throw when aborting, and that we assume io_fiber and applier_fiber to already be stopped by that time.
Yes, we can add such a comment.
@kbr- Please take time to add the comment.
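For concreteness, one possible shape for such a comment, sketched around the line quoted above (the wording and the surrounding context in raft/server.cc are assumptions, not the actual source):

```cpp
// Sketch of the suggested comment; placement and wording are assumed.
//
// Note: this future can only fail while the server is aborting. abort() stops
// io_fiber and applier_fiber before breaking these promises, so by the time an
// exception propagates from here, neither fiber can still be working on this
// snapshot.
reply = co_await _snapshot_application_done[from].get_future();
```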
I hit the following assertion failure in test_tablets in a branch off of 5afc242814:
=================================== FAILURES ===================================
_____ test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode _____
manager = <test.pylib.manager_client.ManagerClient object at 0x7fc198d8d690>
    @pytest.mark.asyncio
    async def test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode(manager: ManagerClient):
        """Test that you can create a table and insert and query data"""
        logger.info("Bootstrapping cluster")
        servers = [await manager.server_add(), await manager.server_add(), await manager.server_add()]
        s0 = servers[0].server_id
        not_s0 = servers[1:]
        # s0 should miss schema and tablet changes
        await manager.server_stop_gracefully(s0)
        cql = manager.get_cql()
        await cql.run_async("CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', "
                            "'replication_factor': 3, 'initial_tablets': 100};")
        # force s0 to catch up later from the snapshot and not the raft log
        await inject_error_one_shot_on(manager, 'raft_server_force_snapshot', not_s0)
        await cql.run_async("CREATE TABLE test.test (pk int PRIMARY KEY, c int);")
        keys = range(10)
        await asyncio.gather(*[cql.run_async(f"INSERT INTO test.test (pk, c) VALUES ({k}, 1);") for k in keys])
        rows = await cql.run_async("SELECT * FROM test.test;")
        assert len(list(rows)) == len(keys)
        for r in rows:
            assert r.c == 1
        manager.driver_close()
>       await manager.server_start(s0, wait_others=2)
test/topology_experimental_raft/test_tablets.py:61:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:155: in server_start
await self.client.get_text(f"/cluster/server/{server_id}/start", params=params)
test/pylib/rest_client.py:82: in get_text
ret = await self._fetch("GET", resource_uri, response_type = "text", host = host,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <test.pylib.rest_client.UnixRESTClient object at 0x7fc198d8c650>
method = 'GET', resource = '/cluster/server/91/start', response_type = 'text'
host = None, port = None, params = None, json = None, timeout = None
    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
            f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
            "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")
        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/server/91/start, params: None, json: None, body:
E failed to start the node, server_id 91, IP 127.209.26.36, workdir scylla-91, host_id a15c78d8-ad4e-40d1-ba6e-3e8d948a5496, cql [not connected]
E Check the log files:
E /home/bhalevy/dev/scylla/testlog/test.py.dev.log
E /home/bhalevy/dev/scylla/testlog/dev/scylla-91.log
scylla-91.log:
INFO 2023-08-29 11:05:11,065 [shard 0] raft_group0 - setup_group0: group 0 ID present. Starting existing Raft server.
INFO 2023-08-29 11:05:11,065 [shard 0] raft_group0 - Server a15c78d8-ad4e-40d1-ba6e-3e8d948a5496 is starting group 0 with id b4378f70-4642-11ee-b6ec-23c72ae977e4
INFO 2023-08-29 11:05:11,077 [shard 0] compaction - [Compact system_schema.columns c6b4fd40-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-3g8x_0mgn_0cuyp222k9djc9mixi-big-Data.db:level=0]. 23kB to 18kB (~79% of original) in 15ms = 1MB/s. ~256 total partitions merged to 6.
INFO 2023-08-29 11:05:11,077 [shard 0] compaction - [Compact system.peers c6b79550-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg7_4vjg02s54wnom7f278-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg5_39vtt2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,082 [shard 0] compaction - [Compact system.peers c6b79550-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0gi4x222k9djc9mixi-big-Data.db:level=0]. 13kB to 7kB (~55% of original) in 3ms = 4MB/s. ~256 total partitions merged to 1.
INFO 2023-08-29 11:05:11,082 [shard 0] compaction - [Compact system.scylla_local c6b858a0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mgn_0gaf4222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mg5_36gdd2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,083 [shard 1] compaction - [Compact system.peers c6b87fb0-4642-11ee-a813-bdde3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0h5a82k0k1n43po4ly-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg5_390yp24g1qum6urxes-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,097 [shard 1] compaction - [Compact system.peers c6b87fb0-4642-11ee-a813-bdde3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0hsfl2k0k1n43po4ly-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 13ms = 1MB/s. ~256 total partitions merged to 1.
INFO 2023-08-29 11:05:11,098 [shard 0] compaction - [Compact system.scylla_local c6b858a0-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mgn_0hkpt222k9djc9mixi-big-Data.db:level=0]. 98kB to 92kB (~93% of original) in 14ms = 7MB/s. ~256 total partitions merged to 3.
INFO 2023-08-29 11:05:11,098 [shard 0] compaction - [Compact system.truncated c6bac9a0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgm_5y4c1222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgm_5y4c0222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/mc-6-big-Data.db:level=0:origin=compaction]
WARN 2023-08-29 11:05:11,098 [shard 0] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.209.26.36, application_states = {{STATUS, Value(NORMAL,975945372173672557,1)}})
INFO 2023-08-29 11:05:11,101 [shard 1] compaction - [Compact system.peers c6bb3ed0-4642-11ee-a813-bdde3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0l0682k0k1n43po4ly-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0hsfl2k0k1n43po4ly-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,102 [shard 0] cdc - Started using generation (2023/08/29 08:04:45, 36e44f8f-951e-40b7-8d34-39a095f3e133).
INFO 2023-08-29 11:05:11,102 [shard 0] raft_group_registry - marking Raft server e6e8f1ca-1a11-4733-8d70-e89246b1a011 as alive for raft groups
INFO 2023-08-29 11:05:11,102 [shard 0] raft_group_registry - marking Raft server 6b6327d1-0329-41f2-9fe9-c3d86ed38dc9 as alive for raft groups
INFO 2023-08-29 11:05:11,103 [shard 0] raft_group0 - Disabling migration_manager schema pulls because Raft is fully functioning in this cluster.
INFO 2023-08-29 11:05:11,103 [shard 0] compaction - [Compact system.truncated c6bac9a0-4642-11ee-874c-bddd3766d256] Compacted 3 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgn_0l06a222k9djc9mixi-big-Data.db:level=0]. 16kB to 5kB (~35% of original) in 3ms = 5MB/s. ~384 total partitions merged to 1.
INFO 2023-08-29 11:05:11,103 [shard 0] compaction - [Compact system.peers c6bb8cf0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0l069222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0gi4x222k9djc9mixi-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,103 [shard 0] messaging_service - Starting Messaging Service on port 7000
INFO 2023-08-29 11:05:11,103 [shard 0] storage_service - entering STARTING mode
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - initial_contact_nodes={127.209.26.36}, loaded_endpoints={}, loaded_peer_features=2
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - peer=127.209.26.23, supported_features=AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLETS,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - peer=127.209.26.5, supported_features=AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLETS,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - Checking remote features with gossip, initial_contact_nodes={127.209.26.36}
INFO 2023-08-29 11:05:11,104 [shard 0] gossip - Gossip shadow round started with nodes={}
WARN 2023-08-29 11:05:11,104 [shard 0] gossip - All nodes={} are down for get_endpoint_states verb. Skip ShadowRound.
INFO 2023-08-29 11:05:11,104 [shard 0] gossip - Gossip shadow round finished with nodes_talked={}
INFO 2023-08-29 11:05:11,104 [shard 0] gossip - Node 127.209.26.23 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO 2023-08-29 11:05:11,104 [shard 0] gossip - Node 127.209.26.5 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO 2023-08-29 11:05:11,104 [shard 0] gossip - Feature check passed. Local node 127.209.26.36 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - Save advertised features list in the 'system.local' table
INFO 2023-08-29 11:05:11,104 [shard 0] storage_service - Restarting a node in NORMAL status
WARN 2023-08-29 11:05:11,104 [shard 0] cdc - Restarting node in NORMAL status with CDC enabled, but no streams timestamp was proposed by this node according to its local tables. Are we upgrading from a non-CDC supported version?
INFO 2023-08-29 11:05:11,105 [shard 1] compaction - [Compact system.peers c6bb3ed0-4642-11ee-a813-bdde3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0lnbl2k0k1n43po4ly-big-Data.db:level=0]. 13kB to 7kB (~55% of original) in 2ms = 6MB/s. ~256 total partitions merged to 1.
INFO 2023-08-29 11:05:11,109 [shard 0] schema_tables - Schema version changed to 4dd351cd-8420-37e3-87f8-4a5924e6c341
INFO 2023-08-29 11:05:11,109 [shard 0] storage_service - Starting up server gossip
INFO 2023-08-29 11:05:11,418 [shard 0] compaction - [Compact system.peers c6bb8cf0-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0m2r5222k9djc9mixi-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 28ms = 476kB/s. ~256 total partitions merged to 1.
INFO 2023-08-29 11:05:11,420 [shard 0] schema_tables - Creating keyspace test
INFO 2023-08-29 11:05:11,420 [shard 0] compaction - [Compact system.local c6ebebc0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mgn_0hsfk222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mgn_0nd1s222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mg5_390yp2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO 2023-08-29 11:05:11,420 [shard 0] gossip - failure_detector_loop: Started main loop
INFO 2023-08-29 11:05:11,420 [shard 0] storage_service - entering JOINING mode
INFO 2023-08-29 11:05:11,420 [shard 0] storage_service - topology changes are using raft
INFO 2023-08-29 11:05:11,420 [shard 0] init - starting system distributed keyspace
INFO 2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - system_distributed keyspace is already present. Not creating
INFO 2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - system_distributed_everywhere keyspace is already present. Not creating
INFO 2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - All tables are present on start
INFO 2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - All schemas are uptodate on start
INFO 2023-08-29 11:05:11,524 [shard 0] schema_tables - Creating test.test id=c6448830-4642-11ee-8bc6-4ef5d54b5893 version=40cac434-3207-3def-b67b-319a07c611e3
INFO 2023-08-29 11:05:11,524 [shard 0] schema_tables - Tablet metadata changed
INFO 2023-08-29 11:05:11,660 [shard 0] gossip - InetAddress 127.209.26.23 is now UP, status = NORMAL
INFO 2023-08-29 11:05:11,660 [shard 0] gossip - InetAddress 127.209.26.5 is now UP, status = NORMAL
INFO 2023-08-29 11:05:12,343 [shard 0] schema_tables - Schema version changed to eb504352-b88e-37ba-bd21-2f9c2abe11aa
INFO 2023-08-29 11:05:12,463 [shard 0] schema_tables - Tablet metadata changed
INFO 2023-08-29 11:05:12,469 [shard 0] schema_tables - Schema version changed to eb504352-b88e-37ba-bd21-2f9c2abe11aa
ERROR 2023-08-29 11:05:13,273 [shard 0] raft - apply_snapshot[a15c78d8-ad4e-40d1-ba6e-3e8d948a5496]: ignore outdated snapshot 19995803-df74-435f-b36e-346c8a527033/45 current one is 19995803-df74-435f-b36e-346c8a527033/45, commit_idx=45
scylla: raft/server.cc:1154: virtual future<snapshot_reply> raft::server_impl::apply_snapshot(server_id, install_snapshot): Assertion `! _snapshot_application_done.contains(from)' failed.
Aborting on shard 0.
Backtrace:
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3b57fc
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3e0852
/lib64/libc.so.6+0x3db6f
/lib64/libc.so.6+0x8e843
/lib64/libc.so.6+0x3dabd
/lib64/libc.so.6+0x2687e
/lib64/libc.so.6+0x2679a
/lib64/libc.so.6+0x36146
0x2773f04
0x26ca3b5
0x26cd93a
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c402f
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c5117
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c45e9
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x33f108
/home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x33e342
0xfbe229
0xff5020
0xfbd31c
/lib64/libc.so.6+0x27b49
/lib64/libc.so.6+0x27c0a
0xfbbe24
------ Ending test dev/topology_experimental_raft.test_tablets.1::test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode ------
Decoded:
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/dev/seastar/./seastar/include/seastar/util/backtrace.hh:64
(inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/dev/seastar/./seastar/src/core/reactor.cc:818
(inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/dev/seastar/./seastar/src/core/reactor.cc:848
seastar::print_with_backtrace(char const*, bool) at ./build/dev/seastar/./seastar/src/core/reactor.cc:860
(inlined by) seastar::sigabrt_action() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3976
(inlined by) operator() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3952
(inlined by) __invoke at ./build/dev/seastar/./seastar/src/core/reactor.cc:3948
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=245240a31888ad5c11bbc55b18e02d87388f59a9, for GNU/Linux 3.2.0, not stripped
__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
__assert_fail_base.cold at ??:?
__assert_fail at ??:?
raft::server_impl::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot) at ./raft/server.cc:1154
operator() at ./service/raft/raft_rpc.cc:188
(inlined by) seastar::future<raft::snapshot_reply> seastar::futurize<seastar::future<raft::snapshot_reply> >::invoke<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/future.hh:2006
(inlined by) auto seastar::futurize_invoke<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/future.hh:2037
(inlined by) auto seastar::internal::invoke_func_with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate::holder&&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/gate.hh:309
(inlined by) auto seastar::with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/gate.hh:327
(inlined by) decltype ({parm#2}()) service::raft_with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ./service/raft/raft_rpc.cc:168
(inlined by) service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot) at ./service/raft/raft_rpc.cc:187
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<raft::snapshot_reply>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
(inlined by) seastar::internal::coroutine_traits_base<raft::snapshot_reply>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:83
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/dev/seastar/./seastar/src/core/reactor.cc:2597
(inlined by) seastar::reactor::run_some_tasks() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:2006
__libc_start_call_main at ??:?
__libc_start_main_alias_2 at :?
_start at ??:?
@bhalevy could you please open a new issue? I'm not sure this is related to the OP.
@bhalevy can I get all of the logs?
@gleb-cloudius testlog.tar.gz
Ok, the logs confirm my theory of what may have happened here. Node A sends a snapshot to node B, and while node B is processing the snapshot the connection is dropped and A gets an error on its snapshot transfer. After a while it retries, but the previous snapshot is still being processed, and the invariant that the assert checks is violated. @kbr-scylla
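For reference, the invariant that fires in this scenario is the one from the assertion message in the log above; the sketch below only illustrates how a retried transfer from the same sender would violate it (the surrounding body is assumed, not copied from raft/server.cc):

```cpp
// Illustration only: the assert is the one from raft/server.cc:1154 quoted above;
// the rest of the body is an assumption.
future<snapshot_reply> server_impl::apply_snapshot(server_id from, install_snapshot snp) {
    // Assumes at most one in-flight snapshot application per sender. If the sender's
    // transfer failed on its side (e.g. dropped connection) and it retried while this
    // follower was still applying the previous snapshot, `from` is still a key here:
    assert(! _snapshot_application_done.contains(from));
    // ... feed the snapshot to the fsm and wait for the application result ...
    auto reply = co_await _snapshot_application_done[from].get_future();
    co_return reply;
}
```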
@bhalevy could you please open a new issue? I'm not sure this is related to the OP.
@bhalevy did you do it?
I'll do it
Consider:

reply = co_await _snapshot_application_done[from].get_future();

This function may be called by the RPC module on a follower when it receives a remote snapshot from a leader. _fsm->step(...) causes the next batch obtained by io_fiber to hold a snapshot descriptor. io_fiber then tries to store it by calling _persistence->store_snapshot_descriptor(), and then it will send it to applier_fiber, which will call _state_machine->load_snapshot(...).

However, consider what happens if this co_await throws: in this case we drop the snapshot. Then it's not possible to store or apply it. But _persistence->store_snapshot_descriptor(...) and _state_machine->load_snapshot(...) may still be called for this snapshot. What to do in this case?

In my implementation store_snapshot_descriptor failed on an assertion because it expected the snapshot to be present, but I removed it in _state_machine->drop_snapshot(...).
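A rough sketch of the flow described above, marking the point where the snapshot gets dropped when the wait fails; the names are the ones quoted above, while the structure and error handling are assumptions for illustration:

```cpp
// Rough sketch of the described flow; not the actual raft/server.cc code.
future<snapshot_reply> server_impl::apply_snapshot(server_id from, install_snapshot snp) {
    _fsm->step(from, std::move(snp)); // next batch obtained by io_fiber now carries the snapshot descriptor
    // io_fiber:       _persistence->store_snapshot_descriptor(...), then forwards the
    //                 snapshot to applier_fiber
    // applier_fiber:  _state_machine->load_snapshot(...)
    try {
        co_return co_await _snapshot_application_done[from].get_future();
    } catch (...) {
        // If the wait throws, the snapshot is dropped here, yet io_fiber and
        // applier_fiber may still call store_snapshot_descriptor()/load_snapshot()
        // for it, which is the problem raised above.
        throw;
    }
}
```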
It's worse than that. server_impl::apply_snapshot may race with io_fiber and applier_fiber at the co_await. E.g. it could happen that store_snapshot_descriptor gets called before the snapshot gets dropped, then it gets dropped by apply_snapshot, and then we try to apply it to the state machine in applier_fiber.

cc @kostja @gleb-cloudius
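Spelling out one such interleaving (an assumed ordering for illustration, not traced from the code):

```cpp
// One problematic interleaving of the three fibers described above (illustrative only):
//
//   apply_snapshot (RPC fiber)            io_fiber                          applier_fiber
//   --------------------------            --------                          -------------
//   _fsm->step(...)
//   co_await _snapshot_application_done
//                                         store_snapshot_descriptor(snp)
//   <co_await throws during abort>
//   snapshot dropped
//                                                                           load_snapshot(snp)
//                                                                           // applying a snapshot
//                                                                           // that was already dropped
```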