scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

raft: server may try to store and apply snapshot which was dropped #9956

Open kbr- opened 2 years ago

kbr- commented 2 years ago

Consider:

future<snapshot_reply> server_impl::apply_snapshot(server_id from, install_snapshot snp) {
    _fsm->step(from, std::move(snp));
    // Only one snapshot can be received at a time from each node
    assert(! _snapshot_application_done.contains(from));
    snapshot_reply reply{_fsm->get_current_term(), false};
    try {
        reply = co_await _snapshot_application_done[from].get_future();
    } catch (...) {
        logger.error("apply_snapshot[{}] failed with {}", _id, std::current_exception());
    }
    if (!reply.success) {
        // Drop snapshot that failed to be applied
        _state_machine->drop_snapshot(snp.snp.id);
    }

    co_return reply;
}

This function may be called by the RPC module on a follower when it receives a snapshot from a leader.

_fsm->step(...) causes the next batch obtained by io_fiber to hold a snapshot descriptor. io_fiber then tries to store it by calling _persistence->store_snapshot_descriptor(), and then sends it to applier_fiber, which calls _state_machine->load_snapshot(...).
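
For reference, here is a condensed sketch of that path (this is not the actual raft/server.cc code; the batch polling helper, the queue between the two fibers, and the exact call signatures are assumptions — only the _persistence and _state_machine calls come from the description above):

// Condensed sketch of the follower-side snapshot path; simplified, not the real code.
future<> server_impl::io_fiber() {
    while (true) {
        // Assumed helper: waits for the next batch of fsm output produced by _fsm->step().
        auto batch = co_await poll_fsm_output();
        if (batch.snp) {
            // Persist the snapshot descriptor received from the leader.
            co_await _persistence->store_snapshot_descriptor(*batch.snp);
            // Hand the snapshot over to applier_fiber (assumed queue between the fibers).
            co_await _apply_queue.push_eventually(std::move(*batch.snp));
        }
    }
}

future<> server_impl::applier_fiber() {
    while (true) {
        auto snp = co_await _apply_queue.pop_eventually();
        // Apply the snapshot to the user state machine...
        co_await _state_machine->load_snapshot(snp.id);
        // ...and resolve the promise in _snapshot_application_done that
        // apply_snapshot() above is co_await-ing on (assumed helper).
        resolve_snapshot_application(snp);
    }
}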

However, consider what happens if this throws:

        reply = co_await _snapshot_application_done[from].get_future();

In this case we drop the snapshot, after which it can no longer be stored or applied. But _persistence->store_snapshot_descriptor(...) and _state_machine->load_snapshot(...) may still be called for this snapshot. What should we do in this case?

In my implementation, store_snapshot_descriptor failed on an assertion because it expected the snapshot to be present, but it had already been removed by _state_machine->drop_snapshot(...).

It's worse than that: server_impl::apply_snapshot may race with io_fiber and applier_fiber at the co_await. For example, store_snapshot_descriptor could be called before the snapshot gets dropped, then apply_snapshot drops it, and then applier_fiber tries to apply it to the state machine.

cc @kostja @gleb-cloudius

gleb-cloudius commented 2 years ago

I do not understand your scenario. As far as I see

reply = co_await _snapshot_application_done[from].get_future();

can only throw during abort, and only after io_fiber and applier_fiber are stopped, so I do not see how any of what you are saying can happen.

kbr- commented 2 years ago

You're right, I also don't see any other way it could throw at the moment. I got it by experimenting a bit with the order of abort calls in order to fix a deadlock during abort in my test.

So it cannot happen right now, but it feels to me that this is only because we got lucky.

If that's the intended design, then it would be great to have a comment in server_impl::apply_snapshot saying that this future can only throw when aborting, and that we assume io_fiber and applier_fiber to already be stopped by that time.
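
Concretely, something along these lines in the try block (the code is from the snippet in the issue description; only the comment is new):

    try {
        // This future can only fail when the server is being aborted. By that
        // point io_fiber and applier_fiber have already been stopped, so
        // dropping the snapshot below can no longer race with
        // _persistence->store_snapshot_descriptor() or _state_machine->load_snapshot().
        reply = co_await _snapshot_application_done[from].get_future();
    } catch (...) {
        logger.error("apply_snapshot[{}] failed with {}", _id, std::current_exception());
    }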

gleb-cloudius commented 2 years ago

Yes, we can add such a comment.

kostja commented 1 year ago

@kbr- Please take time to add the comment.

bhalevy commented 1 year ago

I hit the following assertion failure in test_tablets in a branch off of 5afc242814:

=================================== FAILURES ===================================
_____ test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode _____

manager = <test.pylib.manager_client.ManagerClient object at 0x7fc198d8d690>

    @pytest.mark.asyncio
    async def test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode(manager: ManagerClient):
        """Test that you can create a table and insert and query data"""

        logger.info("Bootstrapping cluster")
        servers = [await manager.server_add(), await manager.server_add(), await manager.server_add()]

        s0 = servers[0].server_id
        not_s0 = servers[1:]

        # s0 should miss schema and tablet changes
        await manager.server_stop_gracefully(s0)

        cql = manager.get_cql()
        await cql.run_async("CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', "
                            "'replication_factor': 3, 'initial_tablets': 100};")

        # force s0 to catch up later from the snapshot and not the raft log
        await inject_error_one_shot_on(manager, 'raft_server_force_snapshot', not_s0)
        await cql.run_async("CREATE TABLE test.test (pk int PRIMARY KEY, c int);")

        keys = range(10)
        await asyncio.gather(*[cql.run_async(f"INSERT INTO test.test (pk, c) VALUES ({k}, 1);") for k in keys])

        rows = await cql.run_async("SELECT * FROM test.test;")
        assert len(list(rows)) == len(keys)
        for r in rows:
            assert r.c == 1

        manager.driver_close()
>       await manager.server_start(s0, wait_others=2)

test/topology_experimental_raft/test_tablets.py:61: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/pylib/manager_client.py:155: in server_start
    await self.client.get_text(f"/cluster/server/{server_id}/start", params=params)
test/pylib/rest_client.py:82: in get_text
    ret = await self._fetch("GET", resource_uri, response_type = "text", host = host,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.pylib.rest_client.UnixRESTClient object at 0x7fc198d8c650>
method = 'GET', resource = '/cluster/server/91/start', response_type = 'text'
host = None, port = None, params = None, json = None, timeout = None

    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
                f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
                "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")

        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E               test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/server/91/start, params: None, json: None, body:
E               failed to start the node, server_id 91, IP 127.209.26.36, workdir scylla-91, host_id a15c78d8-ad4e-40d1-ba6e-3e8d948a5496, cql [not connected]
E               Check the log files:
E               /home/bhalevy/dev/scylla/testlog/test.py.dev.log
E               /home/bhalevy/dev/scylla/testlog/dev/scylla-91.log

scylla-91.log:

INFO  2023-08-29 11:05:11,065 [shard 0] raft_group0 - setup_group0: group 0 ID present. Starting existing Raft server.
INFO  2023-08-29 11:05:11,065 [shard 0] raft_group0 - Server a15c78d8-ad4e-40d1-ba6e-3e8d948a5496 is starting group 0 with id b4378f70-4642-11ee-b6ec-23c72ae977e4
INFO  2023-08-29 11:05:11,077 [shard 0] compaction - [Compact system_schema.columns c6b4fd40-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/me-3g8x_0mgn_0cuyp222k9djc9mixi-big-Data.db:level=0]. 23kB to 18kB (~79% of original) in 15ms = 1MB/s. ~256 total partitions merged to 6.
INFO  2023-08-29 11:05:11,077 [shard 0] compaction - [Compact system.peers c6b79550-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg7_4vjg02s54wnom7f278-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg5_39vtt2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,082 [shard 0] compaction - [Compact system.peers c6b79550-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0gi4x222k9djc9mixi-big-Data.db:level=0]. 13kB to 7kB (~55% of original) in 3ms = 4MB/s. ~256 total partitions merged to 1.
INFO  2023-08-29 11:05:11,082 [shard 0] compaction - [Compact system.scylla_local c6b858a0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mgn_0gaf4222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mg5_36gdd2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,083 [shard 1] compaction - [Compact system.peers c6b87fb0-4642-11ee-a813-bdde3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0h5a82k0k1n43po4ly-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mg5_390yp24g1qum6urxes-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,097 [shard 1] compaction - [Compact system.peers c6b87fb0-4642-11ee-a813-bdde3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0hsfl2k0k1n43po4ly-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 13ms = 1MB/s. ~256 total partitions merged to 1.
INFO  2023-08-29 11:05:11,098 [shard 0] compaction - [Compact system.scylla_local c6b858a0-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/scylla_local-2972ec7ffb2038ddaac1d876f2e3fcbd/me-3g8x_0mgn_0hkpt222k9djc9mixi-big-Data.db:level=0]. 98kB to 92kB (~93% of original) in 14ms = 7MB/s. ~256 total partitions merged to 3.
INFO  2023-08-29 11:05:11,098 [shard 0] compaction - [Compact system.truncated c6bac9a0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgm_5y4c1222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgm_5y4c0222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/mc-6-big-Data.db:level=0:origin=compaction]
WARN  2023-08-29 11:05:11,098 [shard 0] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.209.26.36, application_states = {{STATUS, Value(NORMAL,975945372173672557,1)}})
INFO  2023-08-29 11:05:11,101 [shard 1] compaction - [Compact system.peers c6bb3ed0-4642-11ee-a813-bdde3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0l0682k0k1n43po4ly-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0hsfl2k0k1n43po4ly-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,102 [shard 0] cdc - Started using generation (2023/08/29 08:04:45, 36e44f8f-951e-40b7-8d34-39a095f3e133).
INFO  2023-08-29 11:05:11,102 [shard 0] raft_group_registry - marking Raft server e6e8f1ca-1a11-4733-8d70-e89246b1a011 as alive for raft groups
INFO  2023-08-29 11:05:11,102 [shard 0] raft_group_registry - marking Raft server 6b6327d1-0329-41f2-9fe9-c3d86ed38dc9 as alive for raft groups
INFO  2023-08-29 11:05:11,103 [shard 0] raft_group0 - Disabling migration_manager schema pulls because Raft is fully functioning in this cluster.
INFO  2023-08-29 11:05:11,103 [shard 0] compaction - [Compact system.truncated c6bac9a0-4642-11ee-874c-bddd3766d256] Compacted 3 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3g8x_0mgn_0l06a222k9djc9mixi-big-Data.db:level=0]. 16kB to 5kB (~35% of original) in 3ms = 5MB/s. ~384 total partitions merged to 1.
INFO  2023-08-29 11:05:11,103 [shard 0] compaction - [Compact system.peers c6bb8cf0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0l069222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0gi4x222k9djc9mixi-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,103 [shard 0] messaging_service - Starting Messaging Service on port 7000
INFO  2023-08-29 11:05:11,103 [shard 0] storage_service - entering STARTING mode
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - initial_contact_nodes={127.209.26.36}, loaded_endpoints={}, loaded_peer_features=2
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - peer=127.209.26.23, supported_features=AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLETS,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - peer=127.209.26.5, supported_features=AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLETS,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - Checking remote features with gossip, initial_contact_nodes={127.209.26.36}
INFO  2023-08-29 11:05:11,104 [shard 0] gossip - Gossip shadow round started with nodes={}
WARN  2023-08-29 11:05:11,104 [shard 0] gossip - All nodes={} are down for get_endpoint_states verb. Skip ShadowRound.
INFO  2023-08-29 11:05:11,104 [shard 0] gossip - Gossip shadow round finished with nodes_talked={}
INFO  2023-08-29 11:05:11,104 [shard 0] gossip - Node 127.209.26.23 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2023-08-29 11:05:11,104 [shard 0] gossip - Node 127.209.26.5 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2023-08-29 11:05:11,104 [shard 0] gossip - Feature check passed. Local node 127.209.26.36 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLETS, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - Save advertised features list in the 'system.local' table
INFO  2023-08-29 11:05:11,104 [shard 0] storage_service - Restarting a node in NORMAL status
WARN  2023-08-29 11:05:11,104 [shard 0] cdc - Restarting node in NORMAL status with CDC enabled, but no streams timestamp was proposed by this node according to its local tables. Are we upgrading from a non-CDC supported version?
INFO  2023-08-29 11:05:11,105 [shard 1] compaction - [Compact system.peers c6bb3ed0-4642-11ee-a813-bdde3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0lnbl2k0k1n43po4ly-big-Data.db:level=0]. 13kB to 7kB (~55% of original) in 2ms = 6MB/s. ~256 total partitions merged to 1.
INFO  2023-08-29 11:05:11,109 [shard 0] schema_tables - Schema version changed to 4dd351cd-8420-37e3-87f8-4a5924e6c341
INFO  2023-08-29 11:05:11,109 [shard 0] storage_service - Starting up server gossip
INFO  2023-08-29 11:05:11,418 [shard 0] compaction - [Compact system.peers c6bb8cf0-4642-11ee-874c-bddd3766d256] Compacted 2 sstables to [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/peers-37f71aca7dc2383ba70672528af04d4f/me-3g8x_0mgn_0m2r5222k9djc9mixi-big-Data.db:level=0]. 13kB to 7kB (~54% of original) in 28ms = 476kB/s. ~256 total partitions merged to 1.
INFO  2023-08-29 11:05:11,420 [shard 0] schema_tables - Creating keyspace test
INFO  2023-08-29 11:05:11,420 [shard 0] compaction - [Compact system.local c6ebebc0-4642-11ee-874c-bddd3766d256] Compacting [/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mgn_0hsfk222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mgn_0nd1s222k9djc9mixi-big-Data.db:level=0:origin=memtable,/home/bhalevy/dev/scylla/testlog/dev/scylla-91/data/system/local-7ad54392bcdd35a684174e047860b377/me-3g8x_0mg5_390yp2s54wnom7f278-big-Data.db:level=0:origin=compaction]
INFO  2023-08-29 11:05:11,420 [shard 0] gossip - failure_detector_loop: Started main loop
INFO  2023-08-29 11:05:11,420 [shard 0] storage_service - entering JOINING mode
INFO  2023-08-29 11:05:11,420 [shard 0] storage_service - topology changes are using raft
INFO  2023-08-29 11:05:11,420 [shard 0] init - starting system distributed keyspace
INFO  2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - system_distributed keyspace is already present. Not creating
INFO  2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - system_distributed_everywhere keyspace is already present. Not creating
INFO  2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - All tables are present on start
INFO  2023-08-29 11:05:11,420 [shard 0] system_distributed_keyspace - All schemas are uptodate on start
INFO  2023-08-29 11:05:11,524 [shard 0] schema_tables - Creating test.test id=c6448830-4642-11ee-8bc6-4ef5d54b5893 version=40cac434-3207-3def-b67b-319a07c611e3
INFO  2023-08-29 11:05:11,524 [shard 0] schema_tables - Tablet metadata changed
INFO  2023-08-29 11:05:11,660 [shard 0] gossip - InetAddress 127.209.26.23 is now UP, status = NORMAL
INFO  2023-08-29 11:05:11,660 [shard 0] gossip - InetAddress 127.209.26.5 is now UP, status = NORMAL
INFO  2023-08-29 11:05:12,343 [shard 0] schema_tables - Schema version changed to eb504352-b88e-37ba-bd21-2f9c2abe11aa
INFO  2023-08-29 11:05:12,463 [shard 0] schema_tables - Tablet metadata changed
INFO  2023-08-29 11:05:12,469 [shard 0] schema_tables - Schema version changed to eb504352-b88e-37ba-bd21-2f9c2abe11aa
ERROR 2023-08-29 11:05:13,273 [shard 0] raft - apply_snapshot[a15c78d8-ad4e-40d1-ba6e-3e8d948a5496]: ignore outdated snapshot 19995803-df74-435f-b36e-346c8a527033/45 current one is 19995803-df74-435f-b36e-346c8a527033/45, commit_idx=45
scylla: raft/server.cc:1154: virtual future<snapshot_reply> raft::server_impl::apply_snapshot(server_id, install_snapshot): Assertion `! _snapshot_application_done.contains(from)' failed.
Aborting on shard 0.
Backtrace:
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3b57fc
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3e0852
  /lib64/libc.so.6+0x3db6f
  /lib64/libc.so.6+0x8e843
  /lib64/libc.so.6+0x3dabd
  /lib64/libc.so.6+0x2687e
  /lib64/libc.so.6+0x2679a
  /lib64/libc.so.6+0x36146
  0x2773f04
  0x26ca3b5
  0x26cd93a
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c402f
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c5117
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x3c45e9
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x33f108
  /home/bhalevy/dev/scylla/build/dev/seastar/libseastar.so+0x33e342
  0xfbe229
  0xff5020
  0xfbd31c
  /lib64/libc.so.6+0x27b49
  /lib64/libc.so.6+0x27c0a
  0xfbbe24
------ Ending test dev/topology_experimental_raft.test_tablets.1::test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode ------

Decoded:

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/dev/seastar/./seastar/include/seastar/util/backtrace.hh:64
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/dev/seastar/./seastar/src/core/reactor.cc:818
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/dev/seastar/./seastar/src/core/reactor.cc:848
seastar::print_with_backtrace(char const*, bool) at ./build/dev/seastar/./seastar/src/core/reactor.cc:860
 (inlined by) seastar::sigabrt_action() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3976
 (inlined by) operator() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3952
 (inlined by) __invoke at ./build/dev/seastar/./seastar/src/core/reactor.cc:3948
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=245240a31888ad5c11bbc55b18e02d87388f59a9, for GNU/Linux 3.2.0, not stripped

__GI___sigaction at :?
__pthread_kill_implementation at ??:?
__GI_raise at :?
__GI_abort at :?
__assert_fail_base.cold at ??:?
__assert_fail at ??:?
raft::server_impl::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot) at ./raft/server.cc:1154
operator() at ./service/raft/raft_rpc.cc:188
 (inlined by) seastar::future<raft::snapshot_reply> seastar::futurize<seastar::future<raft::snapshot_reply> >::invoke<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/future.hh:2006
 (inlined by) auto seastar::futurize_invoke<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/future.hh:2037
 (inlined by) auto seastar::internal::invoke_func_with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate::holder&&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/gate.hh:309
 (inlined by) auto seastar::with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ././seastar/include/seastar/core/gate.hh:327
 (inlined by) decltype ({parm#2}()) service::raft_with_gate<service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0>(seastar::gate&, service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot)::$_0&&) at ./service/raft/raft_rpc.cc:168
 (inlined by) service::raft_rpc::apply_snapshot(utils::tagged_uuid<raft::server_id_tag>, raft::install_snapshot) at ./service/raft/raft_rpc.cc:187
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<raft::snapshot_reply>::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240
 (inlined by) seastar::internal::coroutine_traits_base<raft::snapshot_reply>::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:83
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/dev/seastar/./seastar/src/core/reactor.cc:2597
 (inlined by) seastar::reactor::run_some_tasks() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3060
seastar::reactor::do_run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3229
seastar::reactor::run() at ./build/dev/seastar/./seastar/src/core/reactor.cc:3112
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:276
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/dev/seastar/./seastar/src/core/app-template.cc:167
scylla_main(int, char**) at ./main.cc:631
std::function<int (int, char**)>::operator()(int, char**) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
main at ./main.cc:2006
__libc_start_call_main at ??:?
__libc_start_main_alias_2 at :?
_start at ??:?
kbr-scylla commented 1 year ago

@bhalevy could you please open a new issue? I'm not sure this is related to the OP.

gleb-cloudius commented 1 year ago

@bhalevy can I get all of the logs?

bhalevy commented 1 year ago

@gleb-cloudius testlog.tar.gz

gleb-cloudius commented 1 year ago

OK, the log confirms my theory of what may have happened here. Node A sends a snapshot to node B, and while node B is processing the snapshot the connection is dropped, so A gets an error on its snapshot transfer. After a while A retries, but the previous snapshot is still being applied on B, so the invariant that the assert checks is violated. @kbr-scylla
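
For illustration only, a minimal sketch of how the receiving side could tolerate a retried transfer instead of tripping the assert (purely hypothetical, not necessarily the approach taken in the follow-up issue; it reuses the member names from the snippet in the issue description):

future<snapshot_reply> server_impl::apply_snapshot(server_id from, install_snapshot snp) {
    if (_snapshot_application_done.contains(from)) {
        // A previous snapshot transfer from this sender is still being applied
        // (e.g. the sender hit an RPC error and retried). Reply with failure
        // instead of asserting; the leader can retry later.
        co_return snapshot_reply{_fsm->get_current_term(), false};
    }
    _fsm->step(from, std::move(snp));
    // ... rest as in the snippet at the top of this issue.
}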

gleb-cloudius commented 1 year ago

> @bhalevy could you please open a new issue? I'm not sure this is related to the OP.

@bhalevy did you do it?

kbr-scylla commented 1 year ago

I'll do it

kbr-scylla commented 1 year ago

https://github.com/scylladb/scylladb/issues/15222