scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

Decommission after a node changes IP address blocks adding new nodes #11355

Closed fruch closed 2 years ago

fruch commented 2 years ago

While building a reproducer for scylladb/scylla-operator#982 and #11302 we ran into this case:

Adding node4 fails with the following:

ERROR 2022-08-23 09:08:31,079 [shard 0] init - Startup failed: exceptions::unavailable_exception (Cannot achieve consistency level for cl ALL. Requires 3, alive 2)

All 3 nodes show 4 nodes in gossip, and 2 of the entries have the exact same host_id: one in shutdown state, one in LEFT state:

/127.0.28.2
  generation:1661235135
  heartbeat:294
  X6:1
  RACK:rack1
  LOAD:589824
  NET_VERSION:0
  HOST_ID:de153abc-d0ec-4e41-8233-a53ceefa0bf6
  X1:CDC,CDC_GENERATIONS_V2,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_TABLES_V3,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UNBOUNDED_RANGE_TOMBSTONES,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
  RELEASE_VERSION:3.0.8
  X3:3
  X4:1
  X2:system_traces.sessions:0.000000;system_traces.node_slow_log:0.000000;system_distributed.service_levels:1.000000;system_distributed.view_build_status:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.569246;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_distributed.cdc_generation_timestamps:0.397865;system_auth.role_attributes:0.000000;system_traces.events:0.000000;
  X7:12
  DC:datacenter1
  STATUS:NORMAL,2036647766093796734
  X8:v2;1661235250046;dd4b30e0-6f52-4442-baa9-0b1733a05c27
  RPC_ADDRESS:127.0.28.2
  X9:org.apache.cassandra.locator.SimpleSnitch
  SCHEMA:e80a7c3f-a0dc-3998-ac70-7ab147cbb7d0
  X5:0:53687091:1661235182411
/127.0.28.3
  generation:1661235194
  heartbeat:2147483647
  X6:1
  RACK:rack1
  LOAD:716800
  NET_VERSION:0
  HOST_ID:cb820aa9-671f-4eed-8a6a-7385c8ffe292
  X1:CDC,CDC_GENERATIONS_V2,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_TABLES_V3,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UNBOUNDED_RANGE_TOMBSTONES,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
  RELEASE_VERSION:3.0.8
  X3:3
  X4:1
  X2:system_traces.sessions:0.000000;system_traces.node_slow_log:0.000000;system_distributed.service_levels:0.000000;system_distributed.view_build_status:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_attributes:0.000000;system_traces.events:0.000000;
  X7:12
  DC:datacenter1
  STATUS:shutdown,true
  X8:v2;1661235250046;dd4b30e0-6f52-4442-baa9-0b1733a05c27
  RPC_ADDRESS:127.0.28.3
  X9:org.apache.cassandra.locator.SimpleSnitch
  SCHEMA:e80a7c3f-a0dc-3998-ac70-7ab147cbb7d0
  X5:0:53687091:1661235242156
/127.0.28.1
  generation:1661235110
  heartbeat:399
  X6:1
  RACK:rack1
  LOAD:516096
  NET_VERSION:0
  HOST_ID:c225b529-a368-4396-ae4d-4db230125b9c
  X1:CDC,CDC_GENERATIONS_V2,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_TABLES_V3,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UNBOUNDED_RANGE_TOMBSTONES,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
  RELEASE_VERSION:3.0.8
  X3:3
  X4:1
  X2:system_traces.sessions:0.000000;system_traces.node_slow_log:0.000000;system_distributed.service_levels:1.000000;system_distributed.view_build_status:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_attributes:0.000000;system_traces.events:0.000000;
  X7:12
  DC:datacenter1
  STATUS:NORMAL,5900492330596084389
  X8:v2;1661235250046;dd4b30e0-6f52-4442-baa9-0b1733a05c27
  RPC_ADDRESS:127.0.28.1
  X9:org.apache.cassandra.locator.SimpleSnitch
  SCHEMA:e80a7c3f-a0dc-3998-ac70-7ab147cbb7d0
  X5:0:53687091:1661235122573
/127.0.28.33
  generation:1661235258
  heartbeat:71
  X6:1
  RACK:rack1
  LOAD:1.04858e+06
  NET_VERSION:0
  HOST_ID:cb820aa9-671f-4eed-8a6a-7385c8ffe292
  X1:CDC,CDC_GENERATIONS_V2,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_TABLES_V3,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UNBOUNDED_RANGE_TOMBSTONES,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
  RELEASE_VERSION:3.0.8
  X3:3
  X4:1
  X2:system_traces.sessions:0.000000;system_traces.node_slow_log:0.000000;system_distributed.view_build_status:0.000000;system_distributed.service_levels:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;system_traces.node_slow_log_time_idx:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:1.000000;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_attributes:0.000000;system_traces.events:0.000000;
  X7:12
  DC:datacenter1
  STATUS:LEFT,951443624668706845,1661494471473270934
  X8:v2;1661235250046;dd4b30e0-6f52-4442-baa9-0b1733a05c27
  RPC_ADDRESS:127.0.28.33
  X9:org.apache.cassandra.locator.SimpleSnitch
  SCHEMA:e80a7c3f-a0dc-3998-ac70-7ab147cbb7d0
  X5:0:53687091:1661235271091
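
For illustration, here is a minimal sketch of how one could parse `nodetool gossipinfo` output like the dump above and flag endpoints that share a HOST_ID (the shutdown/LEFT pair seen here) before attempting to add a new node. The helper name, the subprocess invocation, and the default JMX port are assumptions made for this sketch; this is not part of ccm or scylla-operator:

import subprocess
from collections import defaultdict

def find_duplicate_host_ids(nodetool="nodetool", host="127.0.28.1", jmx_port=7199):
    # Run `nodetool gossipinfo` against one node (7199 is the usual JMX port,
    # as in the nodetool invocation shown later in this thread).
    out = subprocess.run([nodetool, "-h", host, "-p", str(jmx_port), "gossipinfo"],
                         capture_output=True, text=True, check=True).stdout

    # Parse the dump: a line starting with "/" opens a new endpoint section,
    # and indented "KEY:VALUE" lines belong to the current endpoint.
    endpoints = defaultdict(dict)
    current = None
    for line in out.splitlines():
        if line.startswith("/"):
            current = line.strip()
        elif current and ":" in line:
            key, _, value = line.strip().partition(":")
            endpoints[current][key] = value

    # Group endpoints by HOST_ID and keep only the ids claimed by more than
    # one endpoint, returning {host_id: [(endpoint, status), ...]}.
    by_id = defaultdict(list)
    for ep, fields in endpoints.items():
        if "HOST_ID" in fields:
            by_id[fields["HOST_ID"]].append((ep, fields.get("STATUS", "")))
    return {hid: eps for hid, eps in by_id.items() if len(eps) > 1}

A reproducer (or an operator) could refuse to start a topology change while this returns a non-empty result.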
fruch commented 2 years ago

@asias I guess we could wait like this:

node3.watch_log_for("FatClient .* has been silent for .*ms, removing from gossip")

The problem is with scylla-operator: it is the user that would have to wait for those messages, and since it can react to different actions, it needs to check the gossip state before running any commands.
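
For reference, a rough sketch of what that wait could look like in a ccm-based test, assuming node3 is the ScyllaNode from the reproducer; the 120-second timeout and the trailing gossipinfo sanity check are illustrative additions, not taken from the actual test:

# Wait until the gossiper drops the stale (FatClient) entry for the old IP.
node3.watch_log_for(r"FatClient .* has been silent for .*ms, removing from gossip",
                    timeout=120)

# Only then proceed with topology changes (e.g. bootstrapping node4); otherwise
# startup can fail with the "Cannot achieve consistency level" error above.
node3.nodetool("gossipinfo")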

asias commented 2 years ago

5cd97c964c3b2b4bb11cc2252ef576358447068c fails as below. It is weird that nodetool gossipinfo fails; I think it is not related to this bug, though.

update_cluster_layout_tests.py:2155:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../scylla-ccm/ccmlib/scylla_node.py:684: in nodetool
    return super().nodetool(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <ccmlib.scylla_node.ScyllaNode object at 0x7f0f67675f30>, cmd = 'gossipinfo', capture_output = False, wait = True, timeout = None

    def nodetool(self, cmd, capture_output=True, wait=True, timeout=None):
        """
        Setting wait=False makes it impossible to detect errors,
        if capture_output is also False. wait=False allows us to return
        while nodetool is still running.
        When wait=True, timeout may be set to a number, in seconds,
        to limit how long the function will wait for nodetool to complete.
        """
        if capture_output and not wait:
            raise common.ArgumentError("Cannot set capture_output while wait is False.")
        env = self.get_env()
        if self.is_scylla() and not self.is_docker():
            host = self.address()
        else:
            host = 'localhost'
        nodetool = self.get_tool('nodetool')

        if not isinstance(nodetool, list):
            nodetool = [nodetool]
        # see https://www.oracle.com/java/technologies/javase/8u331-relnotes.html#JDK-8278972
        nodetool.extend(['-h', host, '-p', str(self.jmx_port), '-Dcom.sun.jndi.rmiURLParsing=legacy'])
        nodetool.extend(cmd.split())
        if capture_output:
            p = subprocess.Popen(nodetool, universal_newlines=True, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            stdout, stderr = p.communicate(timeout=timeout)
        else:
            p = subprocess.Popen(nodetool, env=env, universal_newlines=True)
            stdout, stderr = None, None

        if wait:
            exit_status = p.wait(timeout=timeout)
            if exit_status != 0:
>               raise NodetoolError(" ".join(nodetool), exit_status, stdout, stderr)
E               ccmlib.node.NodetoolError: Nodetool command '/home/asias/src/cloudius-systems/scylla/resources/cassandra/bin/nodetool -h 127.0.60.33 -p 7199 -Dcom.sun.jndi.rmiURLParsing=legacy gossipinfo' failed; exit status: 1

../scylla-ccm/ccmlib/node.py:795: NodetoolError
asias commented 2 years ago

@asias I guess we could wait like this:

node3.watch_log_for("FatClient .* has been silent for .*ms, removing from gossip")

The problem is with scylla-operator: it is the user that would have to wait for those messages, and since it can react to different actions, it needs to check the gossip state before running any commands.

With PR https://github.com/scylladb/scylladb/pull/11361, there is no need to wait for the removal of the old node from gossip.

fruch commented 2 years ago

@slivne FYI, an issue related to IP changes and topology changes.

avikivity commented 1 year ago

@asias what do you think about backporting this? Is it safe, or should we wait for 5.2 soak time?

I guess it's safe, since master and 5.2 have had this for a long time.

denesb commented 10 months ago

All live versions have this, removing label.