Closed gdubicki closed 1 month ago
Please try to get logs from a longer time period. The logs available in the must-gather dump only cover the last 2 hours, so the logs from the time when you executed the commands and hit the issues are not present.
Thanks @zimnx. That's 8 million rows of logs, though, and I can't export more than 100k from Datadog in one shot. Can you perhaps tell me which strings I should search for, so we can limit the size of the export?
I have done an export of logs with <100k rows with the following query:
index:scylla image_name:"scylladb/scylla" -large_data -compaction -querier -query_processor -"seastar::rpc::closed_error" (raft OR BOOTSTRAP OR WARN OR ERROR) -snitch_logger -iotune -repair
I hope that this will be helpful at least for a start. Of course I can make more queries if needed.
@zimnx: Would performing a rolling restart of our cluster to apply some safe optimizations - only providing Scylla and Scylla Manager with more memory per node - be safe to do now (well, on Monday morning to be precise)?
I am also hoping that, if this is related to https://github.com/scylladb/scylladb/issues/19975, it might also resolve some or all of our current cluster scaling issues. Or that at least it won't do more harm here, while possibly improving the performance (which we will need when our peak hours traffic hits with only 5 nodes in the cluster).
I have done an export of logs with <100k rows with the following query:
index:scylla image_name:"scylladb/scylla" -large_data -compaction -querier -query_processor -"seastar::rpc::closed_error" (raft OR BOOTSTRAP OR WARN OR ERROR) -snitch_logger -iotune -repair
I hope that this will be helpful at least for a start. Of course I can make more queries if needed.
Filtering is not going to help, as a log line explaining the root cause of an issue might not be picked up by the filter. I think it would be best to collect logs from +/- 10 minutes around the time when the commands were executed.
Please also attach recent output of nodetool status and nodetool gossipinfo.
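The suggestion to collect logs from +/- 10 minutes around each command can be turned into concrete export windows. This is only a minimal sketch: the function name `log_windows` and the example timestamps are hypothetical and not taken from the thread. Overlapping windows are merged so that each log line is exported once.

```python
from datetime import datetime, timedelta


def log_windows(command_times, margin_minutes=10):
    """Return merged (start, end) windows covering each command time +/- margin."""
    margin = timedelta(minutes=margin_minutes)
    windows = []
    for t in sorted(command_times):
        start, end = t - margin, t + margin
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# Hypothetical command times, for illustration only.
example_times = [
    datetime(2024, 8, 8, 19, 5),
    datetime(2024, 8, 8, 19, 12),
    datetime(2024, 8, 8, 20, 18),
]

for start, end in log_windows(example_times):
    print(start.isoformat(), "->", end.isoformat())
```

Each resulting window can then be exported separately, which also helps to stay under a per-export row limit.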
Would performing a rolling restart of our cluster to apply some safe optimizations - only providing Scylla and Scylla Manager with more memory per node - be safe to do now (well, on Monday morning to be precise)?
I can't guarantee it would be safe, as your cluster is in a borked state. If you're running RF=3 and CL=QUORUM queries, then it should be safe. Just don't add more CPUs, as that would trigger resharding, which we don't want at this point. If you're worried about the traffic, make sure to give each node enough time after its restart to warm the cache before proceeding to the next one.
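The RF=3 / CL=QUORUM reasoning can be checked with a little arithmetic. The sketch below is illustrative only (the function names are made up here). It assumes the worst case, in which every node that is down holds a replica of the same token range, which is possible when all nodes sit in a single rack.

```python
def quorum(rf):
    """Replicas that must respond for CL=QUORUM: a majority of RF."""
    return rf // 2 + 1


def worst_case_live_replicas(rf, nodes_down):
    """Live replicas of the unluckiest range, if every down node is one of its replicas."""
    return max(rf - nodes_down, 0)


def quorum_survives(rf, nodes_down):
    """True if QUORUM requests can still be served for every range in the worst case."""
    return worst_case_live_replicas(rf, nodes_down) >= quorum(rf)


for down in (0, 1, 2):
    print(f"RF=3, nodes down={down}: quorum available={quorum_survives(3, down)}")
```

With RF=3 the quorum is 2, so restarting one node at a time leaves QUORUM queries served, while two nodes down at once may leave some ranges with a single live replica.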
As of now:
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool status
Datacenter: us-west1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.7.241.130 1.56 TB 256 ? 787555a6-89d6-4b33-941c-940415380062 us-west1-b
UN 10.7.241.175 1.66 TB 256 ? 5342afaf-c19c-4be2-ada1-929698a4c398 us-west1-b
UN 10.7.241.174 1.48 TB 256 ? 813f49f9-e397-4d70-8300-79fa91817f11 us-west1-b
UN 10.7.249.238 1.51 TB 256 ? 5cc72b36-6fcf-4790-a540-930e544d59d2 us-west1-b
UN 10.7.243.109 1.43 TB 256 ? 880977bf-7cbb-4e0f-be82-ded853da57aa us-west1-b
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool gossipinfo
/10.7.241.175
generation:1721596585
heartbeat:2435731
NET_VERSION:0
RACK:us-west1-b
LOAD:1823919621838
STATUS:NORMAL,-8864119814958549968
DC:us-west1
RPC_ADDRESS:10.7.241.175
X4:1
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
HOST_ID:5342afaf-c19c-4be2-ada1-929698a4c398
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
X6:31
X7:12
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
X2:system_auth.roles:0.000000;system_traces.node_slow_log:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feed_truncations:0.000000;system_traces.sessions:0.000000;production.activities:0.792409;system_distributed.cdc_streams_descriptions_v2:0.000000;system_distributed.service_levels:1.000000;system_auth.role_members:0.000000;system_traces.events:0.000000;system_distributed.view_build_status:0.000000;production.feeds:0.893364;production.activities_v2:0.560822;system_traces.node_slow_log_time_idx:0.000000;test.heartrate_v1:0.000000;production.feed_counters:0.978368;system_traces.sessions_time_idx:0.000000;system_auth.role_attributes:0.000000;system_distributed.cdc_generation_timestamps:0.000000;
RELEASE_VERSION:3.0.8
X3:3
X5:0:342255206:1721669718699
/10.7.249.238
generation:1722496308
heartbeat:1249240
NET_VERSION:0
RACK:us-west1-b
LOAD:1664549588379
STATUS:NORMAL,-9167251459053092449
DC:us-west1
RPC_ADDRESS:10.7.249.238
X4:1
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
HOST_ID:5cc72b36-6fcf-4790-a540-930e544d59d2
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
X6:31
X7:12
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
X2:production.feed_truncations:0.000000;system_traces.sessions:0.000000;test.heartrate_v1:0.000000;production.feed_counters:0.976706;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;production.activities:0.786895;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feeds:0.897525;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.651030;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:0.000000;system_traces.node_slow_log:0.000000;
RELEASE_VERSION:3.0.8
X3:3
X5:0:342255206:1722544478221
/10.7.243.109
generation:1723033937
heartbeat:518066
NET_VERSION:0
RACK:us-west1-b
LOAD:1574210785812
STATUS:NORMAL,-8564353228911561110
DC:us-west1
RPC_ADDRESS:10.7.243.109
X4:1
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
HOST_ID:880977bf-7cbb-4e0f-be82-ded853da57aa
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
X6:31
X7:12
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
X2:production.feed_truncations:0.000000;system_traces.sessions:0.000000;test.heartrate_v1:0.000000;production.feed_counters:0.977903;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;production.activities:0.796932;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feeds:0.895235;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.586408;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:0.000000;system_traces.node_slow_log:0.000000;
RELEASE_VERSION:3.0.8
X3:3
X5:0:342255206:1723103256746
/10.7.241.130
generation:1721671188
heartbeat:2373773
NET_VERSION:0
RACK:us-west1-b
LOAD:1717705044311
STATUS:NORMAL,-97971195482211408
DC:us-west1
RPC_ADDRESS:10.7.241.130
X4:1
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
HOST_ID:787555a6-89d6-4b33-941c-940415380062
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
X6:31
X7:12
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
X2:system_auth.roles:0.000000;system_traces.node_slow_log:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feed_truncations:0.000000;system_traces.sessions:0.000000;production.activities:0.787918;system_distributed.cdc_streams_descriptions_v2:0.000000;system_distributed.service_levels:1.000000;system_auth.role_members:0.000000;system_traces.events:0.000000;system_distributed.view_build_status:0.000000;production.feeds:0.896144;production.activities_v2:0.429126;system_traces.node_slow_log_time_idx:0.000000;test.heartrate_v1:0.000000;production.feed_counters:0.974609;system_traces.sessions_time_idx:0.000000;system_auth.role_attributes:0.000000;system_distributed.cdc_generation_timestamps:0.000000;
RELEASE_VERSION:3.0.8
X3:3
X5:0:342255206:1721720096847
/10.7.241.174
generation:1722848828
heartbeat:757739
NET_VERSION:0
RACK:us-west1-b
LOAD:1628117103437
STATUS:NORMAL,1116083325320868400
DC:us-west1
RPC_ADDRESS:10.7.241.174
X4:1
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
HOST_ID:813f49f9-e397-4d70-8300-79fa91817f11
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
X6:31
X7:12
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
X2:production.feed_truncations:0.000000;system_traces.sessions:0.000000;test.heartrate_v1:0.000000;production.feed_counters:0.977739;system_distributed.cdc_generation_timestamps:0.000000;system_auth.role_members:0.000000;system_distributed.cdc_streams_descriptions_v2:0.000000;production.activities:0.797675;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feeds:0.894565;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.593607;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.roles:0.000000;system_traces.node_slow_log:0.000000;
RELEASE_VERSION:3.0.8
X3:3
X5:0:342255206:1722923560164
Would performing a rolling restart of our cluster to apply some safe optimizations - only providing Scylla and Scylla Manager with more memory per node - be safe to do now (well, on Monday morning to be precise)?
I can't guarantee it would be safe, as your cluster is in a borked state. If you're running RF=3 and CL=QUORUM queries, then it should be safe. Just don't add more CPUs, as that would trigger resharding, which we don't want at this point. If you're worried about the traffic, make sure to give each node enough time after its restart to warm the cache before proceeding to the next one.
I did it and it worked: all the nodes restarted. As expected, pods 1 and 2 had to be deleted manually to be recreated, as the automatic rollout by the STS was stopped on pod 3, which is not coming up. What I didn't expect is that after applying the change, the ScyllaCluster object was updated with the new memory request and limit, but the StatefulSet was not. I am afraid that they got out of sync when we did the hack with deleting and recreating the STS. :/ Ultimately we updated the STS manually to create pods with more memory.
Here's an updated output of nodetool gossipinfo after the rolling restart:
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool gossipinfo
/10.7.249.238
generation:1723463402
heartbeat:595
RPC_ADDRESS:10.7.249.238
X7:12
X3:3
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
STATUS:NORMAL,9213597656330103393
X2:system_distributed.cdc_generation_timestamps:0.000000;production.feed_truncations:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log:0.000000;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.253676;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feeds:0.868316;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.sessions:0.000000;production.feed_counters:0.924761;test.heartrate_v1:0.000000;production.activities:0.737744;system_distributed.cdc_streams_descriptions_v2:0.000000;system_auth.role_members:0.000000;
X4:1
LOAD:1670341483567
X5:0:756442726:1723463415124
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
RELEASE_VERSION:3.0.8
NET_VERSION:0
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
RACK:us-west1-b
X6:31
HOST_ID:5cc72b36-6fcf-4790-a540-930e544d59d2
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
DC:us-west1
/10.7.241.175
generation:1723463490
heartbeat:494
RPC_ADDRESS:10.7.241.175
X7:12
X3:3
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
STATUS:NORMAL,9191245930443787145
X2:system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;system_traces.sessions:0.000000;system_distributed.cdc_generation_timestamps:0.975906;test.heartrate_v1:0.000000;production.feed_counters:0.919281;production.feed_truncations:0.000000;system_distributed.service_levels:1.000000;production.activities:0.710310;system_distributed.cdc_streams_descriptions_v2:0.000000;system_traces.events:0.000000;system_distributed.view_build_status:0.000000;production.feeds:0.847015;production.activities_v2:0.203567;system_traces.node_slow_log_time_idx:0.000000;system_auth.role_members:0.000000;system_traces.sessions_time_idx:0.000000;system_auth.role_attributes:0.000000;system_traces.node_slow_log:0.000000;system_auth.roles:1.000000;
X4:1
LOAD:1827660219161
X5:0:756442726:1723463502683
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
RELEASE_VERSION:3.0.8
NET_VERSION:0
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
RACK:us-west1-b
X6:31
HOST_ID:5342afaf-c19c-4be2-ada1-929698a4c398
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
DC:us-west1
/10.7.243.109
generation:1723463698
heartbeat:235
RPC_ADDRESS:10.7.243.109
X7:12
X3:3
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
STATUS:NORMAL,8942959036279233951
X2:system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;production.feed_truncations:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log:0.000000;system_auth.role_members:0.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.107328;system_traces.events:0.000000;system_traces.sessions:0.000000;system_distributed.cdc_generation_timestamps:0.000000;production.feed_counters:0.777977;test.heartrate_v1:0.000000;production.feeds:0.542748;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.activities:0.432635;system_distributed.cdc_streams_descriptions_v2:0.000000;
X4:1
LOAD:1576482887375
X5:0:756442726:1723463710868
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
RELEASE_VERSION:3.0.8
NET_VERSION:0
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
RACK:us-west1-b
X6:31
HOST_ID:880977bf-7cbb-4e0f-be82-ded853da57aa
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
DC:us-west1
/10.7.241.174
generation:1723463611
heartbeat:346
RPC_ADDRESS:10.7.241.174
X7:12
X3:3
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
STATUS:NORMAL,996896219089399285
X2:system_distributed.cdc_generation_timestamps:0.000000;production.feed_truncations:0.000000;system_auth.roles:1.000000;system_traces.node_slow_log:0.000000;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.142687;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.791391;production.feeds:0.753341;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.sessions:0.000000;production.feed_counters:0.862901;test.heartrate_v1:0.000000;production.activities:0.617211;system_distributed.cdc_streams_descriptions_v2:0.000000;system_auth.role_members:0.000000;
X4:1
LOAD:1631596291649
X5:0:756442726:1723463624075
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
RELEASE_VERSION:3.0.8
NET_VERSION:0
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
RACK:us-west1-b
X6:31
HOST_ID:813f49f9-e397-4d70-8300-79fa91817f11
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
DC:us-west1
/10.7.241.130
generation:1723463283
heartbeat:746
RPC_ADDRESS:10.7.241.130
X7:12
X3:3
X8:v2;1723193419331;a0d4c151-c4d9-4ada-a801-c39a82eb9602
STATUS:NORMAL,8992710247996941333
X2:system_distributed.cdc_generation_timestamps:0.999277;production.feed_truncations:0.000000;system_auth.roles:0.967668;system_traces.node_slow_log:0.000000;system_auth.role_attributes:0.000000;system_traces.sessions_time_idx:0.000000;system_traces.node_slow_log_time_idx:0.000000;production.activities_v2:0.153191;system_traces.events:0.000000;system_distributed_everywhere.cdc_generation_descriptions_v2:0.000000;production.feeds:0.893740;system_distributed.view_build_status:0.000000;system_distributed.service_levels:1.000000;system_traces.sessions:0.000000;production.feed_counters:0.942885;test.heartrate_v1:0.000000;production.activities:0.768185;system_distributed.cdc_streams_descriptions_v2:0.000000;system_auth.role_members:0.000000;
X4:1
LOAD:1721391700392
X5:0:756442726:1723463295645
X9:org.apache.cassandra.locator.GossipingPropertyFileSnitch
RELEASE_VERSION:3.0.8
NET_VERSION:0
X1:AGGREGATE_STORAGE_OPTIONS,ALTERNATOR_TTL,CDC,CDC_GENERATIONS_V2,COLLECTION_INDEXING,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,EMPTY_REPLICA_MUTATION_PAGES,EMPTY_REPLICA_PAGES,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_COLLECTION_DETECTION,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_COMMITLOG,SCHEMA_TABLES_V3,SECONDARY_INDEXES_ON_STATIC_COLUMNS,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TABLE_DIGEST_INSENSITIVE_TO_EXPIRY,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,TYPED_ERRORS_IN_READ_RPC,UDA,UDA_NATIVE_PARALLELIZED_AGGREGATION,UNBOUNDED_RANGE_TOMBSTONES,UUID_SSTABLE_IDENTIFIERS,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
RACK:us-west1-b
X6:31
HOST_ID:787555a6-89d6-4b33-941c-940415380062
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
DC:us-west1
I don't know if something significant changed here, so I am just pasting it as it was returned. (The nodetool status output didn't change, except for minimal changes in the load values.)
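To see what actually changed between two gossipinfo dumps without reading them line by line, the output can be parsed and compared per endpoint. This is a rough sketch: the helper names are hypothetical, the samples are trimmed to a few fields copied from the outputs above, and the conversion at the end assumes that generation is a Unix timestamp in seconds set when the node starts, as in Cassandra-style gossip.

```python
from datetime import datetime, timezone


def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo` text into {endpoint: {key: value}}."""
    nodes, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.7.249.238" starts a new endpoint section.
            current = nodes.setdefault(line[1:], {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key] = value
    return nodes


def diff_gossipinfo(before, after, ignore=("heartbeat",)):
    """Return {endpoint: {key: (old, new)}} for the fields that differ."""
    changes = {}
    for endpoint in sorted(set(before) | set(after)):
        old, new = before.get(endpoint, {}), after.get(endpoint, {})
        changed = {
            key: (old.get(key), new.get(key))
            for key in sorted(set(old) | set(new))
            if key not in ignore and old.get(key) != new.get(key)
        }
        if changed:
            changes[endpoint] = changed
    return changes


# Trimmed samples for one endpoint, copied from the two outputs in this thread.
BEFORE = """/10.7.249.238
generation:1722496308
heartbeat:1249240
STATUS:NORMAL,-9167251459053092449
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
"""
AFTER = """/10.7.249.238
generation:1723463402
heartbeat:595
STATUS:NORMAL,9213597656330103393
SCHEMA:d2474336-9cf3-3195-9445-8c250dcadb2f
"""

changes = diff_gossipinfo(parse_gossipinfo(BEFORE), parse_gossipinfo(AFTER))
for endpoint, fields in changes.items():
    for key, (old, new) in fields.items():
        print(endpoint, key, old, "->", new)

# Assumption: generation is epoch seconds, so it can be read as a start time.
print(datetime.fromtimestamp(1723463402, tz=timezone.utc).isoformat())
```

The heartbeat counter is ignored by default because it is expected to differ between any two dumps; the schema version staying identical across nodes is the kind of thing such a diff makes easy to confirm.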
I attach all the logs from all the nodes from the period: Aug 8, 6:55 pm CEST – Aug 8, 7:20 pm CEST.
I attach all the logs from all the nodes from the period: Aug 8, 8:13 pm CEST – Aug 8, 8:24 pm CEST
I attach all the logs from all the nodes from the period: Aug 8, 8:31 pm CEST – Aug 8, 9:15 pm CEST
The time periods above were chosen because some things started or stopped happening at those times, and to keep each export below the 100k limit. This is a visualization of the amount of logs during that whole time:
Please let me know if there's anything that stands out, @zimnx. I will share more logs tomorrow.
I attach all the logs from all the nodes from the period: Aug 8, 9:15 pm CEST – Aug 8, 9:31 pm CEST
I attach all the logs from all the nodes from the period: Aug 8, 9:31 pm CEST – Aug 8, 9:50 pm CEST
I attach all the logs from all the nodes from the period: Aug 8, 9:50 pm CEST – Aug 8, 10:05 pm CEST
If you prefer I can merge these files into one. Let me know if I can help in any other way!
@gdubicki do attempts to remove the 2 ghost nodes still give the "Operation in progress" error after rolling restart?
I successfully removed the first one! 🥳
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool removenode aa434e8c-84a1-43a3-a398-28e4e6949e56
root@gke-main-scylla-6-25fcbc5b-1mnq:/# cqlsh
Connected to scylla at 0.0.0.0:9042
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select server_id,group_id from system.raft_state ;
server_id | group_id
--------------------------------------+--------------------------------------
5342afaf-c19c-4be2-ada1-929698a4c398 | 904c8960-2c68-11ee-979c-be9922839fd2
5cc72b36-6fcf-4790-a540-930e544d59d2 | 904c8960-2c68-11ee-979c-be9922839fd2
787555a6-89d6-4b33-941c-940415380062 | 904c8960-2c68-11ee-979c-be9922839fd2
813f49f9-e397-4d70-8300-79fa91817f11 | 904c8960-2c68-11ee-979c-be9922839fd2
880977bf-7cbb-4e0f-be82-ded853da57aa | 904c8960-2c68-11ee-979c-be9922839fd2
ceb76652-0f39-4018-9fa5-dd8f0b25e85a | 904c8960-2c68-11ee-979c-be9922839fd2
(6 rows)
Will try the other one in a few minutes.
Update: the second one got removed correctly too. :)
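The raft_state listing above, taken after the first removal, can be cross-checked against the Host ID column of nodetool status to spot the member that is still a ghost. A small sketch, assuming the rows returned by that query list the Raft group's members and that nodetool status lists the nodes that are really in the ring; the function name is made up for illustration.

```python
def ghost_members(raft_server_ids, live_host_ids):
    """Return IDs present in the Raft member list but absent from the ring."""
    return sorted(set(raft_server_ids) - set(live_host_ids))


# server_id values from the system.raft_state query above.
raft_server_ids = [
    "5342afaf-c19c-4be2-ada1-929698a4c398",
    "5cc72b36-6fcf-4790-a540-930e544d59d2",
    "787555a6-89d6-4b33-941c-940415380062",
    "813f49f9-e397-4d70-8300-79fa91817f11",
    "880977bf-7cbb-4e0f-be82-ded853da57aa",
    "ceb76652-0f39-4018-9fa5-dd8f0b25e85a",
]

# Host IDs from the nodetool status output earlier in the thread.
live_host_ids = [
    "787555a6-89d6-4b33-941c-940415380062",
    "5342afaf-c19c-4be2-ada1-929698a4c398",
    "813f49f9-e397-4d70-8300-79fa91817f11",
    "5cc72b36-6fcf-4790-a540-930e544d59d2",
    "880977bf-7cbb-4e0f-be82-ded853da57aa",
]

print(ghost_members(raft_server_ids, live_host_ids))
```

An empty result after the second removenode would indicate that both lists agree.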
Ok, after you remove the second one, you can try booting a new node again. Just make sure you purge the old directories (in /var/lib/scylla: data, commitlog, etc.) on the new node first if you are reusing the same machine you used in the previous boot attempt.
No, we will be starting with a new machine, with empty disk.
But how do you provision a new node with Scylla Operator? Should we remove the label scylla/replace="" from the appropriate Service?
But how do you provision a new node with Scylla Operator? Should we remove the label scylla/replace="" from the appropriate Service?
@scylladb/rnd-cloud-operator please advise
Make sure the StatefulSet/ScyllaCluster replicas reflect the existing state of your cluster. Clear all replace labels from Services. Bring back the OrderedReady podManagementPolicy. Then bump the number of members in ScyllaCluster to the expected value. It will create new pods via the StatefulSet.
Thanks a lot! We will do it tomorrow morning, as it's a bit of hacking and it's safer to do it outside of our peak hours, which start now.
Ok, I started to do this today and things got a bit strange.
Make sure StatefulSet/ScyllaCluster replicas reflects your existing state of the cluster.
I actually didn't notice this part so I didn't do this. :|
But I would be afraid to change ScyllaCluster from 7 nodes to 5 now, as wouldn't that remove the last two services along with their pods 5 and 6? That would give us 2 nodes down out of 5 with RF=3, so we would be 1 node away from losing data, and the cluster performance would not hold our peak hours. :/
Clear all replace labels from Services. Bring back OrderedReady podManagementPolicy.
I did this.
But a few minutes later I noticed that while Service 0 has remained without the labels, the labels on Service 3 have been restored and they look like this now:
apiVersion: v1
kind: Service
metadata:
  annotations:
    internal.scylla-operator.scylladb.com/current-token-ring-hash: qLFKP9ngpFWPAL0uyeS9L9UdydMoPcJqYY4vMLkJTAVpxCdHO0iN113JaZHXXC2aJMoEoOuWBogBdKo+sgVvoQ==
    internal.scylla-operator.scylladb.com/host-id: c5214c14-6fb6-4ade-b5c9-01bf9f5b2029
    internal.scylla-operator.scylladb.com/last-cleaned-up-token-ring-hash: qLFKP9ngpFWPAL0uyeS9L9UdydMoPcJqYY4vMLkJTAVpxCdHO0iN113JaZHXXC2aJMoEoOuWBogBdKo+sgVvoQ==
    meta.helm.sh/release-name: scylla
    meta.helm.sh/release-namespace: scylla
    scylla-operator.scylladb.com/managed-hash: faqxjG8nRXLfj9++wiv/LxhVTYY7U8B28B78f8/UTD6cFqAvLSR/bMLnt/m2guunOinrulwyh2c3RmQ8jXO4Ww==
  creationTimestamp: "2023-11-29T13:46:14Z"
  labels:
    app: scylla
    app.kubernetes.io/managed-by: scylla-operator
    app.kubernetes.io/name: scylla
    internal.scylla-operator.scylladb.com/replacing-node-hostid: c5214c14-6fb6-4ade-b5c9-01bf9f5b2029
    scylla-operator.scylladb.com/scylla-service-type: member
    scylla/cluster: scylla
    scylla/datacenter: us-west1
    scylla/rack: us-west1-b
    scylla/replace: ""
  name: scylla-us-west1-us-west1-b-3
  namespace: scylla
At the same time the pod for this service (3) has disappeared. (We still have pod 0 in the Pending state.)
I attach logs from the Scylla nodes from the last 30 minutes (extract-2024-08-15T08_50_35.820Z.csv.zip) and from Scylla Manager, Operator, etc. from the same time period (extract-2024-08-15T08_48_23.576Z.csv.zip).
Plus some outputs:
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool status
Datacenter: us-west1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.7.241.130 1.85 TB 256 ? 787555a6-89d6-4b33-941c-940415380062 us-west1-b
UN 10.7.241.175 2 TB 256 ? 5342afaf-c19c-4be2-ada1-929698a4c398 us-west1-b
UN 10.7.241.174 1.79 TB 256 ? 813f49f9-e397-4d70-8300-79fa91817f11 us-west1-b
UN 10.7.249.238 1.91 TB 256 ? 5cc72b36-6fcf-4790-a540-930e544d59d2 us-west1-b
UN 10.7.243.109 1.72 TB 256 ? 880977bf-7cbb-4e0f-be82-ded853da57aa us-west1-b
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
root@gke-main-scylla-6-25fcbc5b-1mnq:/# cqlsh
Connected to scylla at 0.0.0.0:9042
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select server_id,group_id from system.raft_state ;
server_id | group_id
--------------------------------------+--------------------------------------
5342afaf-c19c-4be2-ada1-929698a4c398 | 904c8960-2c68-11ee-979c-be9922839fd2
5cc72b36-6fcf-4790-a540-930e544d59d2 | 904c8960-2c68-11ee-979c-be9922839fd2
787555a6-89d6-4b33-941c-940415380062 | 904c8960-2c68-11ee-979c-be9922839fd2
813f49f9-e397-4d70-8300-79fa91817f11 | 904c8960-2c68-11ee-979c-be9922839fd2
880977bf-7cbb-4e0f-be82-ded853da57aa | 904c8960-2c68-11ee-979c-be9922839fd2
(5 rows)
cqlsh> select host_id, up from system.cluster_status;
host_id | up
--------------------------------------+------
787555a6-89d6-4b33-941c-940415380062 | True
5342afaf-c19c-4be2-ada1-929698a4c398 | True
813f49f9-e397-4d70-8300-79fa91817f11 | True
5cc72b36-6fcf-4790-a540-930e544d59d2 | True
880977bf-7cbb-4e0f-be82-ded853da57aa | True
(5 rows)
...although there's nothing new here.
I don't know if it's related, but I just noticed that our Scylla Manager is not working, see https://github.com/scylladb/scylla-manager/issues/3972
Do you think we can still try to add one more node to our cluster by adding a node to our node pool and seeing if it gets added to the cluster, or do you consider it unsafe in the current state, @zimnx?
Please collect a new must-gather and attach it here, as I don't understand what state the ScyllaCluster and the Pods are in.
We added the node before we noticed your reply 🤦 ...but it looks like it's bootstrapping correctly! 🥳
It has started on node gke-main-scylla-6-25fcbc5b-bq2w and this looks ok-ish, I guess:
root@gke-main-scylla-6-25fcbc5b-1mnq:/# nodetool status
Datacenter: us-west1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UJ 10.7.252.229 ? 256 ? null us-west1-b
UN 10.7.241.130 1.87 TB 256 ? 787555a6-89d6-4b33-941c-940415380062 us-west1-b
UN 10.7.241.175 1.95 TB 256 ? 5342afaf-c19c-4be2-ada1-929698a4c398 us-west1-b
UN 10.7.241.174 1.77 TB 256 ? 813f49f9-e397-4d70-8300-79fa91817f11 us-west1-b
UN 10.7.249.238 1.83 TB 256 ? 5cc72b36-6fcf-4790-a540-930e544d59d2 us-west1-b
UN 10.7.243.109 1.74 TB 256 ? 880977bf-7cbb-4e0f-be82-ded853da57aa us-west1-b
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
root@gke-main-scylla-6-25fcbc5b-1mnq:/# cqlsh
Connected to scylla at 0.0.0.0:9042
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select host_id, up from system.cluster_status;
host_id | up
--------------------------------------+------
00000000-0000-0000-0000-000000000000 | True
787555a6-89d6-4b33-941c-940415380062 | True
5342afaf-c19c-4be2-ada1-929698a4c398 | True
813f49f9-e397-4d70-8300-79fa91817f11 | True
5cc72b36-6fcf-4790-a540-930e544d59d2 | True
880977bf-7cbb-4e0f-be82-ded853da57aa | True
(6 rows)
cqlsh> select server_id,group_id from system.raft_state ;
server_id | group_id
--------------------------------------+--------------------------------------
5342afaf-c19c-4be2-ada1-929698a4c398 | 904c8960-2c68-11ee-979c-be9922839fd2
5cc72b36-6fcf-4790-a540-930e544d59d2 | 904c8960-2c68-11ee-979c-be9922839fd2
60daa392-6362-423d-93b2-1ff747903287 | 904c8960-2c68-11ee-979c-be9922839fd2
787555a6-89d6-4b33-941c-940415380062 | 904c8960-2c68-11ee-979c-be9922839fd2
813f49f9-e397-4d70-8300-79fa91817f11 | 904c8960-2c68-11ee-979c-be9922839fd2
880977bf-7cbb-4e0f-be82-ded853da57aa | 904c8960-2c68-11ee-979c-be9922839fd2
(6 rows)
I attach the Scylla logs from last 30 minutes.
extract-2024-08-15T15_21_46.921Z.csv.zip
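Comparing the three views by eye gets error-prone with this many UUIDs. The sketch below takes the text of `nodetool status` and the `server_id` values from `system.raft_state` and reports which host IDs appear in only one of them. The function names and regular expressions are illustrative assumptions about the output format shown in this thread, not part of any Scylla tooling.

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# A node row starts with a two-letter state such as UN, UJ or DN.
STATE_RE = re.compile(r"^[UD][NLJM]\s")


def host_ids_from_nodetool(output):
    """Collect host IDs from nodetool status rows; rows with a null ID are skipped."""
    ids = set()
    for line in output.splitlines():
        if STATE_RE.match(line):
            match = UUID_RE.search(line)
            if match:
                ids.add(match.group(0))
    return ids


def compare_membership(nodetool_output, raft_server_ids):
    """Return (ids only in nodetool status, ids only in system.raft_state)."""
    gossip_ids = host_ids_from_nodetool(nodetool_output)
    raft_ids = set(raft_server_ids)
    return gossip_ids - raft_ids, raft_ids - gossip_ids
```

For the output above, the joining node has a null host ID in `nodetool status` while `system.raft_state` already lists 60daa392-6362-423d-93b2-1ff747903287, so that ID would be reported as present only in Raft.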
Please collect a new must-gather and attach it here, as I don't understand what state the ScyllaCluster and the Pods are in.
Sure, will do in a few minutes.
scylla-operator-must-gather-hlgph85ggm86.zip
Thank you for all the help, @zimnx! 🤗
So your 6th node is joining (ordinal 0), and your ScyllaCluster has 7 desired replicas. The Pod with ordinal 3 is missing and has a replace label.
To bring the cluster back to the expected state, I would suggest removing both replace labels (scylla/replace: "" and internal.scylla-operator.scylladb.com/replacing-node-hostid: c5214c14-6fb6-4ade-b5c9-01bf9f5b2029) from Service scylla-us-west1-us-west1-b-3. Since you removed the ghost nodes, replace is no longer needed.
Once the -0 node joins the cluster, the StatefulSet controller will recreate the -3 Pod and it will start joining the cluster.
Once the last, 7th node joins, your cluster should be fully reconciled, and only then may you carry out further topology changes if you still want to.
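The "only then" condition can be turned into a simple gate before any further topology change. This is a minimal sketch that assumes the `nodetool status` text format shown in this thread. The function name `all_nodes_up_normal` is hypothetical, and it checks only what `nodetool status` reports, not the operator's own view of the ScyllaCluster.

```python
import re

# A node row starts with a two-letter state such as UN, UJ or DN.
STATE_RE = re.compile(r"^([UD][NLJM])\s")


def all_nodes_up_normal(nodetool_output, expected_nodes):
    """True when exactly expected_nodes rows are listed and every one is UN."""
    states = []
    for line in nodetool_output.splitlines():
        match = STATE_RE.match(line)
        if match:
            states.append(match.group(1))
    return len(states) == expected_nodes and all(s == "UN" for s in states)
```

Called with the expected node count of 7, this returns False while a node is still joining (UJ) or while a row is missing, and True only once all seven rows are UN.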
Thank you again for all your help, @zimnx!
We managed to get our cluster into a healthy state with all 7 nodes up:
root@gke-main-scylla-6-25fcbc5b-bq2w:/# nodetool status
Datacenter: us-west1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.7.252.229 2.45 TB 256 ? 60daa392-6362-423d-93b2-1ff747903287 us-west1-b
UN 10.7.241.130 2.67 TB 256 ? 787555a6-89d6-4b33-941c-940415380062 us-west1-b
UN 10.7.241.175 2.94 TB 256 ? 5342afaf-c19c-4be2-ada1-929698a4c398 us-west1-b
UN 10.7.241.174 2.6 TB 256 ? 813f49f9-e397-4d70-8300-79fa91817f11 us-west1-b
UN 10.7.249.238 2.59 TB 256 ? 5cc72b36-6fcf-4790-a540-930e544d59d2 us-west1-b
UN 10.7.243.109 2.66 TB 256 ? 880977bf-7cbb-4e0f-be82-ded853da57aa us-west1-b
UN 10.7.248.124 2.06 TB 256 ? dea17e3f-198a-4ab8-b246-ff29e103941a us-west1-b
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
root@gke-main-scylla-6-25fcbc5b-bq2w:/# cqlsh
Connected to scylla at 0.0.0.0:9042
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select server_id,group_id from system.raft_state ;
server_id | group_id
--------------------------------------+--------------------------------------
5342afaf-c19c-4be2-ada1-929698a4c398 | 904c8960-2c68-11ee-979c-be9922839fd2
5cc72b36-6fcf-4790-a540-930e544d59d2 | 904c8960-2c68-11ee-979c-be9922839fd2
60daa392-6362-423d-93b2-1ff747903287 | 904c8960-2c68-11ee-979c-be9922839fd2
787555a6-89d6-4b33-941c-940415380062 | 904c8960-2c68-11ee-979c-be9922839fd2
813f49f9-e397-4d70-8300-79fa91817f11 | 904c8960-2c68-11ee-979c-be9922839fd2
880977bf-7cbb-4e0f-be82-ded853da57aa | 904c8960-2c68-11ee-979c-be9922839fd2
dea17e3f-198a-4ab8-b246-ff29e103941a | 904c8960-2c68-11ee-979c-be9922839fd2
(7 rows)
cqlsh> select host_id, up from system.cluster_status;
host_id | up
--------------------------------------+------
60daa392-6362-423d-93b2-1ff747903287 | True
787555a6-89d6-4b33-941c-940415380062 | True
dea17e3f-198a-4ab8-b246-ff29e103941a | True
5342afaf-c19c-4be2-ada1-929698a4c398 | True
813f49f9-e397-4d70-8300-79fa91817f11 | True
5cc72b36-6fcf-4790-a540-930e544d59d2 | True
880977bf-7cbb-4e0f-be82-ded853da57aa | True
(7 rows)
cqlsh>
What happened?
We originally had 7 nodes in our Scylla cluster, n2d-standard-32 with 3TB local SSDs, running Scylla 5.2.9, Scylla Operator 1.9.x, Scylla Manager 3.1.x.
We updated Scylla to 5.4.7, Scylla Operator 1.13.0 and Scylla Manager to 3.3.0 on the 25th of June. (Note that since that update we noticed https://github.com/scylladb/scylladb/issues/19793)
Then from June 25th to July 4th we migrated our Scylla to a new node pool, with the same machine size but a slightly different config: switching from the default SA to a custom one and using the recommended OAuth scopes. (Nothing that would affect Scylla directly, I think.)
On 12th of July we updated Scylla to 5.4.9.
On July 19th there was a hardware issue on one GCP node with Scylla that caused it to be restarted, and the local SSD contents were lost. This was pod 3 in the StatefulSet. The nodetool status after it happened was:
...but when I did the node replace procedure it failed on:
(Note that the id from this message was not part of the cluster, and some time later, when we did the procedure to remove the ghost nodes from our cluster, that id was also NOT appearing anywhere.)
Anyway, we decided to remove the node 15253b54-8f30-4583-b08e-469c10c58aa2 using the procedure https://opensource.docs.scylladb.com/branch-5.4/operating-scylla/procedures/cluster-management/remove-node.html#removing-an-unavailable-node and then bootstrap the new one as a completely new node.
Node removal took ~13 hours but it succeeded.
But we didn't succeed with bootstrapping a new node. We guessed that it's because, although nodetool status showed 6 nodes in the cluster at that point, the ScyllaCluster object still had 7 of them:
(Sorry for the screenshots with text, that's the only way some of the states were preserved.)
Then we tried to replace node 3 with node 6, to then scale down the cluster "officially" by updating the values for ScyllaCluster to make it a 6 node cluster (we wanted to do that anyway, assuming that with our RF=3 this is a better way to have a more balanced cluster).
But then most probably we hit an issue with the PV / PVC from the old GCP node being left attached to pod 3, and probably because of that we weren't able to start the replace procedure. :( (I think this was the cause because I only thought about it yesterday, and today, after I did a cleanup of those PV / PVC, I was able to make pod 3 schedule a new Scylla pod again.)
Ultimately we just left it as it was then, but the next day we hit another issue: because of https://github.com/scylladb/scylladb/issues/19793 one of the nodes hit the limit of our 3TB local disk space and was killed. (Back then we didn't yet know that we hit the 90% limit, not 100%, because we didn't yet have the workaround from https://github.com/scylladb/scylla-operator/issues/2056).
This happened to pod 4, and we were afraid to try to restart it because we assumed that the StatefulSet would not allow it to start until pod 3 is running.
So we had pod 3 unready and pod 4 not starting. Ultimately we hacked around the StatefulSet to make it work: we created a copy of it, deleted it while orphaning the pods, and recreated it with podManagementPolicy: Parallel. The hack worked (we had done this once before to successfully work around the STS limitation) and we managed to start pod 4, but it fell into a restart loop because its disk was filling up quickly.
That was July 22nd already. We decided to start migrating our Scylla cluster to a new node pool, with n2d-highmem-32 and 6TB disks: more memory to later tune Scylla to a more recommended CPU to memory ratio, and bigger disks to have more time to fix https://github.com/scylladb/scylladb/issues/19793.
The replace node worked this time and pod 4 was successfully moved to the new node pool.
In the meantime we did a cleanup of the ghost nodes in our cluster, but we noticed some strangeness during it so we reported https://github.com/scylladb/scylladb/issues/20020. Our node ids and the ghosts are in that issue.
Anyway, we continued to migrate the remaining Scylla pods to the new node pool and it all worked well; we migrated all of them except pod 0.
Note that pod 3 was still out, we hadn't fixed that yet. We planned to look into it more after completing the migration to the new node pool, as the disk space was slowly running out on the nodes with 3TB disks.
So yesterday we started to migrate pod 0 according to the procedure, but during the bootstrap, a few minutes after it started, we accidentally triggered a delete of its PVC. :| It was shown as terminating, so we were worried that we would wait ~12-30 hours (this is how long it took for us recently) for the bootstrap only to get it terminated and all of that progress lost, so we manually deleted the pod, deleted the PVC and PV, and resumed.
But then the new Scylla didn't restart the bootstrap as expected.
Unfortunately, because of the stress and the conditions we worked in, I don't have exact information right now about what we did and in what order. :(
I do have the logs from all Scyllas from that period, so I can query for things you want to know.
What I know is that:
[...] nodetool status with a null id, like in https://github.com/scylladb/scylladb/issues/19975
[...] 1ff1b8df-7a90-4321-a309-7cd69e20bd70) as well as the ids of the nodes that were trying to replace it.
[...] 0 and 3. We tried removing the "internal" label there with the id of the node that is being replaced (hoping that the right one will be set on the next try).
So ultimately now we are left with both pod 3 and pod 0 not working and a cluster with this state:
Now we can't bootstrap the missing pods 0 and 3 as new nodes (as I guess this is the only option now, if there are no nodes to replace).
We are getting seemingly good info in the logs at first like:
...but then we get a bunch of:
...and it ends with repeating:
When we try to remove any (ghost) nodes now, we can't:
...but if we check what node is still being deleted, we see:
Please let us know if you need any more info! Fixing this is the top priority for us.
What did you expect to happen?
To be able to bootstrap the 2 missing nodes so we can restore our cluster to a fully healthy state.
How can we reproduce it (as minimally and precisely as possible)?
It's really hard for me to say which steps of the above history were the main contributors to our current situation...
Scylla Operator version
1.13.0
Kubernetes platform name and version
(Note: this is the current version. When the whole story started we probably had a little older version.)
Please attach the must-gather archive.
scylla-operator-must-gather-dn7mtkvkjwvg.zip
Anything else we need to know?
This was initiated in this Slack thread