scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

TerminateAndRemoveNodeMonkey nemesis is failing on Docker backend when trying to repair a node after disruption #7331

Open dimakr opened 7 months ago

dimakr commented 7 months ago

The TerminateAndRemoveNodeMonkey nemesis fails when it tries to repair a node after the disruption, with the following error:

04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > sdcm.nemesis.TerminateAndRemoveNodeMonkey: failed to execute repair command on node Node longevity-1gb-1h-nemesis-longevit-db-node-d79554af-0 [172.17.0.2 | 172.17.0.2] (seed: True) due to the following error: Encountered a bad command exit code!
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > Command: '/usr/bin/nodetool  repair '
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > Exit code: 2
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > Stdout:
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > [2024-04-09 02:19:27,654] Starting repair command #2, repairing 1 ranges for keyspace scylla_bench (parallelism=SEQUENTIAL, full=true)
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > [2024-04-09 02:20:52,760] Repair session 2 failed
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > [2024-04-09 02:20:52,761] Repair session 2 finished
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > Stderr:
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > 
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > -- StackTrace --
04:20:59  < t:2024-04-09 02:20:58,809 f:nemesis.py      l:3501 c:sdcm.nemesis         p:ERROR > java.lang.RuntimeException: Repair job has failed with the error message: [2024-04-09 02:20:52,760] Repair session 2 failed

Installation details

SCT Version: master
Scylla version: 2024.1.2-0.20240228.2c85a811d0be
Test: longevity-5gb-1h-nemesis
Test config: configurations/nemesis/additional_configs/docker_backend_local.yaml

Logs

TerminateAndRemoveNodeMonkey Jenkins job url

soyacz commented 7 months ago

@dimakr can you look at the db nodes' logs and list the db error lines here?

dimakr commented 7 months ago

@soyacz Please find below the errors from the db nodes' system.log files in sct-results:

❯ find ~/sct-results/latest/longevity-1gb-1h-nemesis-dmitriy-db-cluster-687b0d24/ -type f -exec grep -Eir "error|fail" {} \; -exec echo "===========" \;

ts=2024-04-12T09:56:19.994Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 09:56:49,681 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 09:56:50,147 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 09:56:50,147 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 09:56:50,243 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.2 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {}
INFO  2024-04-12 09:56:50,250 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 09:57:02,348 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
WARN  2024-04-12 10:06:06,238 [shard 0:stre] gossip - failure_detector_loop: Got error in the loop, live_nodes={172.17.0.4, 172.17.0.3}: seastar::sleep_aborted (Sleep is aborted)
INFO  2024-04-12 10:06:06,238 [shard 0:stre] gossip - failure_detector_loop: Finished main loop
WARN  2024-04-12 10:06:06,238 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
WARN  2024-04-12 10:06:06,239 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
WARN  2024-04-12 10:06:06,463 [shard 0:goss] gossip - === Gossip round FAIL: seastar::gate_closed_exception (gate closed)
INFO  2024-04-12 10:06:08,986 [shard 0:main] init - Shutting down direct_failure_detector
INFO  2024-04-12 10:06:08,986 [shard 0:main] init - Shutting down direct_failure_detector was successful
ts=2024-04-12T10:06:51.146Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 10:07:08,449 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 10:07:08,774 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 10:07:08,774 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 10:07:08,831 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.2 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {}
INFO  2024-04-12 10:07:08,834 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 10:07:20,920 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
===========
ts=2024-04-12T09:56:27.752Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 09:59:14,473 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 09:59:14,823 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 09:59:14,823 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 09:59:14,930 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.4 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, 
ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2024-04-12 09:59:14,932 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 09:59:28,464 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
WARN  2024-04-12 10:06:27,872 [shard 0:stre] repair - repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_members, role_attributes}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown
WARN  2024-04-12 10:06:27,872 [shard 0:stre] repair - repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: user-requested repair failed: std::runtime_error ({shard 0: std::runtime_error (repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_members, role_attributes}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown)})
WARN  2024-04-12 10:46:39,416 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
WARN  2024-04-12 10:46:39,416 [shard 0:stre] gossip - failure_detector_loop: Got error in the loop, live_nodes={172.17.0.3}: seastar::sleep_aborted (Sleep is aborted)
INFO  2024-04-12 10:46:39,416 [shard 0:stre] gossip - failure_detector_loop: Finished main loop
WARN  2024-04-12 10:46:39,426 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
WARN  2024-04-12 10:46:40,327 [shard 0:goss] gossip - === Gossip round FAIL: seastar::gate_closed_exception (gate closed)
WARN  2024-04-12 10:46:41,321 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
INFO  2024-04-12 10:46:41,866 [shard 0:main] init - Shutting down direct_failure_detector
INFO  2024-04-12 10:46:41,866 [shard 0:main] init - Shutting down direct_failure_detector was successful
===========
ts=2024-04-12T09:56:23.634Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 09:58:02,187 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 09:58:02,680 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 09:58:02,680 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 09:58:02,749 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.3 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, 
ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2024-04-12 09:58:02,751 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 09:58:16,434 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
WARN  2024-04-12 10:06:16,457 [shard 0:stre] repair - repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_attributes, role_members}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown
WARN  2024-04-12 10:06:16,458 [shard 0:stre] repair - repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: user-requested repair failed: std::runtime_error ({shard 0: std::runtime_error (repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_attributes, role_members}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown)})
===========
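Most of the grep output above is startup noise (feature lists, stall detector fallbacks), so the repair warnings are easy to miss. A minimal sketch of how they could be pulled out of a log stream follows. The regular expression is modelled only on the `repair - repair[...]` lines shown above; `parse_repair_failures` is a hypothetical helper, not part of SCT or Scylla.

```python
import re

# Pattern modelled on the "N out of M ranges failed" warnings in the logs above.
# This is an assumption about the line format, not an official Scylla schema.
REPAIR_FAILURE = re.compile(
    r"repair\[(?P<repair_id>[0-9a-f-]+)\]: "
    r"(?P<failed>\d+) out of (?P<total>\d+) ranges failed, "
    r"keyspace=(?P<keyspace>\w+), .*?"
    r"nodes_down_during_repair=\{(?P<down>[^}]*)\}"
)


def parse_repair_failures(lines):
    """Yield one dict per distinct failed repair found in the given log lines."""
    seen = set()
    for line in lines:
        match = REPAIR_FAILURE.search(line)
        # Each repair is logged twice (summary plus "user-requested repair failed"),
        # so report every repair id only once.
        if match is None or match["repair_id"] in seen:
            continue
        seen.add(match["repair_id"])
        yield {
            "repair_id": match["repair_id"],
            "failed": int(match["failed"]),
            "total": int(match["total"]),
            "keyspace": match["keyspace"],
            "nodes_down": [ip.strip() for ip in match["down"].split(",") if ip.strip()],
        }


# Two lines copied from the output above: one repair warning, one unrelated line.
SAMPLE = [
    "WARN  2024-04-12 10:06:16,457 [shard 0:stre] repair - "
    "repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: 2307 out of 2307 ranges failed, "
    "keyspace=system_auth, tables={roles, role_attributes, role_members}, "
    "repair_reason=repair, nodes_down_during_repair={172.17.0.2}, "
    "aborted_by_user=false, failed_because=unknown",
    "INFO  2024-04-12 09:58:02,751 [shard 0:stre] gossip - "
    "failure_detector_loop: Started main loop",
]

if __name__ == "__main__":
    for failure in parse_repair_failures(SAMPLE):
        print(failure)
```

Fed with a whole system.log, this would reduce the output to one record per failed repair, including the nodes reported as down.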

I also grepped the logs of the db-node Docker containers themselves:

❯ docker ps --format 'table {{.ID}}\t{{.Image}}\t{{.CreatedAt}}\t{{.Names}}' | grep scylla-db

31d07e17b832   scylla-sct:scylla-db-687b0d24                 2024-04-12 12:06:49 +0200 CEST   longevity-1gb-1h-nemesis-dmitriy-db-node-687b0d24-0
2ce4ac7b01d7   scylla-sct:scylla-db-687b0d24                 2024-04-12 11:56:26 +0200 CEST   longevity-1gb-1h-nemesis-dmitriy-db-node-687b0d24-2
490b929ca57f   scylla-sct:scylla-db-687b0d24                 2024-04-12 11:56:21 +0200 CEST   longevity-1gb-1h-nemesis-dmitriy-db-node-687b0d24-1
❯ docker logs 31d07e17b832 2>&1 | grep -iE 'error|fail|timed|except'
ts=2024-04-12T10:06:51.146Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 10:07:08,449 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 10:07:08,774 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 10:07:08,774 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 10:07:08,831 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.2 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {}
INFO  2024-04-12 10:07:08,834 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 10:07:20,920 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
❯ docker logs 2ce4ac7b01d7 2>&1 | grep -iE 'error|fail|timed|except'
ts=2024-04-12T09:56:27.752Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 09:59:14,473 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 09:59:14,823 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 09:59:14,823 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 09:59:14,930 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.4 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, 
ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2024-04-12 09:59:14,932 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 09:59:28,464 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
WARN  2024-04-12 10:06:27,872 [shard 0:stre] repair - repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_members, role_attributes}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown
WARN  2024-04-12 10:06:27,872 [shard 0:stre] repair - repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: user-requested repair failed: std::runtime_error ({shard 0: std::runtime_error (repair[0af3616d-a2fd-4d74-b388-1c34d3f531d4]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_members, role_attributes}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown)})
WARN  2024-04-12 10:46:39,416 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
WARN  2024-04-12 10:46:39,416 [shard 0:stre] gossip - failure_detector_loop: Got error in the loop, live_nodes={172.17.0.3}: seastar::sleep_aborted (Sleep is aborted)
INFO  2024-04-12 10:46:39,416 [shard 0:stre] gossip - failure_detector_loop: Finished main loop
WARN  2024-04-12 10:46:39,426 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
INFO  2024-04-12 10:46:39,429 [shard 0:comp] compaction - [Compact keyspace1.standard1 ef184f10-f8b9-11ee-a7de-7ac0e72dd805] Compacting of 2 sstables interrupted due to: sstables::compaction_stopped_exception (Compaction for keyspace1/standard1 was stopped due to: shutdown)
INFO  2024-04-12 10:46:39,535 [shard 0:goss] rpc - client 172.17.0.3:58811 msg_id 2:  exception "gate closed" in no_wait handler ignored
WARN  2024-04-12 10:46:40,327 [shard 0:goss] gossip - === Gossip round FAIL: seastar::gate_closed_exception (gate closed)
INFO  2024-04-12 10:46:40,535 [shard 0:goss] rpc - client 172.17.0.3:58811 msg_id 4:  exception "gate closed" in no_wait handler ignored
WARN  2024-04-12 10:46:41,321 [shard 0:main] gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
INFO  2024-04-12 10:46:41,866 [shard 0:main] init - Shutting down direct_failure_detector
INFO  2024-04-12 10:46:41,866 [shard 0:main] init - Shutting down direct_failure_detector was successful
❯ docker logs 490b929ca57f 2>&1 | grep -iE 'error|fail|timed|except'
ts=2024-04-12T09:56:23.634Z caller=diskstats_linux.go:265 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data
WARN  2024-04-12 09:58:02,187 [shard 0:n/a ] seastar - Creation of perf_event based stall detector failed: falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO  2024-04-12 09:58:02,680 [shard 0:main] init - starting direct failure detector pinger service
INFO  2024-04-12 09:58:02,680 [shard 0:main] init - starting direct failure detector service
INFO  2024-04-12 09:58:02,749 [shard 0:stre] gossip - Feature check passed. Local node 172.17.0.3 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_COUNTER_ORDER, CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, CORRECT_STATIC_COMPACT_IN_MC, COUNTERS, DIGEST_FOR_NULL_VALUES, DIGEST_INSENSITIVE_TO_EXPIRY, DIGEST_MULTIPARTITION_READ, EMPTY_REPLICA_MUTATION_PAGES, EMPTY_REPLICA_PAGES, HINTED_HANDOFF_SEPARATE_CONNECTION, INDEXES, LARGE_COLLECTION_DETECTION, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, LWT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, MD_SSTABLE_FORMAT, ME_SSTABLE_FORMAT, NONFROZEN_UDTS, PARALLELIZED_AGGREGATION, PER_TABLE_CACHING, PER_TABLE_PARTITIONERS, RANGE_SCAN_DATA_VARIANT, RANGE_TOMBSTONES, ROLES, 
ROW_LEVEL_REPAIR, SCHEMA_COMMITLOG, SCHEMA_TABLES_V3, SECONDARY_INDEXES_ON_STATIC_COLUMNS, SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT, STREAM_WITH_RPC_STREAM, SUPPORTS_RAFT_CLUSTER_MANAGEMENT, TABLE_DIGEST_INSENSITIVE_TO_EXPIRY, TOMBSTONE_GC_OPTIONS, TRUNCATION_TABLE, TYPED_ERRORS_IN_READ_RPC, UDA, UDA_NATIVE_PARALLELIZED_AGGREGATION, UNBOUNDED_RANGE_TOMBSTONES, UUID_SSTABLE_IDENTIFIERS, VIEW_VIRTUAL_COLUMNS, WRITE_FAILURE_REPLY, XXHASH}
INFO  2024-04-12 09:58:02,751 [shard 0:stre] gossip - failure_detector_loop: Started main loop
INFO  2024-04-12 09:58:16,434 [shard 0:stre] features - Feature TYPED_ERRORS_IN_READ_RPC is enabled
WARN  2024-04-12 10:06:16,457 [shard 0:stre] repair - repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_attributes, role_members}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown
WARN  2024-04-12 10:06:16,458 [shard 0:stre] repair - repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: user-requested repair failed: std::runtime_error ({shard 0: std::runtime_error (repair[480ff0fc-f8a4-4229-ac6e-dacbbf2f743b]: 2307 out of 2307 ranges failed, keyspace=system_auth, tables={roles, role_attributes, role_members}, repair_reason=repair, nodes_down_during_repair={172.17.0.2}, aborted_by_user=false, failed_because=unknown)})
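To cross-check the `nodes_down_during_repair={172.17.0.2}` field against the timestamps above, the following sketch compares the repair failure times with the window in which node 172.17.0.2 was restarting. It assumes the Scylla log timestamps are in UTC (they line up with the `ts=...Z` lines and with the `+0200` container creation times); the helper names are hypothetical and only illustrate the comparison.

```python
from datetime import datetime, timezone

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp):
    """Parse a Scylla log timestamp, assuming it is expressed in UTC."""
    return datetime.strptime(stamp, LOG_FORMAT).replace(tzinfo=timezone.utc)


def parse_docker_time(stamp):
    """Parse a `docker ps` CreatedAt value such as '2024-04-12 12:06:49 +0200 CEST'."""
    date_part, time_part, offset, _zone_name = stamp.split()
    return datetime.strptime(f"{date_part} {time_part} {offset}", "%Y-%m-%d %H:%M:%S %z")


def failed_while_down(failure, down_since, up_again):
    """Return True if the failure timestamp lies inside the node's downtime window."""
    return down_since <= failure <= up_again


# Values copied from the logs above.
down_since = parse_log_time("2024-04-12 10:06:08,986")  # node-0 shut down direct_failure_detector
recreated = parse_docker_time("2024-04-12 12:06:49 +0200 CEST")  # new node-0 container created
up_again = parse_log_time("2024-04-12 10:07:08,834")  # node-0 gossip main loop started again
failures = {
    "172.17.0.3": parse_log_time("2024-04-12 10:06:16,457"),
    "172.17.0.4": parse_log_time("2024-04-12 10:06:27,872"),
}

if __name__ == "__main__":
    print("container recreated (UTC):", recreated.astimezone(timezone.utc))
    for node, failed_at in failures.items():
        print(node, failed_while_down(failed_at, down_since, up_again))
```

Under that UTC assumption, both system_auth repair failures fall between the shutdown of the old 172.17.0.2 instance and the moment the recreated container started gossiping again.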