scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Node stuck on api - decommission when its gossip is partially blocked (dtest update_cluster_layout_tests.py::TestUpdateClusterLayout::test_decommission_node_while_gossip_partly_blocked ) #16573

Closed by aleksbykov 7 months ago

aleksbykov commented 10 months ago

Installation details
Scylla version (or git commit hash): 5.5.0~dev-0.20231224.da033343b793 with build-id 2917ab851e038771115f8fbee25d1a614e5ff8f7
Cluster size: 3

The dtest update_cluster_layout_tests.py::TestUpdateClusterLayout::test_decommission_node_while_gossip_partly_blocked failed with a timeout: after iptables rules blocked port 7000 for node3, and node3 started a decommission, the decommission process got stuck on:

INFO  2023-12-27 17:08:47,350 [shard 0:comp] compaction - [Compact system_schema.aggregates ecbb0960-a49f-11ee-bf51-dbf4accfec26] Compacting [/home/abykov/.dtest/dtest-76jx62xk/test/node3/data/system_schema/aggr
egates-924c55872e3a345bb10c12f37c1ba895/me-3gc9_0s6n_1jxw02wqmvy9tynqjq-big-Data.db:level=0:origin=memtable,/home/abykov/.dtest/dtest-76jx62xk/test/node3/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1
ba895/me-3gc9_0s6n_1aao12wqmvy9tynqjq-big-Data.db:level=0:origin=memtable]
INFO  2023-12-27 17:08:47,352 [shard 0:comp] compaction - [Compact system_schema.aggregates ecbb0960-a49f-11ee-bf51-dbf4accfec26] Compacted 2 sstables to [/home/abykov/.dtest/dtest-76jx62xk/test/node3/data/syste
m_schema/aggregates-924c55872e3a345bb10c12f37c1ba895/me-3gc9_0s6n_230m92wqmvy9tynqjq-big-Data.db:level=0]. 11kB to 6kB (~52% of original) in 2ms = 5MB/s. ~256 total partitions merged to 2.
INFO  2023-12-27 17:08:51,778 [shard 0:strm] api - decommission
INFO  2023-12-27 18:09:55,725 [shard 0:main] compaction_manager - Asked to stop
INFO  2023-12-27 18:09:55,726 [shard 0:main] view_update_generator - Terminating background fiber

The node statuses:

17:08:46,950 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2499 | start node3
17:08:46,956 ccm             DEBUG cluster.py     :754  | node3: Starting scylla: args=['/home/abykov/.dtest/dtest-76jx62xk/test/node3/bin/scylla', '--options-file', '/home/abykov/.dtest/dtest-76jx62xk/test/node3/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.25.3', '--smp', '2', '--memory', '1024M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.25.3', '--unsafe-bypass-fsync', '1', '--kernel-page-cache', '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=False wait_for_binary_proto=True
17:08:47,315 ccm             DEBUG cluster.py     :754  | node3: Starting scylla-jmx: args=['/home/abykov/.dtest/dtest-76jx62xk/test/node3/bin/symlinks/scylla-jmx', '-Dapiaddress=127.0.25.3', '-Djavax.management.builder.initial=com.scylladb.jmx.utils.APIBuilder', '-Djava.rmi.server.hostname=127.0.25.3', '-Dcom.sun.management.jmxremote', '-Dcom.sun.management.jmxremote.host=127.0.25.3', '-Dcom.sun.management.jmxremote.port=7199', '-Dcom.sun.management.jmxremote.rmi.port=7199', '-Dcom.sun.management.jmxremote.local.only=false', '-Xmx256m', '-XX:+UseSerialGC', '-Dcom.sun.management.jmxremote.authenticate=false', '-Dcom.sun.management.jmxremote.ssl=false', '-jar', '/home/abykov/.dtest/dtest-76jx62xk/test/node3/bin/scylla-jmx-1.0.jar']
17:08:47,669 ccm             DEBUG cluster.py     :754  | node1: nodetool cmd=status wait=True timeout=None
17:08:48,848 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2503 | Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  127.0.25.1  285.01 KB  256          ?       1597c739-a6e7-4db6-ab72-03765b1c84b8  rack1
UN  127.0.25.2  469.38 KB  256          ?       675083f3-cf69-4833-aae4-16a808563ac3  rack1
UN  127.0.25.3  ?          256          ?       c14165cc-9bbb-4691-9b2b-094066ddf807  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

17:08:48,848 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2504 | 
17:08:48,849 ccm             DEBUG cluster.py     :754  | node2: nodetool cmd=status wait=True timeout=None
17:08:49,973 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2503 | Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  127.0.25.1  ?          256          ?       1597c739-a6e7-4db6-ab72-03765b1c84b8  rack1
UN  127.0.25.2  469.38 KB  256          ?       675083f3-cf69-4833-aae4-16a808563ac3  rack1
UN  127.0.25.3  ?          256          ?       c14165cc-9bbb-4691-9b2b-094066ddf807  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

17:08:49,973 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2504 | 
17:08:49,974 ccm             DEBUG cluster.py     :754  | node3: nodetool cmd=status wait=True timeout=None
17:08:51,064 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2503 | Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
?N  127.0.25.1  ?          256          ?       1597c739-a6e7-4db6-ab72-03765b1c84b8  rack1
?N  127.0.25.2  ?          256          ?       675083f3-cf69-4833-aae4-16a808563ac3  rack1
UN  127.0.25.3  735.37 KB  256          ?       c14165cc-9bbb-4691-9b2b-094066ddf807  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

17:08:51,064 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2504 | 
17:08:51,064 update_cluster_layout_tests INFO  update_cluster_layout_tests.py:2510 | try to decommission node3
17:08:51,065 ccm             DEBUG cluster.py     :754  | node3: nodetool cmd=decommission wait=True timeout=None

Without raft topology enabled, the test passed. Node logs: logs.tar.gz
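For context, the failing scenario can be sketched roughly as follows. The helper functions and command lines below are illustrative only and are not taken from the actual dtest code; the real test drives iptables and nodetool through ccm.

```python
# Illustrative sketch of the failing scenario (hypothetical helpers, not the
# real dtest code): block the inter-node port 7000 for node3, then ask node3
# to decommission itself.

def block_gossip_cmds(node_ip):
    """iptables rules that drop inter-node (port 7000) traffic for a node."""
    return [
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "7000",
         "-d", node_ip, "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-p", "tcp", "--sport", "7000",
         "-s", node_ip, "-j", "DROP"],
    ]

def decommission_cmd(api_address):
    """nodetool invocation that triggers the decommission REST call."""
    return ["nodetool", "-h", api_address, "decommission"]

# Scenario: isolate node3's port 7000, then start the decommission that hangs.
for cmd in block_gossip_cmds("127.0.25.3"):
    print(" ".join(cmd))
print(" ".join(decommission_cmd("127.0.25.3")))
```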

aleksbykov commented 10 months ago

@gleb-cloudius , can you take a look?

gleb-cloudius commented 10 months ago

Commands that are issued through Raft need a quorum of nodes to be accessible. This test isolates node3 from the cluster and issues a decommission through Raft, so it hangs waiting for quorum. It should wait with a timeout, but that is a known problem. Not sure if we have an issue for it though @kbr-scylla? As it is, the behaviour is expected.
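To illustrate the quorum requirement: a Raft group of N voters can only make progress while a majority (N // 2 + 1) of voters is reachable. The isolated node3 can reach only itself, so it can never complete the read barrier that the decommission needs. A minimal sketch (names are illustrative, not ScyllaDB internals):

```python
def raft_majority(voters: int) -> int:
    """Minimum number of reachable voters a Raft group needs for progress."""
    return voters // 2 + 1

def can_make_progress(voters: int, reachable: int) -> bool:
    """True if enough voters are reachable to commit, e.g. pass a read barrier."""
    return reachable >= raft_majority(voters)

# 3-node cluster, node3 isolated: node3 reaches only itself.
print(can_make_progress(3, 1))  # False -> decommission hangs on node3
print(can_make_progress(3, 2))  # True  -> the other two nodes retain quorum
```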

kbr-scylla commented 10 months ago

I created https://github.com/scylladb/scylladb/issues/16604.

temichus commented 9 months ago

REMINDER: remove the issue-related test disabling from https://github.com/scylladb/scylla-dtest/pull/3836 when this issue is closed.

kbr-scylla commented 7 months ago

Should be solved by @gusev-p's quorum-loss timeout patches.

kbr-scylla commented 7 months ago

@temichus @aleksbykov In raft-topology mode the decommission attempt now times out due to quorum loss, but the test fails because it expects a different error:

assert ('Rejected decommission operation' in "ToolError('Subprocess /jenkins/workspace/scylla-staging/artsiom_mishuta/dtest_raft_topolgy/dtest-full-with-consistent-topology-changes/scylla/.ccm/scylla-repository/6bd0be71ab32ea535d332b0324d81892973611c1/share/cassandra/bin/nodetool -h 127.0.36.3 -p 7199 -Dcom.sun.jndi.rmiURLParsing=legacy -Dcom.scylladb.apiPort=10000 decommission exited with non-zero status; exit status: 4; \\nstderr: error executing POST request to http://127.0.36.3:10000/storage_service/decommission with parameters {}: remote replied with status code 500 Internal Server Error:\\nservice::raft_operation_timeout_error (group [b34c4e91-ed16-11ee-ade9-f7052ddb7928] raft operation [read_barrier] timed out)\\n')" or 'Cannot start' in "ToolError('Subprocess /jenkins/workspace/scylla-staging/artsiom_mishuta/dtest_raft_topolgy/dtest-full-with-consistent-topology-changes/scylla/.ccm/scylla-repository/6bd0be71ab32ea535d332b0324d81892973611c1/share/cassandra/bin/nodetool -h 127.0.36.3 -p 7199 -Dcom.sun.jndi.rmiURLParsing=legacy -Dcom.scylladb.apiPort=10000 decommission exited with non-zero status; exit status: 4; \\nstderr: error executing POST request to http://127.0.36.3:10000/storage_service/decommission with parameters {}: remote replied with status code 500 Internal Server Error:\\nservice::raft_operation_timeout_error (group [b34c4e91-ed16-11ee-ade9-f7052ddb7928] raft operation [read_barrier] timed out)\\n')")

We can adjust the test now so it also passes when it sees raft_operation_timeout_error in the error message.
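A minimal sketch of such an adjustment (the exact assertion in the dtest may differ; the error strings are taken from the failure output above):

```python
# Hypothetical adjusted check: accept the Raft timeout error in addition to
# the previously expected rejection messages.
ACCEPTED_ERRORS = (
    "Rejected decommission operation",
    "Cannot start",
    "raft_operation_timeout_error",  # new: quorum-loss timeout, raft-topology mode
)

def is_expected_decommission_failure(stderr: str) -> bool:
    """True if nodetool decommission failed in one of the expected ways."""
    return any(msg in stderr for msg in ACCEPTED_ERRORS)

sample = ("remote replied with status code 500 Internal Server Error: "
          "service::raft_operation_timeout_error (raft operation "
          "[read_barrier] timed out)")
print(is_expected_decommission_failure(sample))  # True
```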

Reassigning to you.