scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

DecommissionSeedNode nemesis is failing on Docker backend #7328

Open dimakr opened 5 months ago

dimakr commented 5 months ago

DecommissionSeedNode test case is failing on Docker backend with the error:

Last events by severity
CRITICAL - [60]
2024-04-08 12:52:28.274: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=d80834b5-1f7f-43e7-9ba7-c0371561612c during_nemesis=NodetoolSeedDecommission: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=559 node=Node longevity-1gb-1h-nemesis-longevit-loader-node-c3eea640-0 [172.17.0.5 | 172.17.0.5] (seed: False)
java.io.IOException: Operation x10 on key(s) [3537324c3436314d3330]: Error executing: (UnavailableException): Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)
...

ERROR - [4]
2024-04-08 12:29:16.518: (NodetoolEvent Severity.ERROR) period_type=end event_id=4d315db6-4735-4c54-8af4-6e00faf5209e during_nemesis=NodetoolSeedDecommission duration=2s: nodetool_command=cleanup node=longevity-1gb-1h-nemesis-longevit-db-node-c3eea640-0 errors=["Encountered a bad command exit code!\n\nCommand: '/usr/bin/nodetool  cleanup scylla_bench'\n\nExit code: 1\n\nStdout:\n\nnodetool: Keyspace [scylla_bench] does not exist.\nSee 'nodetool help' or 'nodetool help <command>'.\n\nStderr:\n\n\n\n", 'Traceback (most recent call last):\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2529, in run_nodetool\n    self.remoter.run(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose, retry=retry)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 614, in run\n    result = _run()\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 65, in inner\n    return func(*args, **kwargs)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 605, in _run\n    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 538, in _run_execute\n    result = connection.run(**command_kwargs)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run\n    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)\n  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run\n    raise UnexpectedExit(result)\nsdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!\n\nCommand: \'/usr/bin/nodetool  cleanup scylla_bench\'\n\nExit code: 1\n\nStdout:\n\nnodetool: Keyspace [scylla_bench] does not exist.\nSee \'nodetool help\' or \'nodetool help <command>\'.\n\nStderr:\n\n\n\n\n']
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2529, in run_nodetool
self.remoter.run(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose, retry=retry)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 614, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 65, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 605, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 538, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: '/usr/bin/nodetool  cleanup scylla_bench'
Exit code: 1
Stdout:
nodetool: Keyspace [scylla_bench] does not exist.
See 'nodetool help' or 'nodetool help <command>'.
Stderr:
...
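
The ERROR event above comes from `nodetool cleanup scylla_bench` being run against a keyspace that does not exist on this cluster. As a minimal sketch of one way to avoid that, the requested keyspaces can be filtered against the ones that actually exist before building the cleanup commands. The helper names `keyspaces_to_cleanup` and `build_cleanup_commands` are hypothetical and are not part of SCT; how the list of existing keyspaces is obtained is left out.

```python
from typing import Iterable, List


def keyspaces_to_cleanup(requested: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return only the requested keyspaces that exist, preserving request order.

    Hypothetical helper for illustration; not an SCT function.
    """
    existing_set = set(existing)
    return [ks for ks in requested if ks in existing_set]


def build_cleanup_commands(requested: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Build one 'nodetool cleanup <keyspace>' command per existing keyspace."""
    return [f"nodetool cleanup {ks}" for ks in keyspaces_to_cleanup(requested, existing)]


if __name__ == "__main__":
    # Assumed example data: 'scylla_bench' was never created in this run.
    requested = ["keyspace1", "scylla_bench"]
    existing = ["system", "keyspace1"]
    for cmd in build_cleanup_commands(requested, existing):
        print(cmd)
```

With the assumed data, only the command for `keyspace1` is produced and the missing `scylla_bench` keyspace is skipped instead of failing the nemesis.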

Installation details

SCT Version: master
Scylla version (or git commit hash): 2024.1.2

Logs

soyacz commented 5 months ago

Looks like 2 nodes were down (or the cluster was overloaded); decommission should work. Can you share the Argus link?
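
The "2 required but only 1 alive" part of the CRITICAL event follows from the usual QUORUM rule, where a quorum is `floor(RF / 2) + 1` replicas. A minimal sketch of that arithmetic is below. The replication factor of 3 is an assumption, since the log does not state it (a requirement of 2 replicas is also what RF=2 would give).

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def can_serve_quorum(replication_factor: int, alive_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy a QUORUM request."""
    return alive_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # assumption: replication factor is not shown in the log
    alive = 1  # from the error message: "only 1 alive"
    print(f"required={quorum(rf)} alive={alive} ok={can_serve_quorum(rf, alive)}")
```

Under the RF=3 assumption, one alive replica means two replicas were unreachable for that key, which matches the reading that 2 nodes were down or not responding.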

dimakr commented 5 months ago

@soyacz The build was no longer in Jenkins because it had been rotated, and that is probably why there is no build in Argus either. I re-executed the nemesis: Jenkins build. I still don't see the build in Argus.

soyacz commented 5 months ago

That's because configurations/nemesis/additional_configs/docker_backend_local.yaml was used, which contains:

# TODO: remove this when we'll run this in jenkins
enable_argus: false

Is it fixed for jobs in Jenkins?

dimakr commented 5 months ago

> Is it fixed for jobs in Jenkins?

@soyacz Right, this is disabled in master for now, until the change that enables running the Docker backend in Jenkins is merged (I was using my own branch to run the tests in Jenkins).