nyh opened 4 years ago
When we have a test/alternator/run option to run a cluster of multiple Scylla nodes, we can also have special tests which need such a cluster, and which are skipped when the cluster is smaller. For example, the test I wrote for issue #7236 (https://github.com/scylladb/scylla-dtest/pull/1698) requires a cluster of (at least) two nodes, and can reproduce the bug in a few seconds. Right now I'm planning to move this test to dtest, but theoretically it could also live in test/alternator. dtest shines when the test needs to change the cluster on the fly, but this test does not, so it could have been a test/alternator test.
We could put the multi-node tests in a separate source file, which could contain fixtures and utilities for them: for example, one that sends a /localnodes request to the single known IP address to find all Alternator nodes, and skips the test if there are fewer than N of them.
In #11645, we noticed that several tests in test/cql-pytest (we didn't check test/alternator) currently fail against a 3-node cluster, and nobody had noticed. Most of these failures are caused by problems in the tests, not in Scylla. This demonstrates why the ability to run the tests against a 3-node cluster once in a while would be useful.
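The /localnodes fixture idea could be sketched with the standard library roughly as follows. This is a hypothetical helper, not existing code in test/alternator: `fetch_localnodes`, `node_urls` and `require_nodes` are illustrative names, and the sketch assumes the /localnodes response is a JSON list of node addresses and that every node serves Alternator on the same scheme and port as the URL the tests were given.

```python
import json
import unittest
import urllib.parse
import urllib.request


def fetch_localnodes(base_url, timeout=5.0):
    """Send a /localnodes request to one known Alternator endpoint.

    Assumption: the response body is a JSON list of node addresses.
    """
    url = base_url.rstrip("/") + "/localnodes"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_urls(base_url, nodes):
    """Build one endpoint URL per node, reusing the scheme and port of base_url."""
    parsed = urllib.parse.urlsplit(base_url)
    port = f":{parsed.port}" if parsed.port else ""
    urls = []
    for node in nodes:
        # IPv6 literals must be wrapped in brackets inside a URL.
        host = f"[{node}]" if ":" in node else node
        urls.append(f"{parsed.scheme}://{host}{port}")
    return urls


def require_nodes(nodes, n):
    """Skip the calling test unless the cluster has at least n nodes."""
    if len(nodes) < n:
        raise unittest.SkipTest(f"test needs {n} nodes, cluster has {len(nodes)}")
    return nodes
```

A fixture in the multi-node test file could call `require_nodes(fetch_localnodes(url), 2)` and then create one client per entry of `node_urls(url, nodes)`. Inside a real pytest fixture, `pytest.skip()` would be the more natural way to skip; `unittest.SkipTest` is used here only to keep the sketch dependent on the standard library alone.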
Note that by now, we have much more elaborate support in test.py and test/pylib/ for tests written in Python that can use clusters of arbitrary and even changing sizes, similar to dtest's capabilities. So we don't want to replicate these features in test/alternator/run.
It may still be useful, however, to have the ability to start a test cluster with N nodes instead of 1 and run all the usual single-node tests on it, just as a user can start an N-node cluster manually and run "pytest" against it.
The Alternator test suite is usually run via the script test/alternator/run. This script runs a single-node Scylla (with two shards) and runs the tests against it. However, we might have bugs which are specific to multi-node clusters, so once in a while we should run the same tests against a multi-node cluster. This issue requests adding an option to the test/alternator/run script to run multiple Scylla nodes.
As in the existing script, we need to pay attention to keeping the tests reasonably fast. Scylla's boot speed matters most, because it directly affects how long it takes to run a single test (something often done during development). For example, the --skip-wait-for-gossip-to-settle option that run uses was very important for reducing boot delay, and it would be good to keep it, but this may require starting the Scylla processes in order instead of in parallel. Another example is --alternator-streams-time-window-s, which can slow down all the Streams tests: even if we can no longer use 0 (?), we shouldn't use 10 seconds either.
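A multi-node option for run could generate one Scylla command line per node, roughly as sketched below. Everything here is hypothetical: `node_ip`, `node_command` and `start_in_order` are illustrative names, the 127.1.0.x loopback addresses, per-node workdir layout and default port 8000 are assumptions, and the flag list is an assumed subset that should be checked against `scylla --help` and the options the existing run script already passes. The sketch keeps --skip-wait-for-gossip-to-settle and leaves the Streams time window as a parameter, since the right value is still open. Because skipping the gossip wait may require booting nodes in order, `start_in_order` launches each node only after a caller-supplied `wait_ready` callback returns.

```python
def node_ip(base, i):
    """Hypothetical addressing: node i (0-based) gets <base>.<i + 1> on loopback."""
    return f"{base}.{i + 1}"


def node_command(scylla, i, base, workdir_base, streams_window_s, smp=2, port=8000):
    """Build the (assumed) command line for node i; node 0 is every node's seed."""
    ip = node_ip(base, i)
    return [
        scylla,
        "--smp", str(smp),
        "--workdir", f"{workdir_base}/node{i + 1}",
        "--listen-address", ip,
        "--rpc-address", ip,
        "--api-address", ip,
        "--seed-provider-parameters", f"seeds={node_ip(base, 0)}",
        "--alternator-address", ip,
        "--alternator-port", str(port),
        # Kept from the single-node run script to shorten boot.
        "--skip-wait-for-gossip-to-settle", "0",
        # Value deliberately left to the caller.
        "--alternator-streams-time-window-s", str(streams_window_s),
    ]


def start_in_order(commands, launch, wait_ready):
    """Start nodes one at a time: each node is launched only after the
    previous node's wait_ready call has returned, so nodes never boot
    concurrently."""
    procs = []
    for cmd in commands:
        proc = launch(cmd)
        wait_ready(proc, cmd)
        procs.append(proc)
    return procs
```

In run itself, `launch` could be `subprocess.Popen` and `wait_ready` a hypothetical helper that polls the node's Alternator port until it answers. Starting in parallel would instead mean launching every command first and waiting afterwards; keeping `launch` and `wait_ready` as parameters makes it easy to compare the boot time of the two strategies.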