scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0

scylla_node: start_scylla: fix wait for binary interface timeout path #428

Closed: bhalevy closed this pull request 1 year ago

bhalevy commented 1 year ago

Looking at https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-debug/135/testReport/, for example, we see that many tests in resharding_test.py and materialized_views_test.py::TestInterruptBuildProcess fail with:

ccmlib.node.NodeError: 30 Jan 2023 05:31:16 [node1] Missing: ['Starting listening for CQL clients']:
Scylla version 5.3.0~dev-0.20230130.84a69b6adb3d w.....
See system.log for remainder

This is caused by a long-standing bug (introduced in c1762841eec615ba315be1b4b841aaef5108281d):

            try:
                t = timeout * 4 if timeout is not None else 420 if self.cluster.scylla_mode != 'debug' else 900
                self.wait_for_binary_interface(from_mark=self.mark, process=self._process_scylla, timeout=60)
            except TimeoutError as e:
                if not self.wait_for_starting(from_mark=self.mark, timeout=t):
                    raise NodeError(f"{e}")
                pass
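The chained conditional on the first line of the try block is easy to misread. Python's conditional expression groups to the right, so it reads as timeout * 4 if timeout is not None else (420 if self.cluster.scylla_mode != 'debug' else 900). The helper below is a hypothetical standalone copy of that expression (the name startup_budget is ours, not ccm's) that shows the values it yields. Note that this budget t is only passed to wait_for_starting; wait_for_binary_interface is called with a hard-coded timeout=60.

```python
def startup_budget(timeout, scylla_mode):
    """Hypothetical copy of the snippet's timeout expression.

    Conditional expressions are right-associative, so this is:
        timeout * 4 if timeout is not None
        else (420 if scylla_mode != 'debug' else 900)
    """
    return timeout * 4 if timeout is not None else 420 if scylla_mode != 'debug' else 900


# An explicit timeout is multiplied by 4 regardless of the mode.
print(startup_budget(100, 'debug'))     # 400
# Without a timeout, debug mode gets a larger default than other modes.
print(startup_budget(None, 'release'))  # 420
print(startup_budget(None, 'debug'))    # 900
```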

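The return value of wait_for_starting indicates whether bootstrap or resharding took place, not whether 'Starting listening for CQL clients' was found. As a result, the snippet raises NodeError whenever no bootstrap or resharding happened, even if the node is merely slow to start listening for CQL, and silently continues when one did happen, without confirming that CQL is up. In addition, there are secondary bugs related to the timeout passed to it, which this PR also fixes.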
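To make the two failure modes concrete, here is a minimal, self-contained sketch that replays the quoted control flow (wait_buggy) next to a hypothetical corrected flow (wait_sketch), driven by fake wait functions. Everything here is an assumption for illustration: NodeError is a local stand-in for ccmlib.node.NodeError, fake_node and its timing model are invented, and wait_sketch is not the code merged in this PR. It only illustrates treating the 'Starting listening for CQL clients' message, rather than the wait_for_starting return value, as the success signal.

```python
class NodeError(Exception):
    """Local stand-in for ccmlib.node.NodeError."""


def wait_buggy(wait_for_binary_interface, wait_for_starting, t):
    # Mirrors the quoted snippet: the boolean returned by wait_for_starting
    # (did bootstrap/resharding happen?) is treated as "CQL is up".
    try:
        wait_for_binary_interface(timeout=60)
    except TimeoutError as e:
        if not wait_for_starting(timeout=t):
            raise NodeError(f"{e}")


def wait_sketch(wait_for_binary_interface, wait_for_starting, t):
    # Hypothetical corrected flow, for illustration only.
    try:
        wait_for_binary_interface(timeout=60)
        return
    except TimeoutError:
        pass
    # Let bootstrap/resharding finish; the return value only says whether
    # they took place, so it is deliberately ignored here.
    wait_for_starting(timeout=t)
    # Success is decided solely by seeing the CQL listening message.
    try:
        wait_for_binary_interface(timeout=t)
    except TimeoutError as e:
        raise NodeError(f"{e}") from e


def fake_node(cql_ready_after, resharded):
    """Hypothetical fake node: the CQL message appears once the cumulative
    wait reaches cql_ready_after seconds; wait_for_starting returns resharded."""
    elapsed = 0

    def wait_for_binary_interface(timeout):
        nonlocal elapsed
        if elapsed + timeout < cql_ready_after:
            elapsed += timeout
            raise TimeoutError("Missing: ['Starting listening for CQL clients']")
        elapsed = max(elapsed, cql_ready_after)

    def wait_for_starting(timeout):
        return resharded

    return wait_for_binary_interface, wait_for_starting


def outcome(wait_fn, node, t=900):
    try:
        wait_fn(*node, t=t)
        return "started"
    except NodeError:
        return "NodeError"


# Slow node, no resharding, CQL comes up after 200 s.
print(outcome(wait_buggy, fake_node(200, False)))   # NodeError (spurious)
print(outcome(wait_sketch, fake_node(200, False)))  # started
# Resharding took place, but CQL never comes up.
print(outcome(wait_buggy, fake_node(float("inf"), True)))   # started (missed failure)
print(outcome(wait_sketch, fake_node(float("inf"), True)))  # NodeError
```

In the sketch, wait_for_starting is used only to give bootstrap or resharding time to finish, and the CQL message is then re-checked with the longer budget t, so both success and failure are decided by the log message itself.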

Tested with ./scripts/run_test.sh --cassandra-dir=$CASSANDRA_DIR -n 6 resharding_test.py::TestReshardingVariants materialized_views_test.py::TestInterruptBuildProcess

bhalevy commented 1 year ago

Please backport to all live branches.