scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0
20 stars 62 forks source link

scylla_node: no need to _wait_no_pending_flushes #500

Closed bhalevy closed 10 months ago

bhalevy commented 10 months ago

The flush api is synchronous and so it waits for the respective flush to get done and will return and error if flush failed for any reason and this will propagate through nodetool flush.

_wait_no_pending_flushes may fail if there are spontanuous flushes, and it also tends to timeout in nodetool cfstats in debug mode, see for example: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/250/testReport/alternator_ttl_tests/TestAlternatorTTL/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split029___test_ttl_with_load_and_decommission/

../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1320: in flush
    self._wait_no_pending_flushes(verbose=kwargs.get('verbose', True))
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1314: in _wait_no_pending_flushes
    if not wait_for(no_pending_flushes, timeout=wait_timeout, step=1.0):
../scylla/.local/lib/python3.11/site-packages/ccmlib/common.py:675: in wait_for
    if func():
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:1306: in no_pending_flushes
    stdout = self.nodetool('cfstats', timeout=wait_timeout, verbose=verbose)[0]
../scylla/.local/lib/python3.11/site-packages/ccmlib/scylla_node.py:754: in nodetool
    return super().nodetool(*args, **kwargs)
../scylla/.local/lib/python3.11/site-packages/ccmlib/node.py:794: in nodetool
    stdout, stderr = p.communicate(timeout=timeout)
/usr/lib64/python3.11/subprocess.py:1209: in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
/usr/lib64/python3.11/subprocess.py:2109: in _communicate
    self._check_timeout(endtime, orig_timeout, stdout, stderr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Popen: returncode: None args: ['/jenkins/workspace/scylla-master/dtest-debu...>
endtime = 2588.165606614, orig_timeout = 60, stdout_seq = [], stderr_seq = []
skip_check_and_raise = False

    def _check_timeout(self, endtime, orig_timeout, stdout_seq, stderr_seq,
                       skip_check_and_raise=False):
        """Convenience for checking if a timeout has expired."""
        if endtime is None:
            return
        if skip_check_and_raise or _time() > endtime:
>           raise TimeoutExpired(
                    self.args, orig_timeout,
                    output=b''.join(stdout_seq) if stdout_seq else None,
                    stderr=b''.join(stderr_seq) if stderr_seq else None)
E           subprocess.TimeoutExpired: Command '['/jenkins/workspace/scylla-master/dtest-debug/scylla/.ccm/scylla-repository/93be4c0cb0f0c53fe0eb7cd4d06ba15a7fc01d29/share/cassandra/bin/nodetool', '-h', '127.0.12.2', '-p', '7199', '-Dcom.sun.jndi.rmiURLParsing=legacy', 'cfstats']' timed out after 60 seconds