scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0

scylla_cluster: load_from_repository: set default timeouts #516

Closed bhalevy closed 8 months ago

bhalevy commented 8 months ago

Commit 4f619b75ab048366f3a0f7971450d00485e910bc (scylladb/scylla-ccm#492) improved the regular expression used in scylla_extract_mode to better detect the scylla_mode from the package URL.

However, when we install from a local path, like /jenkins/workspace/scylla-master/dtest-debug/scylla/build/debug/dist/tar/scylla-debug-unified-5.4.0~dev-0.20231013.055f0617064d.x86_64.tar.gz, the regular expression fails to detect the mode because it wrongly starts the match at /scylla-master.

This change restricts the match to the last path component.
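A minimal sketch of the idea, not the actual ccmlib code: `MODE_RE`, `KNOWN_MODES`, and `extract_mode` are hypothetical names, and the pattern is a simplified stand-in for the real regular expression in scylla_extract_mode. It only illustrates why searching the whole path can latch onto an earlier directory such as /scylla-master, and how taking the last path component first avoids that.

```python
import posixpath
import re

# Assumed set of build modes, for illustration only.
KNOWN_MODES = ("debug", "dev", "release")

# Hypothetical, simplified pattern: "scylla-<token>", optionally with "-enterprise".
MODE_RE = re.compile(r"scylla(?:-enterprise)?-(\w+)")


def extract_mode(package_path, last_component_only=True):
    """Return the build mode found in package_path, or None if none is detected."""
    # Restricting the search to the file name keeps parent directories
    # (e.g. ".../scylla-master/...") from capturing the first match.
    target = posixpath.basename(package_path) if last_component_only else package_path
    match = MODE_RE.search(target)
    if match and match.group(1) in KNOWN_MODES:
        return match.group(1)
    return None


PATH = ("/jenkins/workspace/scylla-master/dtest-debug/scylla/build/debug/dist/tar/"
        "scylla-debug-unified-5.4.0~dev-0.20231013.055f0617064d.x86_64.tar.gz")

mode_from_basename = extract_mode(PATH)
mode_from_full_path = extract_mode(PATH, last_component_only=False)
```

With this sketch, searching the full path matches "scylla-master" first and yields no known mode, while searching only the file name finds "debug".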

As seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/258/artifact/logs-full.debug.041/dtest-gw0.log

07:20:43,371 790     errors                         ERROR    conftest.py         :225  | test_cluster_expansion_with_cdc[Single_cluster]: test failed:
...
>               self.wait_for_binary_interface(from_mark=from_mark, process=self._process_scylla, timeout=t)
...
self = <ccmlib.scylla_node.ScyllaNode object at 0x7fbd89790f10>
exprs = 'Starting listening for CQL clients', from_mark = 0, timeout = 420

When cassandra_version is passed to ScyllaCluster.__init__, we set self.scylla_mode = None, so the default timeouts are not adjusted for the mode.

scylla_mode is set correctly later on in load_from_repository. Call __set_default_timeouts again to adjust the timeout based on the valid scylla_mode.
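The ordering problem can be sketched as follows. This is an illustration under stated assumptions, not the ScyllaCluster implementation: `ClusterSketch`, `BASE_TIMEOUT`, `MODE_FACTOR`, and `binary_interface_timeout` are hypothetical names, and the numbers are made up rather than taken from ccmlib.

```python
class ClusterSketch:
    """Toy stand-in for a cluster object whose timeouts depend on the build mode."""

    # Illustrative values only; the real defaults live in ccmlib.
    BASE_TIMEOUT = 100
    MODE_FACTOR = {"debug": 3, "dev": 1, "release": 1}

    def __init__(self, version=None, mode="release"):
        # When a version is given, the mode is unknown until the package is loaded.
        self.scylla_mode = None if version else mode
        self._set_default_timeouts()

    def _set_default_timeouts(self):
        # An unknown mode (None) falls back to the unscaled timeout.
        factor = self.MODE_FACTOR.get(self.scylla_mode, 1)
        self.binary_interface_timeout = self.BASE_TIMEOUT * factor

    def load_from_repository(self, detected_mode):
        # The mode only becomes known here, so the timeouts are recomputed.
        self.scylla_mode = detected_mode
        self._set_default_timeouts()


cluster = ClusterSketch(version="some-version")
timeout_before = cluster.binary_interface_timeout
cluster.load_from_repository("debug")
timeout_after = cluster.binary_interface_timeout
```

Without the second call to `_set_default_timeouts`, the sketch would keep the unscaled timeout even though the cluster runs a debug build.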

bhalevy commented 8 months ago

return "self.started=True" back

done

bhalevy commented 8 months ago

BTW, setting self.started=True seems to happen too early. I'd consider doing that only just before the function returns successfully.

bhalevy commented 8 months ago

Besides fixing this regex, I think that the issue with scylla_mode in ScyllaCluster.__init__ could be this: https://github.com/scylladb/scylla-ccm/blob/78b5d266dca3daa209406ab14265fe61479c51e5/ccmlib/scylla_cluster.py#L33-L35

While we pass a valid cassandra_version in this case: https://github.com/scylladb/scylla-dtest/blob/a9b0676fc11eeab144d6d8d5fd402b4db6505dee/dtest_setup.py#L805-L809

        elif scylla_version:
            cluster = ScyllaCluster(dtest_setup.test_path, dtest_setup.cluster_name,
                                    cassandra_version=scylla_version, force_wait_for_cluster_start=True,
                                    manager=manager_install_dir,
                                    skip_manager_server=skip_manager_server)

But I'm not sure which path we take in the dtest-debug and dtest-release Jenkins jobs.

bhalevy commented 8 months ago

@fruch please re-review

bhalevy commented 8 months ago

https://github.com/scylladb/scylla-ccm/commit/168057e1e344f7648b34930e87393e834d191112 also fixes https://github.com/scylladb/scylla-ccm/pull/516#issuecomment-1766222735 by setting the default timeouts again after ScyllaCluster.scylla_mode is correctly extracted in load_from_repository.

bhalevy commented 8 months ago

@scylladb/ccm-maint please merge