scylladb / scylla-ccm

Cassandra Cluster Manager, modified for Scylla
Apache License 2.0
20 stars 62 forks source link

scylla_node: move `wait_other_notice` to the end of `_start_scylla` #522

Closed fruch closed 8 months ago

fruch commented 8 months ago

as in #483, but just changing the order of wait_other_notice cause it's causing some cases a node might finish trying booting during the wait_other_notice check. and that's breaks some test assumptions

Fixes: https://github.com/scylladb/scylla-enterprise/issues/3443 Ref: https://github.com/scylladb/scylla-ccm/pull/483

bhalevy commented 8 months ago

Why not just backport #483? (and https://github.com/scylladb/scylla-ccm/pull/516 on top of it)

fruch commented 8 months ago

Why not just backport #483? (and #516 on top of it)

cause that gonna change the logic of multiple tests in dtest that was affected by it

bhalevy commented 8 months ago

Why not just backport #483? (and #516 on top of it)

cause that gonna change the logic of multiple tests in dtest that was affected by it

What do you mean by "change the logic"? It's an infrastructure change. We should backport infrastructure fixes to live branches that are exposed to the issues we've fixed in order to have a more stable testing environment in these branches, especially while they are not yet released (like branch-5.4). I'm the one goes over 10s of failures in dtest jobs on release branches so I guess it bothers me more than others, but I really think we should prefer the stability of the tests over stability of the dtest branch. Of course we don't need to backport new tests or code cleanups. But, for example, a patch like e052264f43f224a3bba6d2628890df095913762d that fixes an obvious issue affecting timeouts in debug mode can be safely backported to all branches that require it.

fruch commented 8 months ago

Why not just backport #483? (and #516 on top of it)

cause that gonna change the logic of multiple tests in dtest that was affected by it

What do you mean by "change the logic"? It's an infrastructure change. We should backport infrastructure fixes to live branches that are exposed to the issues we've fixed in order to have a more stable testing environment in these branches, especially while they are not yet released (like branch-5.4). I'm the one goes over 10s of failures in dtest jobs on release branches so I guess it bothers me more than others, but I really think we should prefer the stability of the tests over stability of the dtest branch. Of course we don't need to backport new tests or code cleanups. But, for example, a patch like e052264 that fixes an obvious issue affecting timeouts in debug mode can be safely backported to all branches that require it.

again those changes are built ontop of other changes, especially the introduction of watch_rest_for_alive also I don't understand 5.4 is related to a PR targeted for 2023.1

anyhow I'm still testing, and looks like it's not helping anyhow, and we'll need to change the test a bit for 2023.1

fruch commented 8 months ago

This doesn't fixes the issue intended to address, closing it

roydahan commented 8 months ago

@bhalevy we prefer not to backport the huge change of ccm and all relevant dtest to 2023.1 as it will shake things up really bad for 2023.1 which now has only one test failing (this one that Israel tries to fix).