scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0

tests/integration: set `skip_wait_for_gossip_to_settle=0` #301

Closed fruch closed 2 months ago

fruch commented 6 months ago

To speed up the boot sequence of Scylla nodes, we now use `skip_wait_for_gossip_to_settle=0`, the same setting we have used for quite a while in dtest for almost all tests.

Also introduced `wait_other_notice=True` in the places where the cluster is started, because without it a test can begin before the cluster is fully up and ready.

This change shaves 1h off the integration test run, which now finishes in 28 min.
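
For readers unfamiliar with the two settings, here is a minimal sketch of how they combine when a test starts a cluster. It is not the code from this PR: `start_cluster_fast` is a hypothetical helper, and `FakeCluster` is a stand-in so the example runs without ccm or Scylla installed. The method names `set_configuration_options` and `start` are assumed to mirror the ccm cluster object.

```python
# Sketch only: FakeCluster is a stand-in that records what it was asked to do.
# The method names are assumed to mirror a ccm cluster object.
class FakeCluster:
    def __init__(self):
        self.options = {}
        self.start_kwargs = None

    def set_configuration_options(self, values=None):
        # Record node configuration overrides.
        self.options.update(values or {})

    def start(self, **kwargs):
        # Record the keyword arguments used to start the nodes.
        self.start_kwargs = kwargs


def start_cluster_fast(cluster):
    """Hypothetical helper: skip the gossip-settle wait at boot, but still
    block until the nodes have noticed each other before returning."""
    cluster.set_configuration_options(
        values={"skip_wait_for_gossip_to_settle": 0}
    )
    cluster.start(wait_other_notice=True)
    return cluster


cluster = start_cluster_fast(FakeCluster())
print(cluster.options, cluster.start_kwargs)
```

The point of pairing the two is that the first removes a fixed delay from every node boot, while the second replaces it with an explicit readiness check, so tests do not start against a half-formed cluster.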

Lorak-mmk commented 6 months ago

Interesting, I remember that I did try to do this at one point, but got a lot of failures. Maybe I just made some mistake when running the tests.

fruch commented 6 months ago

> Interesting, I remember that I did try to do this at one point, but got a lot of failures. Maybe I just made some mistake when running the tests.

It depends on when you tried it. We (mostly @nyh) did a lot of fine-tuning in ccm to support this case correctly. While I was trying to figure out why that UDT test is failing, it was annoying to wait that long for cluster creation.

Lorak-mmk commented 5 months ago

I think we can merge it after CI passes

fruch commented 5 months ago

> I think we can merge it after CI passes

One of the integration suites was stuck for 5h, so I'm running it all again:

tests/integration/standard/test_metadata.py ss...s.............x...s.s.. [ 15%]
s...s.ss.s...x.s.x.....sssssssssss...ss.s....s.s...ss                    [ 20%]
Error: The operation was canceled.

I'm not sure whether it's connected to this change or not. We'll need more reruns, and maybe more debug output enabled in CI, to figure this one out.
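
One possible way to get more debug output from a hung run, sketched here as an assumption rather than something this PR does, is the standard library's `faulthandler` module: arm a watchdog that dumps every thread's stack if a block of code runs longer than a timeout. The helper name `dump_stacks_if_stuck` is hypothetical.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_stacks_if_stuck(timeout_s, file=None):
    """Hypothetical helper: if the wrapped block runs longer than timeout_s
    seconds, write the stack of every thread to `file` (default: stderr).
    The block itself is not interrupted."""
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the block finishes or raises.
        faulthandler.cancel_dump_traceback_later()


# Usage: wrap the code suspected of hanging.
with dump_stacks_if_stuck(300):
    pass  # the suspected test body would run here
```

The dump only shows where each thread is blocked; it does not unblock anything, so the CI job timeout is still what ends the run.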

fruch commented 5 months ago

> > I think we can merge it after CI passes
>
> One of the integration suites was stuck for 5h, so I'm running it all again:
>
> tests/integration/standard/test_metadata.py ss...s.............x...s.s.. [ 15%]
> s...s.ss.s...x.s.x.....sssssssssss...ss.s....s.s...ss                    [ 20%]
> Error: The operation was canceled.
>
> I'm not sure whether it's connected to this change or not. We'll need more reruns, and maybe more debug output enabled in CI, to figure this one out.

It's also getting stuck in other runs that don't include this PR: https://github.com/scylladb/python-driver/actions/runs/8076169015/job/22064206623

tests/integration/standard/test_metadata.py ss...s.............x...s.s.. [ 15%]
s...s.ss.s...x.s.x.....sssssssssss...ss.s....s.s...ss                    [ 20%]
Error: The operation was canceled.

fruch commented 5 months ago

Clearly from the logs, test_connection_error is the one getting stuck; it's still not clear why.

I've also seen that test_connection_honor_cluster_port leaves a trail of sessions behind, which keep trying to reconnect to a cluster that isn't there anymore.
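
A common guard against leftover sessions like this is to register the shutdown as a test cleanup, so it runs even when the test body fails. The sketch below shows the pattern only: `FakeCluster` is a stand-in for the driver's cluster object (so the example runs without a database), and the test shown is hypothetical, not the actual test_connection_honor_cluster_port.

```python
import unittest


class FakeCluster:
    """Stand-in for a driver cluster; it only tracks whether shutdown() ran."""

    def __init__(self):
        self.is_shutdown = False

    def connect(self):
        return object()  # placeholder for a session

    def shutdown(self):
        self.is_shutdown = True


class ConnectionTest(unittest.TestCase):
    created = []  # clusters created by the tests, kept for inspection

    def test_connect(self):
        cluster = FakeCluster()
        # Registered cleanups run after the test whether it passes or fails,
        # so no session is left reconnecting to a removed cluster.
        self.addCleanup(cluster.shutdown)
        ConnectionTest.created.append(cluster)
        self.assertIsNotNone(cluster.connect())


def run_example():
    ConnectionTest.created.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConnectionTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_example()` runs the test case and returns the `unittest.TestResult`; afterwards every cluster in `ConnectionTest.created` has been shut down.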

Lorak-mmk commented 3 months ago

> Clearly from the logs, test_connection_error is the one getting stuck; it's still not clear why.
>
> I've also seen that test_connection_honor_cluster_port leaves a trail of sessions behind, which keep trying to reconnect to a cluster that isn't there anymore.

Are the problems in those tests caused by this PR? If not, then I think we can merge this.

fruch commented 3 months ago

> > Clearly from the logs, test_connection_error is the one getting stuck; it's still not clear why.
> >
> > I've also seen that test_connection_honor_cluster_port leaves a trail of sessions behind, which keep trying to reconnect to a cluster that isn't there anymore.
>
> Are the problems in those tests caused by this PR? If not, then I think we can merge this.

I didn't find any connection to this change

roydahan commented 3 months ago

Looks like all tests are passing now, aren't they?