scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

topology_experimental_raft/test_tablets is flaky #18896

Open raphaelsc opened 2 months ago

raphaelsc commented 2 months ago

Caught in CI with repeat=100; I have not been able to reproduce it locally so far.

this is https://github.com/scylladb/scylladb/issues/18904

https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/debug/topology_experimental_raft.test_tablets.82.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/debug/scylla-7860.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/debug/scylla-7868.log

https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/topology_experimental_raft.test_tablets.16.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/scylla-7874.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/scylla-7878.log

This seems to be an external issue that occurs when tearing down the test and causes a reconnection storm (possibly https://github.com/scylladb/scylladb/issues/15356):

15:07:29.261 DEBUG> Sending options message heartbeat on idle connection (140636003666256) 127.107.220.3:9042
15:07:29.262 DEBUG> Sending options message heartbeat on idle connection (140636022678288) 127.107.220.3:9042
15:07:29.262 DEBUG> Sending options message heartbeat on idle connection (140636022443856) 127.107.220.17:9042

https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/topology_experimental_raft.test_tablets.17.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/scylla-7875.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/scylla-7876.log
https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/9204/artifact/testlog/x86_64/dev/scylla-7877.log

        manager.driver_close()
>       await manager.server_start(s0, wait_others=2)

raphaelsc commented 2 months ago

/cc @tgrabiec

tgrabiec commented 1 month ago

@raphaelsc These seem to be independent failures/issues, please split.