Closed amoskong closed 5 years ago
The seed node was drained for rollback, and the READ workload was started at almost the same time.
Key timestamps:
2019-08-30 12:04:50,428 [scylla-test@10.142.0.13]: Running command "nodetool -u cassandra -pw cassandra drain "...
- drain the seed node (which will be rolled back soon)
2019-08-30 12:04:51,608 f:sct_events.py l:452 c:sdcm.sct_events p:INFO > stress_cmd=cassandra-stress read no-warmup cl=QUORUM duration=20m -schema keyspace=keyspace1 'replication(factor=3) compression=LZ4Compressor' -port jmx=6868 -mode cql3 native compression=lz4 user=cassandra password=cassandra -rate threads=1000 -pop seq=1..10000000 -log interval=5 -node 10.142.0.13
- starting read workload
2019-08-30 12:05:05,191 Running READ with 1000 threads 20 minutes
- workload started
2019-08-30 12:05:22,149 [scylla-test@10.142.0.13] "sudo systemctl start scylla-server.service"...
- start the rolled-back node (node1, seed)
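To make "almost at the same time" concrete, the offsets between the four events can be computed from the timestamps above. This is only a small sketch; the event labels and the `seconds_since_drain` helper are names made up for illustration, not part of the test framework.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above; the labels are illustrative.
EVENTS = {
    "drain issued": "2019-08-30 12:04:50,428",
    "stress command submitted": "2019-08-30 12:04:51,608",
    "READ workload running": "2019-08-30 12:05:05,191",
    "scylla-server start issued": "2019-08-30 12:05:22,149",
}


def seconds_since_drain(events):
    """Return each event's offset in seconds from the drain command."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    parsed = {name: datetime.strptime(ts, fmt) for name, ts in events.items()}
    start = parsed["drain issued"]
    return {name: (t - start).total_seconds() for name, t in parsed.items()}


offsets = seconds_since_drain(EVENTS)
for name, offset in offsets.items():
    print(f"{offset:7.3f}s  {name}")
```

The stress command was submitted about 1.2 seconds after the drain was issued, and the workload was reported as running roughly 15 seconds after it, so c-s was connecting to a node that was already draining.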
Database log of seed node: (I didn't find any notable errors or exceptions in the seed node's log)
This issue occurred on Debian 9.
Scylla version: 3.1.0.rc5-0.20190902.623ea5e3d
@amoskong does this happen only when the seed node is the one being rolled back and upgraded?
Maybe it's related to the fact we run the c-s command with --node [IP_OF_SEED_NODE] and this node is the one that is being drained at the same time... The c-s needs this node up and running at least till it starts doing I/O.
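One possible mitigation on the test side, sketched here as an assumption rather than as what the test framework actually does, is to avoid giving c-s the node that is about to be drained as its only contact point. cassandra-stress accepts a comma-delimited list for `-node`, so a helper could build that list from the nodes that are expected to stay up. The function name and the second and third IP addresses below are hypothetical; only 10.142.0.13 comes from the logs above.

```python
def stress_contact_points(all_nodes, nodes_to_avoid):
    """Build a comma-separated value for cassandra-stress `-node`.

    Nodes that are about to be drained or restarted are left out, so the
    initial connection does not depend on them.
    """
    avoid = set(nodes_to_avoid)
    usable = [ip for ip in all_nodes if ip not in avoid]
    if not usable:
        raise ValueError("no contact point left for cassandra-stress")
    return ",".join(usable)


# 10.142.0.13 is the seed node from the logs; the other addresses are placeholders.
cluster = ["10.142.0.13", "10.0.0.2", "10.0.0.3"]
node_arg = stress_contact_points(cluster, nodes_to_avoid=["10.142.0.13"])
print(f"-node {node_arg}")
```

This does not replace waiting for the workload to start, but it removes the dependency on a single node during connection setup.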
On Tue, Sep 3, 2019 at 9:16 PM Roy Dahan notifications@github.com wrote:
@amoskong does this happen only when the seed node is the one being rolled back and upgraded?
Yes.
Maybe it's related to the fact we run the c-s command with --node [IP_OF_SEED_NODE] and this node is the one that is being drained at the same time...
Yes.
The c-s needs this node up and running at least till it starts doing I/O.
Is it a real issue? Do we need to fix our test to avoid this situation?
If this is the case, it's not a real issue. You need to wait for the c-s to start and only then start the rollback process (including the drain). If this solves the issue, we can close this one.
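A minimal sketch of that ordering, assuming the test can read the stress log as text: poll until a line showing the workload is running appears, and only then begin the drain. The helper name, its parameters, and the way the log is read are all hypothetical; the default marker is taken from the "Running READ with 1000 threads" line in the timeline above, and a real test might need a different marker.

```python
import time


def wait_for_stress_start(read_log, marker="Running READ with",
                          timeout=120.0, poll_interval=1.0):
    """Poll read_log() until marker appears in its output.

    read_log is any zero-argument callable returning the current log text.
    Returns True if the marker was seen, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if marker in read_log():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

The rollback step (drain included) would then run only when the call returns True, for example `if wait_for_stress_start(lambda: open(log_path).read()): ...`, where `log_path` is a placeholder for wherever the stress output is written.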
@roydahan / @amoskong can we close this?
I couldn't reproduce it after waiting a while before starting the rollback, so we can close this ticket. I can reopen it if I see it again in the future.
Installation details
Scylla version (or git commit hash): from 3.0.10-0.20190815.b3bfd8c08 to 3.1.0.rc4-0.20190829.d70c2db09
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 16.04
Description: In the upgrade test, we upgraded two db nodes first, then started a READ workload in the background. We then rolled back the last upgraded node (rolling-upgrade-upgrade--ubuntu-xen-db-node-aa815381-0-1) and upgraded the remaining (3) nodes to the latest version. In this job, the second upgraded node is the seed node, so the rolled-back node is the seed node. The READ workload hit TransportException as expected when the rollback started, but it did not recover after the rollback completed, even after all nodes were upgraded. I see many "Failed to create client too many times" errors from c-s.
Logs: