Open Lakshmipathi opened 1 week ago
Started with a 4-node cluster.
The cluster reached 67% disk usage, then the test waited for 30 minutes with no reads or writes.
[2024-11-09T08:32:41.257Z] < t:2024-11-09 08:32:41,038 f:full_storage_utilization_test.py l:157 c:FullStorageUtilizationTest p:INFO > Current max disk usage after writing to keyspace12: 67% (292 GB / 292.12 GB)
[2024-11-09T08:32:43.574Z] < t:2024-11-09 08:32:43,091 f:full_storage_utilization_test.py l:125 c:FullStorageUtilizationTest p:INFO > Wait for 1800 seconds
After the 30-minute idle period, a throttled write started:
[2024-11-09T09:04:17.601Z] < t:2024-11-09 09:04:16,771 f:stress_thread.py l:325 c:sdcm.stress_thread p:INFO > cassandra-stress write no-warmup duration=30m -rate threads=10 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema keyspace=keyspace1 "replication(strategy=NetworkTopologyStrategy,replication_factor=3)" -node 10.4.0.147,10.4.1.54,10.4.1.222,10.4.3.28 -errors skip-unsupported-columns
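For reference, a rough upper bound on what this throttled write can add, using only the values in the command above (`throttle=1400/s`, `size=FIXED(10240)`, `duration=30m`, `replication_factor=3`). This is a back-of-the-envelope sketch: it assumes the throttle is sustained for the full duration and ignores compression, compaction, and per-row overhead, so the real on-disk growth may differ.

```python
# Values taken from the cassandra-stress command line above.
OPS_PER_SEC = 1400          # throttle=1400/s
PAYLOAD_BYTES = 10240       # size=FIXED(10240), n=FIXED(1)
DURATION_SEC = 30 * 60      # duration=30m
REPLICATION_FACTOR = 3      # replication_factor=3

# Upper bound: assumes the throttle rate is reached for the whole run.
total_ops = OPS_PER_SEC * DURATION_SEC
raw_bytes = total_ops * PAYLOAD_BYTES
replicated_bytes = raw_bytes * REPLICATION_FACTOR

print(f"rows written (max):      {total_ops}")
print(f"payload, one copy:       {raw_bytes / 1e9:.1f} GB")
print(f"payload, all replicas:   {replicated_bytes / 1e9:.1f} GB")
```

The row count stays below the `seq=1..5000000` population, so under these assumptions each key is written at most once.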
A few minutes later, a node was removed from the 4-node cluster:
[2024-11-09T09:08:27.538Z] < t:2024-11-09 09:08:12,621 f:full_storage_utilization_test.py l:169 c:FullStorageUtilizationTest p:INFO > Removing a second node from the cluster
[2024-11-09T09:25:51.985Z] < t:2024-11-09 09:25:51,026 f:common.py l:43 c:sdcm.utils.tablets.common p:INFO > Waiting for tablets to be balanced
[2024-11-09T09:26:55.605Z] < t:2024-11-09 09:26:53,243 f:common.py l:48 c:sdcm.utils.tablets.common p:INFO > Tablets are balanced
[2024-11-09T09:26:55.605Z] < t:2024-11-09 09:26:53,243 f:full_storage_utilization_test.py l:64 c:FullStorageUtilizationTest p:INFO > Removing a node finished with time: 1120.621938943863
The final 3-node cluster has disk usage of 88%, 88%, and 89%.
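As a sanity check on these numbers, here is a minimal sketch of the expected per-node usage after scale-in. It assumes all nodes have equal disk capacity and that data is spread evenly before and after the removal; it does not account for the data added by the throttled write or for compaction, so it is only an approximation. The helper name `projected_usage_pct` is made up for this example.

```python
def projected_usage_pct(usage_pct: float, nodes_before: int, nodes_after: int) -> float:
    """Estimate per-node disk usage after removing nodes.

    Assumes equal-capacity nodes and evenly balanced data, so the same
    total data is simply divided among fewer nodes.
    """
    if nodes_after <= 0 or nodes_before <= 0:
        raise ValueError("node counts must be positive")
    return usage_pct * nodes_before / nodes_after


# 67% on each of 4 nodes, redistributed across 3 nodes.
estimate = projected_usage_pct(67, 4, 3)
print(f"projected usage per node: {estimate:.1f}%")
```

Under these assumptions the estimate is roughly 89%, which is in line with the observed 88% to 89%.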
https://argus.scylladb.com/tests/scylla-cluster-tests/b028c83a-47cf-404c-a649-2b714acba766
https://jenkins.scylladb.com/job/scylla-staging/job/LakshmipathiGanapathi/job/byo-longevity-test/209/consoleText
@paszkow @pehala @swasik, in the description we mentioned 30 minutes of read-only time before the topology change, but the script currently performs neither reads nor writes during this period. Is that fine, or do we need read operations during the 30 minutes before scale-in?
I think it is fine in this case. The objective of this test is to be a simple correctness test, so the absence of reads is not a problem.