Opened by pehala 3 weeks ago
@pehala This scenario seems to be incorrect. Without deletes you will hit an out-of-space error once you scale in. I think we should rather aim at having 5 nodes at ~72% disk utilization and then scale in. As a result you will end up with 4 nodes at ~90%.
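A quick check of the arithmetic behind this suggestion (my own, not from the comment):

```python
# 5 nodes at ~72% hold 5 * 0.72 = 3.6 node-capacities of data; spread over
# 4 nodes after scale-in, that is 3.6 / 4 = 0.90, i.e. ~90% per node.
print(5 * 0.72 / 4)  # 0.9
```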
> @pehala This scenario seems to be incorrect. Without deletes you will hit an out-of-space error once you scale in. I think we should rather aim at having 5 nodes at ~72% disk utilization and then scale in. As a result you will end up with 4 nodes at ~90%.
But in this scenario you perform scale-out before scale-in. So, if I understand correctly, it is: add node 4, then remove node 3, so in practice you swap node 3 for node 4.
I updated this description a bit. Based on the suggestion in the test plan document, we have two variants for scale-in: a) 3-node cluster scale-in at 90%, b) 4-node cluster scale-in at 67%.
For the 3-node cluster scale-in at 90%, add a new node once tablet migration has completed, drop 20% of the data from the cluster, and then scale in by removing a node. For the 4-node cluster scale-in at 67%, we scale in by removing a node; after tablet migration, the cluster will be at around 90% storage utilization.
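A rough sketch of the two variants as test steps (the helper names below are placeholders for illustration and do not reflect the real FullStorageUtilizationTest API):

```python
# Hypothetical outline of the two scale-in variants; helper names
# (fill_until_disk_usage, add_node, drop_data, remove_node, ...) are
# placeholders, not the actual SCT/FullStorageUtilizationTest methods.

def scale_in_at_90_percent(cluster):
    """Variant a): 3-node cluster, scale-in at ~90% disk utilization."""
    cluster.fill_until_disk_usage(0.90)   # throttled cassandra-stress writes
    cluster.add_node()                    # scale out to 4 nodes
    cluster.wait_for_tablet_balance()
    cluster.drop_data(fraction=0.20)      # simulate losing ~20% of the data
    cluster.remove_node()                 # scale in back to 3 nodes

def scale_in_at_67_percent(cluster):
    """Variant b): 4-node cluster, scale-in at ~67% disk utilization."""
    cluster.fill_until_disk_usage(0.67)
    cluster.remove_node()                 # scale in to 3 nodes
    cluster.wait_for_tablet_balance()     # expect ~90% per node afterwards
```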
Reached 92% disk usage and started waiting for 30 minutes, with no writes or reads.
< t:2024-11-05 09:36:49,314 f:full_storage_utilization_test.py l:121 c:FullStorageUtilizationTest p:INFO > Current max disk usage after writing to keyspace10: 92% (398 GB / 392.40000000000003 GB)
< t:2024-11-05 09:36:50,342 f:full_storage_utilization_test.py l:87 c:FullStorageUtilizationTest p:INFO > Wait for 1800 seconds
After 30 minutes of idle time, started a throttled write:
< t:2024-11-05 10:08:01,521 f:stress_thread.py l:325 c:sdcm.stress_thread p:INFO > cassandra-stress write no-warmup duration=30m -rate threads=10 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema keyspace=keyspace1 "replication(strategy=NetworkTopologyStrategy,replication_factor=3)" -node 10.4.1.62,10.4.3.97,10.4.1.100 -errors skip-unsupported-columns
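As a rough sanity check of how much this throttled stress adds on disk (my own arithmetic, counting payload only and ignoring commitlog, sstable and compaction overheads):

```python
# Back-of-the-envelope estimate of the throttled write load (payload only).
ops_per_s = 1400             # -rate "throttle=1400/s"
col_size_bytes = 10 * 1024   # -col "size=FIXED(10240)"
rf = 3                       # replication_factor=3

client_mb_s = ops_per_s * col_size_bytes / 1e6   # ~14.3 MB/s of payload
cluster_mb_s = client_mb_s * rf                  # ~43 MB/s written cluster-wide
gb_per_30_min = cluster_mb_s * 1800 / 1e3        # ~77 GB over a 30-minute run
print(f"~{client_mb_s:.1f} MB/s payload, ~{cluster_mb_s:.1f} MB/s cluster-wide, "
      f"~{gb_per_30_min:.0f} GB per 30-minute run")
```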
Scale out by adding a new node at 90%:
< t:2024-11-05 10:09:57,086 f:full_storage_utilization_test.py l:35 c:FullStorageUtilizationTest p:INFO > Adding a new node
< t:2024-11-05 10:12:55,534 f:common.py l:43 c:sdcm.utils.tablets.common p:INFO > Waiting for tablets to be balanced
< t:2024-11-05 10:40:55,031 f:common.py l:48 c:sdcm.utils.tablets.common p:INFO > Tablets are balanced
Later, dropping some data before scale-in
< t:2024-11-05 10:40:55,031 f:full_storage_utilization_test.py l:48 c:FullStorageUtilizationTest p:INFO > Dropping some data
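For illustration only (the actual mechanism isn't shown in the logs): since the data set spans several stress keyspaces (the logs mention keyspace1 and keyspace10), dropping roughly 20% of it could be done by dropping 2 of 10 keyspaces via the Python driver. This is an assumption about the layout, not necessarily what full_storage_utilization_test.py does at this step.

```python
# Hypothetical sketch: drop ~20% of the data by dropping 2 of the 10 stress
# keyspaces, assuming keyspace1..keyspace10 hold roughly equal amounts of data.
from cassandra.cluster import Cluster

cluster = Cluster(["10.4.1.62"])   # one of the cluster's contact points
session = cluster.connect()
for ks in ("keyspace9", "keyspace10"):
    session.execute(f"DROP KEYSPACE IF EXISTS {ks}")
```

Note that with `auto_snapshot` enabled, dropping a keyspace leaves a snapshot behind, so a `nodetool clearsnapshot` may be needed before the space is actually reclaimed.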
A few minutes later, removing a node (bringing the cluster back to 3 nodes):
< t:2024-11-05 10:41:00,079 f:full_storage_utilization_test.py l:40 c:FullStorageUtilizationTest p:INFO > Removing a node
< t:2024-11-05 10:41:00,080 f:full_storage_utilization_test.py l:133 c:FullStorageUtilizationTest p:INFO > Removing a second node from the cluster
< t:2024-11-05 10:41:00,080 f:full_storage_utilization_test.py l:135 c:FullStorageUtilizationTest p:INFO > Node to be removed: df-test-master-db-node-1ffa6d64-2
Graph: tablet migration over time
Graph: max/avg disk utilization
99th percentile write and read latency by cluster (max, at 90% disk utilization):
| operation | p99 latency |
|---|---|
| write | 1.79 ms |
| read | 3.58 ms |
The final 3-node cluster has disk usage at 92%, 91%, and 87%.
https://argus.scylladb.com/tests/scylla-cluster-tests/1ffa6d64-004a-4443-a3c9-d52a18ea08e1
> The final 3-node cluster has disk usage at 92%, 91%, and 87%.
But if we drop 20% of the data as suggested in the test plan, shouldn't we get ca. 70% here? It was incorrectly stated in the doc - I fixed it. The idea behind it is to simulate a scenario where we lose plenty of data and, because of that, can scale in to save resources.
@Lakshmipathi ping
> But if we drop 20% of the data as suggested in the test plan, shouldn't we get ca. 70% here?

@swasik, here is the flow for this case:
1. In a 3-node cluster, we reached 92% disk usage.
2. Wait for 30 minutes.
3. Start a throttled write.
4. Add a new node at 90%; total nodes in the cluster = 4.
5. From the graph, we can see that the average disk usage drops after this operation.
6. Wait for 30 minutes.
7. Drop 20% of the data.
8. Start a throttled write.
9. Perform scale-in.
If I'm not wrong, the throttled writes we do during the scaling operations (steps 3 and 8) contribute to the additional disk usage. Let me add more graphs to this issue.
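To make the gap concrete, a rough back-of-the-envelope estimate (my own arithmetic, assuming perfectly balanced tablets and ignoring the throttled writes and compaction/snapshot overheads):

```python
# Idealized average per-node utilization after each step; capacities are in
# units of one node's disk. The throttled writes are deliberately excluded,
# which is exactly why the observed numbers come out higher.
data = 3 * 0.92                      # step 1: 3 nodes at 92%

after_add = data / 4                 # step 4: add a 4th node   -> ~69% per node
after_drop = after_add * 0.8         # step 7: drop 20% of data -> ~55% per node
after_scale_in = after_drop * 4 / 3  # step 9: remove a node    -> ~74% per node

print(f"after add: {after_add:.0%}, after drop: {after_drop:.0%}, "
      f"after scale-in: {after_scale_in:.0%}")
```

That puts the idealized final 3-node cluster at roughly 74%, so the observed ~90% is plausibly explained by the extra data written by the throttled writes in steps 3 and 8, plus per-node imbalance.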