scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Add testcase for adding additional DC while having 90% storage utilization #9157

Open · pehala opened this issue 2 weeks ago

pehala commented 2 weeks ago

At 90% storage utilization, scale out the cluster by adding a new DC to the existing cluster.

cezarmoise commented 4 days ago

Last test run: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/cezar/job/byo-longevity-test/69/consoleFull

04:34:09  < t:2024-11-21 02:34:06,572 f:full_storage_utilization_test_2.py l:131  c:FullStorageUtilizationTest2 p:INFO  > Node     Total GB     Used GB      Avail GB     Used %  
04:34:29  < t:2024-11-21 02:34:28,756 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 1        436          396          40           91.0%
04:34:51  < t:2024-11-21 02:34:50,943 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 2        436          393          44           90.0%
04:35:13  < t:2024-11-21 02:35:13,134 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 3        436          395          42           91.0%
04:35:36  < t:2024-11-21 02:35:35,329 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 4        436          403          34           93.0%
04:35:36  < t:2024-11-21 02:35:35,851 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 5        436          37           400          9.0%
04:35:58  < t:2024-11-21 02:35:58,183 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 6        436          37           400          9.0%
04:36:21  < t:2024-11-21 02:36:20,536 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 7        436          37           400          9.0%
04:36:21  < t:2024-11-21 02:36:20,536 f:full_storage_utilization_test_2.py l:153  c:FullStorageUtilizationTest2 p:INFO  > Cluster  3052         1698         1360         56.0%

Data was not redistributed to the new DC: nodes 5, 6 and 7 stay at 9% while the original nodes remain at 90% or more.
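The "Cluster" row in the log is the sum over all seven nodes (3052 GB total, 1698 GB used). A minimal sketch, using the per-node numbers copied from the log and hypothetical helper names (not SCT code), that computes the same aggregate and flags nodes far below the cluster average. This makes the missing redistribution explicit. The 30-point threshold is an arbitrary assumption, and the log rounds its figures, so percentages here may differ slightly from the printed ones.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    node: int
    total_gb: int
    used_gb: int

    @property
    def used_pct(self) -> float:
        return 100.0 * self.used_gb / self.total_gb


def cluster_usage_pct(nodes: list[NodeUsage]) -> float:
    """Aggregate utilization, sum(used) / sum(total), like the 'Cluster' row."""
    total = sum(n.total_gb for n in nodes)
    used = sum(n.used_gb for n in nodes)
    return 100.0 * used / total


def underfilled_nodes(nodes: list[NodeUsage], max_gap_pct: float = 30.0) -> list[int]:
    """Nodes more than max_gap_pct points below the cluster average.

    max_gap_pct is a hypothetical threshold chosen for illustration.
    """
    avg = cluster_usage_pct(nodes)
    return [n.node for n in nodes if avg - n.used_pct > max_gap_pct]


# Values from the log above; nodes 5-7 appear to be the newly added DC.
nodes = [
    NodeUsage(1, 436, 396),
    NodeUsage(2, 436, 393),
    NodeUsage(3, 436, 395),
    NodeUsage(4, 436, 403),
    NodeUsage(5, 436, 37),
    NodeUsage(6, 436, 37),
    NodeUsage(7, 436, 37),
]

print(f"cluster: {cluster_usage_pct(nodes):.1f}%")  # cluster: 55.6%
print("underfilled:", underfilled_nodes(nodes))     # underfilled: [5, 6, 7]
```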

Trying a fix: https://github.com/scylladb/scylla-cluster-tests/pull/9305/commits/1b0e85b6817bd400a91761886c3cb1853a944ca3

cezarmoise commented 3 days ago

https://argus.scylladb.com/tests/scylla-cluster-tests/e70ab70a-063f-463e-a289-39a90805e597

cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Only one DC's RF can be changed at a time and not by more than 1"

cezarmoise commented 3 days ago

https://github.com/cezarmoise/scylla-cluster-tests/tree/new-dc

Trying to alter all keyspaces before adding the DC so that they use per-DC replication (NetworkTopologyStrategy). After the DC is added, extending a keyspace to it is then a change to only one DC's RF.
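The server rule quoted above ("only one DC's RF ... not by more than 1") means a keyspace can reach its target replication only through a series of single steps. A minimal sketch that plans those steps, assuming the keyspace already uses per-DC replication. The DC names `dc1`/`dc2` and both helper functions are hypothetical, not part of SCT.

```python
def rf_steps(current: dict[str, int], target: dict[str, int]) -> list[dict[str, int]]:
    """Return per-DC RF maps, each differing from the previous one in a
    single DC by exactly 1, ending at `target`."""
    state = dict(current)
    steps = []
    for dc in sorted(set(current) | set(target)):
        goal = target.get(dc, 0)
        while state.get(dc, 0) != goal:
            delta = 1 if goal > state.get(dc, 0) else -1
            state[dc] = state.get(dc, 0) + delta
            steps.append(dict(state))
    return steps


def alter_statements(keyspace: str, current: dict[str, int], target: dict[str, int]) -> list[str]:
    """One ALTER KEYSPACE per step; each must be applied (and complete) in order."""
    stmts = []
    for rf in rf_steps(current, target):
        opts = ", ".join(f"'{dc}': {n}" for dc, n in sorted(rf.items()))
        stmts.append(
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}}"
        )
    return stmts


# Hypothetical: extend keyspace 'ks' from dc1 only to RF 3 in a new dc2.
for stmt in alter_statements("ks", {"dc1": 3}, {"dc1": 3, "dc2": 3}):
    print(stmt)
```

With the example input this yields three statements, raising `dc2` from 1 to 3 while `dc1` stays at 3.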

https://argus.scylladb.com/tests/scylla-cluster-tests/4cb74447-6750-4bba-83ef-4ccad8cf6a89

cezarmoise commented 3 days ago

Failed due to a timeout on a large keyspace; updating the test to replicate only the small keyspaces to the new DC.
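A minimal sketch of that selection, assuming per-keyspace on-disk sizes have already been collected elsewhere (for example from nodetool output). The `sizes_gb` input, the size threshold, the keyspace names and the `system` name-prefix filter are all hypothetical illustrations, not what the test branch actually does.

```python
def small_keyspaces(sizes_gb: dict[str, float], max_size_gb: float) -> list[str]:
    """Pick user keyspaces small enough to extend to the new DC.

    `sizes_gb` maps keyspace -> size in GB (hypothetical input). System
    keyspaces are skipped by name prefix as a simple heuristic.
    """
    return sorted(
        ks for ks, size in sizes_gb.items()
        if not ks.startswith("system") and size <= max_size_gb
    )


# Hypothetical sizes: the large data keyspace is left out.
sizes = {"keyspace1": 1200.0, "ks_small_1": 2.5, "ks_small_2": 0.8, "system_schema": 0.1}
print(small_keyspaces(sizes, max_size_gb=10))  # ['ks_small_1', 'ks_small_2']
```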

2024-11-22 13:41:14.531: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=9263a51a-5198-4ce0-a22c-dfaf3d53fec5, source=FullStorageUtilizationTest2.test_scale_out (full_storage_utilization_test_2.FullStorageUtilizationTest2)() message=Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 245, in test_scale_out
    self.scale_out()
  File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 57, in scale_out
    self.add_new_node()
  File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 63, in add_new_node
    self.reconfigure_keyspaces()
  File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 97, in reconfigure_keyspaces
    self.execute_cql(cql)
  File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 36, in execute_cql
    results = session.execute(query)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1318, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2729, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5120, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'10.4.3.201:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.4.3.201:9042

cezarmoise commented 3 days ago

Still seeing timeouts: https://argus.scylladb.com/tests/scylla-cluster-tests/5117a642-3a7a-4c9a-ba43-d1898756f556

Setting the query timeout to 5 minutes and trying again.
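The traceback shows `execute_cql` calling `session.execute(query)` with no timeout, so the driver's session-wide default applies. The error message itself points at the per-call `timeout` argument (in seconds) of `Session.execute`. A minimal sketch of a free-standing variant, assuming `session` is a cassandra-driver or scylla-driver `Session`. The function and constant names are hypothetical. Setting `session.default_timeout = 300` would instead change the default for every request on that session.

```python
QUERY_TIMEOUT_S = 300.0  # 5 minutes, as in the comment above


def execute_cql(session, query: str, timeout: float = QUERY_TIMEOUT_S):
    """Run a (possibly slow) statement with a per-request client timeout.

    `session` is assumed to be a driver Session; the `timeout` passed here
    overrides session.default_timeout for this call only.
    """
    return session.execute(query, timeout=timeout)
```

A client-side `OperationTimedOut` only means the driver stopped waiting. The schema change may still complete on the server, so a retry should first check whether the keyspace was already altered.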

cezarmoise commented 9 hours ago

https://argus.scylladb.com/tests/scylla-cluster-tests/6b7ea346-0ca8-42c8-a955-7f7f4f3d1922

Only the small keyspaces were added to the new DC, since altering the large ones timed out. The long sleeps were removed in this run so the test finishes faster.

Will update with a new run.