scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Grow-shrink cluster perf test is not using multi-az #7384

Open soyacz opened 2 weeks ago

soyacz commented 2 weeks ago

Although we configure multi-AZ in the grow-shrink cluster perf test, `nodetool status` shows that only one rack is in use anyway.
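A quick way to check this from captured `nodetool status` output is to count the distinct values in the Rack column. The snippet below is only a minimal sketch: the `racks_in_use` helper and the sample output are made up for illustration (they are not taken from this run), and it assumes the rack is the last column of every node row.

```python
import re
from collections import Counter

# Node rows start with a two-letter status/state code such as UN or DN.
NODE_ROW = re.compile(r"^[UD][NLJM]\s+")


def racks_in_use(status_output: str) -> Counter:
    """Count nodes per rack, assuming the rack is the last column of a node row."""
    racks = Counter()
    for line in status_output.splitlines():
        if NODE_ROW.match(line):
            racks[line.split()[-1]] += 1
    return racks


# Illustrative sample only; addresses, host IDs and rack names are invented.
SAMPLE = """\
Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns  Host ID                               Rack
UN  10.0.0.1   1.1 TB   256     ?     11111111-1111-1111-1111-111111111111  1a
UN  10.0.0.2   1.1 TB   256     ?     22222222-2222-2222-2222-222222222222  1a
UN  10.0.0.3   1.1 TB   256     ?     33333333-3333-3333-3333-333333333333  1a
"""

if __name__ == "__main__":
    racks = racks_in_use(SAMPLE)
    print(dict(racks))
    if len(racks) < 2:
        print("only one rack in use; multi-AZ does not seem to be in effect")
```

With a working multi-AZ setup one would expect more than one key in the returned counter.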

Installation details

Cluster size: 3 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

OS / Image: ami-044f45ee3df20f616 (aws: undefined_region)

Test: scylla-master-perf-regression-latency-650gb-grow-shrink

Test id: b2e47f7a-3ac6-4e8c-817f-dfb1f518850e

Test name: scylla-master/scylla-master-perf-regression-latency-650gb-grow-shrink

Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor b2e47f7a-3ac6-4e8c-817f-dfb1f518850e`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=b2e47f7a-3ac6-4e8c-817f-dfb1f518850e)
- Show all stored logs command: `$ hydra investigate show-logs b2e47f7a-3ac6-4e8c-817f-dfb1f518850e`

## Logs:

- **db-cluster-b2e47f7a.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/db-cluster-b2e47f7a.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/db-cluster-b2e47f7a.tar.gz)
- **sct-runner-b2e47f7a.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/sct-runner-b2e47f7a.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/sct-runner-b2e47f7a.tar.gz)
- **monitor-set-b2e47f7a.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/monitor-set-b2e47f7a.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/monitor-set-b2e47f7a.tar.gz)
- **loader-set-b2e47f7a.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/loader-set-b2e47f7a.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/b2e47f7a-3ac6-4e8c-817f-dfb1f518850e/20240428_145423/loader-set-b2e47f7a.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-master/job/scylla-master-perf-regression-latency-650gb-grow-shrink/38/)

[Argus](https://argus.scylladb.com/test/72a0f93b-6d0b-4f45-8462-e876a768cf54/runs?additionalRuns[]=b2e47f7a-3ac6-4e8c-817f-dfb1f518850e)

soyacz commented 1 week ago

The root cause is missing backports to perf-v14:

Ideas:

  1. Test the backports on a side branch; if they are OK, backport those two to perf-v14. Also find the bugfix patches for them; there might be a few.
  2. Test the grow-shrink perf test using the master SCT branch.
  3. Create branch-perf-v15.

fruch commented 1 week ago

> The root cause is missing backports to perf-v14:
>
> Ideas:
>
>   1. Test the backports on a side branch; if they are OK, backport those two to perf-v14. Also find the bugfix patches for them; there might be a few.
>   2. Test the grow-shrink perf test using the master SCT branch.
>   3. Create branch-perf-v15.

I think #3, since we want to do it anyhow. Maybe first get the parallel cluster setup in? Anything else you think is worth waiting for?

soyacz commented 1 week ago

> I think #3, since we want to do it anyhow. Maybe first get the parallel cluster setup in? Anything else you think is worth waiting for?

Just maybe the new throughput test I was working on.

fruch commented 1 week ago

> > I think #3, since we want to do it anyhow. Maybe first get the parallel cluster setup in? Anything else you think is worth waiting for?
>
> Just maybe the new throughput test I was working on.

And we can backport the new things into it (and fixes as needed).

fruch commented 6 days ago

Let's raise in the meeting whether it's important to fix and support multi-AZ.

fruch commented 6 days ago

Meanwhile, let's run on master and see what needs adapting, for example disabling KMS.

soyacz commented 5 days ago

master run (defaults, oss): https://jenkins.scylladb.com/job/scylla-staging/job/lukasz/job/scylla-master-perf-regression-latency-650gb-grow-shrink-tablets/2/

soyacz commented 5 days ago

It failed due to: https://github.com/scylladb/scylla-cluster-tests/issues/7432