scylladb / cql-stress


Low ops due to high latencies when CL=ALL #98

Open soyacz opened 3 months ago

soyacz commented 3 months ago

I tried cql-stress in a performance test. During the cluster preload phase we don't throttle anything, in order to achieve maximum throughput. Example command executed:

```shell
cql-stress-cassandra-stress write no-warmup cl=ALL n=162500000 \
  -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' \
  -mode cql3 native -rate threads=400 \
  -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..162500000
```

This is repeated across 4 loaders on c6i.2xlarge machines. The outcome is very unsatisfying: the db load is not saturated, and this preload stage takes much longer than with cassandra-stress (the cluster reaches ~60k ops on average, while it is capable of more than double that). On the other hand, when running with throttling (again across 4 loaders), we reach the desired ops value:

```shell
cassandra-stress write no-warmup cl=QUORUM duration=2850m \
  -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' \
  -mode cql3 native -rate 'threads=250 fixed=20332/s' \
  -col 'size=FIXED(128) n=FIXED(8)' \
  -pop 'dist=gauss(1..650000000,325000000,9750000)'
```

See the attached graphs showing first the preload stage and then the throttled one: image
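As a rough sanity check (my own back-of-the-envelope reasoning, not from the report itself): with a fixed thread count and each thread keeping at most one request in flight, Little's law (concurrency = throughput × latency) caps achievable ops. A minimal sketch using the numbers above (4 loaders × `threads=400`, ~60k aggregate ops):

```python
# Little's law: in-flight requests = throughput * mean latency.
# With a fixed thread pool where each thread drives one request at a time,
# aggregate ops/s is capped at total_threads / mean_latency.

def implied_latency_ms(loaders: int, threads_per_loader: int, observed_ops: float) -> float:
    """Mean per-request latency implied by the observed aggregate throughput."""
    in_flight = loaders * threads_per_loader
    return in_flight / observed_ops * 1000.0

def max_ops(loaders: int, threads_per_loader: int, latency_ms: float) -> float:
    """Throughput ceiling for a given mean per-request latency."""
    return loaders * threads_per_loader / (latency_ms / 1000.0)

# Numbers from this issue: 4 loaders, -rate threads=400, ~60k ops aggregate.
print(implied_latency_ms(4, 400, 60_000))  # implied mean latency in ms
```

If CL=ALL pushes mean latency to roughly this level, 1600 client threads cannot exceed ~60k ops no matter how much headroom the cluster has; either latency must drop or client concurrency must rise.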

There's also another possible factor: the seq distribution used in preload vs. gauss in the throttled stage, or CL=ALL itself. I'll adapt the title if my guess turns out to be wrong.
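To illustrate why the two `-pop` settings could behave differently (a hypothetical model of the semantics, not the actual cql-stress implementation): `seq=1..N` visits every key exactly once across the full keyspace, while `dist=gauss(lo..hi,mu,sigma)` samples keys clustered around the mean, so a narrow hot band of partitions receives most of the traffic and caches stay warm:

```python
import random

# Hypothetical sketch of the two -pop modes (not the real cql-stress code):
# seq sweeps every key once; gauss concentrates accesses near the mean.

def seq_keys(lo: int, hi: int):
    """seq=lo..hi: each key exactly once, in order."""
    return range(lo, hi + 1)

def gauss_key(lo: int, hi: int, mu: float, sigma: float) -> int:
    """dist=gauss(lo..hi,mu,sigma): resample until the key is in range."""
    while True:
        k = round(random.gauss(mu, sigma))
        if lo <= k <= hi:
            return k

random.seed(0)
sample = [gauss_key(1, 650_000_000, 325_000_000, 9_750_000) for _ in range(1_000)]
# ~99.7% of gauss samples fall within mu +/- 3*sigma, a sliver of the
# keyspace, unlike the uniform full-range sweep that seq performs.
```

So the throttled run hits a much smaller working set than the preload sweep, which alone could explain part of the latency gap.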

More details about this run:

Packages

Scylla version: 6.1.0~dev-20240625.c80dc5715668 with build-id bf0032dbaafe5e4d3e01ece0dcb7785d2ec7a098

Kernel Version: 5.15.0-1063-aws

Installation details

Cluster size: 3 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

OS / Image: ami-09006ca344092e50b (aws: undefined_region)

Test: elasticity-test Test id: ab781f2c-b3fe-4294-b3bc-83fcfe105c2d Test name: scylla-staging/lukasz/elasticity-test Test config file(s):

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor ab781f2c-b3fe-4294-b3bc-83fcfe105c2d`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=ab781f2c-b3fe-4294-b3bc-83fcfe105c2d)
- Show all stored logs command: `$ hydra investigate show-logs ab781f2c-b3fe-4294-b3bc-83fcfe105c2d`

## Logs:

- **db-cluster-ab781f2c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/db-cluster-ab781f2c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/db-cluster-ab781f2c.tar.gz)
- **sct-runner-events-ab781f2c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/sct-runner-events-ab781f2c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/sct-runner-events-ab781f2c.tar.gz)
- **sct-ab781f2c.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/sct-ab781f2c.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/sct-ab781f2c.log.tar.gz)
- **loader-set-ab781f2c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/loader-set-ab781f2c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/loader-set-ab781f2c.tar.gz)
- **monitor-set-ab781f2c.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/monitor-set-ab781f2c.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/ab781f2c-b3fe-4294-b3bc-83fcfe105c2d/20240703_174702/monitor-set-ab781f2c.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/lukasz/job/elasticity-test/4/)
[Argus](https://argus.scylladb.com/test/26ab8115-17c6-42f6-b5f3-4205afe17e5a/runs?additionalRuns[]=ab781f2c-b3fe-4294-b3bc-83fcfe105c2d)
soyacz commented 3 months ago

A possible cause is also the high latencies seen with CL=ALL: image
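A toy model of why CL=ALL amplifies latency with RF=3 (my assumption, with made-up exponential replica latencies, not measured data): the coordinator must wait for the slowest of 3 replicas, whereas QUORUM waits only for the 2nd fastest, so replica tail latency hits every CL=ALL request:

```python
import random

def request_latency(replica_latencies: list, required_acks: int) -> float:
    """Coordinator returns once `required_acks` replicas have responded."""
    return sorted(replica_latencies)[required_acks - 1]

random.seed(42)
all_lat, quorum_lat = [], []
for _ in range(10_000):
    # 3 replicas with i.i.d. exponential latencies, ~5 ms mean (made up).
    replicas = [random.expovariate(1 / 5.0) for _ in range(3)]
    all_lat.append(request_latency(replicas, 3))     # CL=ALL: wait for all 3
    quorum_lat.append(request_latency(replicas, 2))  # CL=QUORUM: wait for 2

print(sum(all_lat) / len(all_lat), sum(quorum_lat) / len(quorum_lat))
```

Under this exponential model the expected CL=ALL wait is roughly double the QUORUM wait, and combined with the fixed client thread count that translates directly into lower achievable ops.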

soyacz commented 3 months ago

Analogous results for c-s: image image So the issue is possibly common to both c-s and cql-stress, with the high latencies being the cause.

One more thing, though: latencies with cql-stress are higher, and we're a bit (a couple of minutes) slower than c-s.