scylladb / scylla-bench

Is there an option for shard aware? #70

Closed pveentjer closed 2 years ago

pveentjer commented 3 years ago

I was wondering why the throughput numbers are so low compared to cassandra-stress (the Scylla version) and why the number of connections is much lower. It looks like scylla-bench isn't shard-aware by default: it doesn't open a connection to every shard and route traffic to the right shard.

piodul commented 3 years ago

Which host selection policy are you using (the host-selection-policy option)? Shard-awareness should work only with the default option, which is token-aware. In that case, the gocql driver should open exactly one connection for each shard and will ignore the connection-count parameter. See here for more details.

pveentjer commented 3 years ago

I'm using default.

The number of connections is not determined correctly, either with the default settings or with the policy set explicitly:

go/bin/scylla-bench -workload uniform -mode read -partition-count 10000000 -nodes 172.31.10.9 -concurrency 1600 -duration 10m -clustering-row-count 1 -host-selection-policy=token-aware 2>&1 | tee -a scylla-bench-16-08-2021_11-21-07.log

The number of shards is 92, but I only see connections to the first 23 shards. So there are 23 connections in total, even though there should be 92.

I'm using the scylla-bench master branch.

piodul commented 3 years ago

How did you install scylla-bench? Like this?

go install github.com/scylladb/scylla-bench

I just checked and this version indeed is not shard-aware. However, if you build scylla-bench from source it should be shard-aware:

git clone https://github.com/scylladb/scylla-bench
cd scylla-bench/
go install .

Please check if the version built from source works better.

pveentjer commented 3 years ago

I'm using the first approach.

When using '-host-selection-policy=token-aware' the connection count setting is not ignored btw. I set it to 92 and every shard has multiple connections (it seems 3 or 4 per shard).

I'll try out the other approach now.

piodul commented 3 years ago

> I'm using the first approach.
>
> When using '-host-selection-policy=token-aware' the connection count setting is not ignored btw. I set it to 92 and every shard has multiple connections (it seems 3 or 4 per shard).

I'm not exactly sure what is happening here, but I suspect that go install builds scylla-bench using the original gocql/gocql driver, not the scylladb/gocql fork. Only the scylladb/gocql fork supports shard-awareness, gocql/gocql does not.

pveentjer commented 3 years ago

It seems like a bug :) Let me first verify whether shard awareness works properly with the build-from-source approach.

pveentjer commented 3 years ago

I see there are just a few cross-shard operations, so it seems to be shard-aware.

piodul commented 3 years ago

I did some digging and it seems that my theory about go install not using our fork is correct. We are using a replace directive to substitute our fork in place of the upstream (which is a recommended practice according to our fork's readme). However, commands like go get and go install do not honor the replace directive - it is expected behavior: https://github.com/golang/go/issues/30354 - which resulted in you accidentally using the non-shard-aware driver.

I don't think this issue should be classified as a bug in scylla-bench, because the problem was caused by the combination of the go tool's idiosyncrasies and the way our fork is supposed to be substituted. However, the proper way to install scylla-bench should be documented in the readme. Right now, the recommended way is go get, which is not what we want! I'll send a PR with improved instructions.

pveentjer commented 3 years ago

If you can update the documentation for installing, then I'm fine with closing this issue.