scylladb / scylla-bench


Benchmark doesn't react to topology changes #129

Open tgrabiec opened 1 year ago

tgrabiec commented 1 year ago

When a new node is added to the cluster, most (all?) requests still go to the old nodes. Only restarting the benchmark will cause new coordinators to receive CQL load. The benchmark should react to topology changes dynamically so that we can use it in tests which alter topology.
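Conceptually, a CQL driver learns about new nodes from server-pushed TOPOLOGY_CHANGE events on its control connection, and should fold them into its coordinator pool without a restart. A minimal, self-contained Go sketch of the expected behavior (the `hostPool` type and its methods are illustrative, not actual scylla-bench or gocql code):

```go
package main

import (
	"fmt"
	"sync"
)

// hostPool is a toy model of the driver-side host list. When a node
// joins the cluster, the driver should pick it up from a
// TOPOLOGY_CHANGE event and start routing requests to it, without
// the benchmark being restarted.
type hostPool struct {
	mu    sync.Mutex
	hosts []string
	next  int
}

// OnTopologyChange mimics the driver's event handler: NEW_NODE adds
// a coordinator candidate, REMOVED_NODE drops it.
func (p *hostPool) OnTopologyChange(change, host string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	switch change {
	case "NEW_NODE":
		for _, h := range p.hosts {
			if h == host {
				return // already known
			}
		}
		p.hosts = append(p.hosts, host)
	case "REMOVED_NODE":
		for i, h := range p.hosts {
			if h == host {
				p.hosts = append(p.hosts[:i], p.hosts[i+1:]...)
				return
			}
		}
	}
}

// Pick returns the next coordinator, round-robin.
func (p *hostPool) Pick() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	h := p.hosts[p.next%len(p.hosts)]
	p.next++
	return h
}

func main() {
	pool := &hostPool{hosts: []string{"127.0.0.1"}}
	fmt.Println(pool.Pick()) // only node 1 exists before the event

	// Node 2 joins; the server pushes a TOPOLOGY_CHANGE event.
	pool.OnTopologyChange("NEW_NODE", "127.0.0.2")

	seen := map[string]bool{}
	for i := 0; i < 4; i++ {
		seen[pool.Pick()] = true
	}
	fmt.Println(seen["127.0.0.1"] && seen["127.0.0.2"]) // true: both nodes coordinate
}
```

The bug report amounts to the second half of this sketch not happening: after the event, requests keep going only to the original node.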

mmatczuk commented 1 year ago

I suggest you embrace the new driver in the benchmark, and start maintaining it.

fruch commented 1 year ago

When a new node is added to the cluster, most (all?) requests still go to the old nodes. Only restarting the benchmark will cause new coordinators to receive CQL load. The benchmark should react to topology changes dynamically so that we can use it in tests which alter topology.

@tgrabiec

Can you share a bit more information, like what is the command used, and logs of the run

We have been using scylla-bench in SCT for quite some time, and we have never noticed such an issue.

fruch commented 1 year ago

I suggest you embrace the new driver in the benchmark, and start maintaining it.

I'm not sure that's on @avelanarius's team's roadmap. I think they have other plans for how the drivers should be built.

tgrabiec commented 1 year ago

@fruch Scenario:

  1. Start 1 node cluster
  2. Start scylla-bench: ./scylla-bench -mode write -workload uniform -partition-count 1000000 -max-rate 3 -duration 1h -nodes 127.0.0.1 -concurrency 1
  3. Start 2nd node
  4. Observe in the dashboard that "Requests Served per Instance - Coordinator" is active only on the first node

Alternatively to step 4, enable the storage_proxy logger at trace level (e.g. with `nodetool setlogginglevel storage_proxy trace`) and observe that the first node has in its logs:

TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - mutate cl=QUORUM
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler for token: 457961819418090099 natural: {127.0.0.1} pending: {}
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - Operation is not rate limited
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler with live: {127.0.0.1} dead: {}

while the 2nd node does not.
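A quick way to confirm step 4 from the logs is to count the coordinator-side trace lines per node; a node that never coordinates a write produces none of them. A small sketch of that check (the `countWriteHandlers` helper is hypothetical, written here for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// countWriteHandlers counts the "creating write handler" trace lines
// in a node's log, as a rough proxy for how many writes that node
// coordinated.
func countWriteHandlers(log string) int {
	n := 0
	for _, line := range strings.Split(log, "\n") {
		if strings.Contains(line, "storage_proxy - creating write handler") {
			n++
		}
	}
	return n
}

func main() {
	// Sample trace lines from the first node, as quoted in the issue.
	node1Log := `TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - mutate cl=QUORUM
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler for token: 457961819418090099 natural: {127.0.0.1} pending: {}
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler with live: {127.0.0.1} dead: {}`
	node2Log := "" // the 2nd node's log shows no such lines

	fmt.Println(countWriteHandlers(node1Log)) // 2
	fmt.Println(countWriteHandlers(node2Log)) // 0
}
```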

fruch commented 1 year ago

@pioter

I would assume the gocql driver should handle that automatically. Do you think someone can take a look at why it doesn't?

tgrabiec commented 10 months ago

This could actually be a scylla bug: https://github.com/scylladb/scylladb/issues/15841

kbr-scylla commented 10 months ago

https://github.com/scylladb/scylladb/issues/15841 only applies with --experimental-features=consistent-topology-changes

tgrabiec commented 10 months ago

scylladb/scylladb#15841 only applies with --experimental-features=consistent-topology-changes

I reported this bug when testing tablets (so with consistent-topology-changes).

fruch commented 10 months ago

scylladb/scylladb#15841 only applies with --experimental-features=consistent-topology-changes

I reported this bug when testing tablets (so with consistent-topology-changes).

So this issue can be closed ?

Anyhow, we now have driver support for tablets, and topology changes seem to cause a null pointer exception even without any experimental features, but that is tracked in a different issue.

tgrabiec commented 10 months ago

So this issue can be closed ?

Need to verify that it doesn't happen now.

roydahan commented 5 months ago

@ShlomiBalalis / @aleksbykov (tablets / raft-topology representatives), can we close this one?