scylladb / scylla-bench


Benchmark doesn't react to topology changes #129

Open · tgrabiec opened this issue 1 year ago

tgrabiec commented 1 year ago

When a new node is added to the cluster, most (all?) requests still go to the old nodes. Only restarting the benchmark will cause new coordinators to receive CQL load. The benchmark should react to topology changes dynamically so that we can use it in tests which alter topology.
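For illustration only (this is neither scylla-bench nor gocql code): the reported behaviour looks as if the coordinator set is frozen at session creation, whereas a driver that reacts to topology events would fold the new node into its round-robin rotation. A minimal sketch of the expected dynamic behaviour:

```go
package main

import (
	"fmt"
	"sync"
)

// hostRing is a toy round-robin host selector standing in for the
// driver's connection pool. Host names are hypothetical.
type hostRing struct {
	mu    sync.Mutex
	hosts []string
	next  int
}

// pick returns the next coordinator in round-robin order.
func (r *hostRing) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	h := r.hosts[r.next%len(r.hosts)]
	r.next++
	return h
}

// add simulates the driver reacting to a topology-change event
// by folding the new node into the rotation.
func (r *hostRing) add(host string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.hosts = append(r.hosts, host)
}

func main() {
	ring := &hostRing{hosts: []string{"127.0.0.1"}}

	// Before the topology event: every request goes to the old node.
	fmt.Println(ring.pick(), ring.pick())

	// A second node joins; a driver that listens for topology events
	// starts routing to it without a benchmark restart.
	ring.add("127.0.0.2")
	fmt.Println(ring.pick(), ring.pick())
}
```

The bug described above behaves as if `add` is never called: only restarting the benchmark (rebuilding the ring from scratch) makes the new coordinator visible.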

mmatczuk commented 1 year ago

I suggest you embrace the new driver in the benchmark, and start maintaining it.

fruch commented 1 year ago

> When a new node is added to the cluster, most (all?) requests still go to the old nodes. Only restarting the benchmark will cause new coordinators to receive CQL load. The benchmark should react to topology changes dynamically so that we can use it in tests which alter topology.

@tgrabiec

Can you share a bit more information, like the command used and logs of the run?

We have been using scylla-bench for quite some time in SCT, and we have never noticed such an issue.

fruch commented 1 year ago

> I suggest you embrace the new driver in the benchmark, and start maintaining it.

I'm not sure that's on @avelanarius's team roadmap. I think they have other plans for how the drivers should be built.

tgrabiec commented 1 year ago

@fruch Scenario:

  1. Start 1 node cluster
  2. Start scylla-bench: ./scylla-bench -mode write -workload uniform -partition-count 1000000 -max-rate 3 -duration 1h -nodes 127.0.0.1 -concurrency 1
  3. Start 2nd node
  4. Observe in the dashboard that "Requests Served per Instance - Coordinator" is active only on the first node

Alternatively to step 4, set the storage_proxy logger level to trace and observe that the first node has in its logs:

TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - mutate cl=QUORUM
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler for token: 457961819418090099 natural: {127.0.0.1} pending: {}
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - Operation is not rate limited
TRACE 2023-08-09 23:11:02,393 [shard 1] storage_proxy - creating write handler with live: {127.0.0.1} dead: {}

while the 2nd node does not.
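For reference, the storage_proxy trace level used in the alternative step can be raised at runtime. A sketch, assuming nodetool is available on the node and Scylla's REST API is listening on its default port 10000:

```shell
# Via nodetool:
nodetool setlogginglevel storage_proxy trace

# ...or directly via the REST API (assumed default port 10000):
curl -X POST "http://127.0.0.1:10000/system/logger/storage_proxy?level=trace"
```

Either way, the change applies to the running node without a restart.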

fruch commented 12 months ago

@pioter

I would assume the gocql driver should do that automatically; do you think someone can take a look at why it doesn't?

tgrabiec commented 7 months ago

This could actually be a scylla bug: https://github.com/scylladb/scylladb/issues/15841

kbr-scylla commented 7 months ago

https://github.com/scylladb/scylladb/issues/15841 only applies with --experimental-features=consistent-topology-changes

tgrabiec commented 7 months ago

> scylladb/scylladb#15841 only applies with --experimental-features=consistent-topology-changes

I reported this bug when testing tablets (so with consistent-topology-changes).

fruch commented 7 months ago

> scylladb/scylladb#15841 only applies with --experimental-features=consistent-topology-changes

> I reported this bug when testing tablets (so with consistent-topology-changes).

So can this issue be closed?

Anyhow, we now have driver support for tablets, and topology changes seem to cause a null pointer exception even without any experimental features, but that is tracked in a different issue.

tgrabiec commented 7 months ago

> So can this issue be closed?

Need to verify that it doesn't happen now.

roydahan commented 3 months ago

@ShlomiBalalis / @aleksbykov (tablets / raft-topology representatives), can we close this one?