scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0
57 stars 94 forks source link

Test reader_concurrency_semaphore in SCT #7996

Open mykaul opened 3 months ago

mykaul commented 3 months ago

We now have a new configurable CPU admission control - see https://github.com/scylladb/scylladb/issues/19017 for details. However, that configuration, which apparently is going to be applied at customers, need to be tested regularly.

denesb commented 3 months ago

What do you have in mind? Test that ScyllaDB is generally works with cpu concurrency > 1, or that latencies don't suffer? The latter is not guaranteed, some workloads can react adversely.

Note that originally this change was targeted at a single cluster of a single customer, and I only submitted it to master to avoid having to keep said customer on a private build. But now it seems that the spirit is out of the bottle and this will possibly see much more wide-spread use than anticipated. I've seen this new configurable used on the first sign of any semaphore reports, and I heard plans to apply this on clusters preemptively even. With that in mind, indeed we need testing on this, latency comparison tests with concurrency=1 v.s concurrency=100 would be especially useful, with different workloads and data shapes.

mykaul commented 3 months ago

What do you have in mind? Test that ScyllaDB is generally works with cpu concurrency > 1, or that latencies don't suffer? The latter is not guaranteed, some workloads can react adversely.

Note that originally this change was targeted at a single cluster of a single customer, and I only submitted it to master to avoid having to keep said customer on a private build. But now it seems that the spirit is out of the bottle and this will possibly see much more wide-spread use than anticipated. I've seen this new configurable used on the first sign of any semaphore reports, and I heard plans to apply this on clusters preemptively even. With that in mind, indeed we need testing on this, latency comparison tests with concurrency=1 v.s concurrency=100 would be especially useful, with different workloads and data shapes.

I share your fear that it'll spread. Just like static compaction shares = 100 ...

There are two possible approaches, and I dislike both:

  1. Randomly - set it to 1% or 10% of SCT runs to a value other than the default.
  2. Have specific jobs that run with it.

I slightly prefer the 1st one.