redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Latency increases over time in OMB test #11189

Open travisdowns opened 1 year ago

travisdowns commented 1 year ago

Version & Environment

Redpanda version: 23.2.x or dev@09e59b9bc15eeedd3a5fce18a97903da2b39a4c5

What went wrong?

Steady uptick in produce latency, end-to-end (E2E) latency, and reactor utilization in longer-running benchmarks.

What should have happened instead?

Latency should stay mostly flat, or change with only a very small slope (e.g., less than 1 ms over many hours in average or p50 latency), with a limited total increase.

How to reproduce the issue?

  1. Run an OMB test with 30 producers and 30 consumers at 500 MB/s.
  2. Note the latency increase in the p50 and avg metrics.
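One way to make the "mostly flat" expectation checkable is to fit a straight line to the latency samples and look at the fitted slope. The following is a minimal sketch, not part of OMB or Redpanda: the function names, the `(elapsed_seconds, latency_ms)` sample format, and the 1 ms default threshold are assumptions chosen to match the wording above.

```python
def latency_slope_ms_per_hour(samples):
    """Least-squares slope of latency over time, in ms per hour.

    `samples` is a hypothetical list of (elapsed_seconds, latency_ms)
    pairs, e.g. periodic p50 or avg readings from a benchmark run.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_l = sum(l for _, l in samples) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("all samples have the same timestamp")
    # Slope is in ms per second; convert to ms per hour.
    return cov / var * 3600.0


def is_flat(samples, max_total_increase_ms=1.0):
    """True if the fitted increase over the whole run stays under the limit.

    The 1 ms default is an assumed threshold, not a project-defined one.
    """
    times = [t for t, _ in samples]
    duration_hours = (max(times) - min(times)) / 3600.0
    fitted_increase = latency_slope_ms_per_hour(samples) * duration_hours
    return fitted_increase <= max_total_increase_ms


# Synthetic data: one reading every 10 minutes for 6 hours.
flat_run = [(i * 600, 5.0) for i in range(37)]
growing_run = [(i * 600, 5.0 + 0.5 * (i * 600 / 3600.0)) for i in range(37)]
```

With the synthetic data above, `flat_run` passes the check, while `growing_run` (0.5 ms per hour over 6 hours) does not.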

Additional information

For example, a typical increase in a test from @ballard26:

image

JIRA Link: CORE-1328

dotnwat commented 1 year ago

Is anything else increasing in correlation, such as memory usage or cross-core memory frees?

travisdowns commented 1 year ago

Reactor utilization also goes up, though per this comment it seems like that might have a separate cause related to segment size (i.e., a larger segment size makes reactor utilization flat instead of increasing, but latency still increases).

Good call on checking other correlations.
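A rough way to check such correlations is to compute the Pearson coefficient between the latency series and each candidate metric sampled at the same instants. This is a minimal sketch under assumptions: the metric names and series values are hypothetical, and a high coefficient only points at something to investigate, not at a cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant series; report 0 here.
        return 0.0
    return cov / (sx * sy)


def rank_correlations(latency, candidates):
    """Sort candidate metrics by absolute correlation with latency."""
    scored = [(name, pearson(latency, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical series sampled at the same instants as the latency readings.
latency_ms = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "memory_usage": [10.0, 20.0, 30.0, 40.0],
    "constant_metric": [5.0, 5.0, 5.0, 5.0],
}
ranked = rank_correlations(latency_ms, candidates)
```

Here `ranked` lists `memory_usage` first because it rises in lockstep with latency, while the constant series scores 0.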

dotnwat commented 1 year ago

cc @dlex too since they've been doing lots of produce throttling work lately and generally looking at this area of the code base.