spotify / heroic

The Heroic Time Series Database
https://spotify.github.io/heroic/
Apache License 2.0

Find/Create BigTable Load Simulator #711

Open sming opened 4 years ago

sming commented 4 years ago

From #bigtable:

Hi BigTable folks,

I want to analyze how Heroic behaves when BigTable is under severe load. As I understand it, Heroic experiences this as very long response times or time-outs and that's it.

• is that last statement correct or is there another way that BigTable exposes degraded performance to clients (e.g. 5xx errors)?
• what tools (e.g. the BigTable emulator) are available to perform such analysis?
• do you have any tips for responding to degraded BigTable performance?

...

TL;DR: seemingly a couple of hot tablet servers utterly cripple the entire cluster's IO throughput. Cluster CPU also hits the floor, except for the 2 hot tablet servers.

sming commented 3 years ago

Q1 2021 Update

So whilst there were some suggestions from #bigtable, nothing usable surfaced. Some time later, Adam Steele (adamsteele@google.com) presented his fork of the BT emulator, which supports jittered, delayed responses and suits our needs perfectly.

This has been trialled (code is currently in a patch file) and seems to work as advertised, which is great news.
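
For anyone unfamiliar with the idea, here is a minimal Python sketch of what "jittered delayed responses" means: each call to a backend handler is held back by a base delay plus a random offset. This is not code from the emulator fork or its patch file. The names `jittered_delay` and `with_jittered_delay` and the uniform jitter distribution are assumptions made purely for illustration.

```python
import random
import time


def jittered_delay(base_s, jitter_s, rng):
    """Return base_s plus a uniform offset in [-jitter_s, +jitter_s], floored at 0.

    Hypothetical helper: the uniform distribution is an assumption, not
    necessarily what the emulator fork uses.
    """
    return max(0.0, base_s + rng.uniform(-jitter_s, jitter_s))


def with_jittered_delay(handler, base_s, jitter_s, rng=None, sleep=time.sleep):
    """Wrap handler so every call first sleeps for a jittered delay.

    `sleep` is injectable so tests can record delays instead of waiting.
    """
    rng = rng or random.Random()

    def delayed(*args, **kwargs):
        sleep(jittered_delay(base_s, jitter_s, rng))
        return handler(*args, **kwargs)

    return delayed


if __name__ == "__main__":
    # Stand-in for a storage read; a real test would call the emulator instead.
    def fake_read(row_key):
        return {"row": row_key}

    recorded = []
    slow_read = with_jittered_delay(
        fake_read, base_s=2.0, jitter_s=0.5,
        rng=random.Random(42), sleep=recorded.append,
    )
    print(slow_read("metric#1"), recorded)
```

A client configured with a timeout shorter than the base delay would then see time-outs on most calls, which is the degraded-BigTable behaviour described at the top of this issue.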

The Heroic API Request Timeouts & Retries Gdoc describes the work that should see this emulator used in unit and integration tests.