vitessio / arewefastyet

Automated Benchmarking System for Vitess
https://benchmark.vitess.io
Apache License 2.0

Fix lack of Prometheus metrics and improve reliability of collected metrics #519

Closed: frouioui closed this 6 months ago

frouioui commented 6 months ago

We used to restart the Prometheus service between benchmarks (whether or not the next benchmark was the same one). After every restart, Prometheus replayed its WAL and could not collect metrics while doing so. This was barely visible before #517 because benchmarks used to take a long time, but now that benchmarks run fairly fast the problem is obvious: some benchmarks end up with no metrics collected at all. The issue is also hard to reproduce on the dev host machine, because the whole benchmark setup takes much longer there.

With this PR, Prometheus now stays running on the host machine, and we use hot reloading of its configuration files to change the `exec_uuid` label.