Decide on performance benchmarking criteria

rakyll commented 3 years ago

The work we planned for Phase 1 will focus mainly on stability and performance. To achieve our performance goals, we will need to track improvements and regressions. We are considering benchmarking the entire Phase 1 pipeline (Prometheus receiver -> collector -> Prometheus remote write exporter) and may contribute microbenchmarks as needed. We need to decide what to benchmark, which platforms to run the benchmarks on, and which dimensions to vary.
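To make "track improvements and regressions" concrete, one option is a simple gate that compares each benchmark run against a stored baseline. The sketch below is hypothetical: `BenchResult`, `find_regressions`, and the 5% tolerance are illustrative assumptions, not an agreed design. The three fields mirror what the earlier manual benchmarks measured (export rate and resource usage).

```python
"""Hypothetical regression gate for benchmark runs (illustrative sketch only)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    samples_per_second: float  # export rate; higher is better
    cpu_cores: float           # average CPU usage; lower is better
    memory_mib: float          # average memory usage; lower is better


def find_regressions(baseline: BenchResult, current: BenchResult,
                     tolerance: float = 0.05) -> list[str]:
    """Return the names of metrics that got worse by more than `tolerance`.

    The 5% default tolerance is an assumed value, not an agreed threshold.
    """
    regressions = []
    # Throughput regresses if it drops below the baseline minus tolerance.
    if current.samples_per_second < baseline.samples_per_second * (1 - tolerance):
        regressions.append("samples_per_second")
    # Resource usage regresses if it rises above the baseline plus tolerance.
    if current.cpu_cores > baseline.cpu_cores * (1 + tolerance):
        regressions.append("cpu_cores")
    if current.memory_mib > baseline.memory_mib * (1 + tolerance):
        regressions.append("memory_mib")
    return regressions
```

A fixed relative tolerance keeps the gate simple; noisy cloud environments may need repeated runs or a statistical test before a difference is treated as a real regression.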

Prior work

Previously, we ran manual benchmarks on Kubernetes (EKS), on a cluster of 10 m5.8xlarge nodes. On Kubernetes, the collection load scales with the number of jobs running in the entire cluster and the number of metrics generated per job. The total number of jobs running in a cluster is capped by the resources available to the cluster. We used a simple app that runs a lightweight HTTP server publishing a configurable number of metrics. The metrics are collected by the OpenTelemetry Prometheus receiver and exported to Amazon Managed Service for Prometheus (AMP).
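The actual load-generating app is not included in this thread. As a hypothetical stand-in, a server like the one below serves a fixed number of synthetic gauges in the Prometheus text exposition format at `/metrics`, using only the Python standard library. The metric name `loadgen_synthetic_gauge`, the `id` label, and port 8080 are assumptions for illustration.

```python
"""Hypothetical load generator: serves N synthetic gauges for Prometheus to scrape."""
from http.server import BaseHTTPRequestHandler, HTTPServer

METRIC_NAME = "loadgen_synthetic_gauge"  # hypothetical metric name


def render_metrics(num_metrics: int) -> str:
    """Render num_metrics series in the Prometheus text exposition format."""
    lines = [
        f"# HELP {METRIC_NAME} Synthetic gauge used for load testing.",
        f"# TYPE {METRIC_NAME} gauge",
    ]
    # One series per distinct "id" label value, so each target exposes num_metrics series.
    for i in range(num_metrics):
        lines.append(f'{METRIC_NAME}{{id="{i}"}} {float(i)}')
    return "\n".join(lines) + "\n"


def make_handler(num_metrics: int):
    # Values are static, so the response body is rendered once and reused per scrape.
    body = render_metrics(num_metrics).encode("utf-8")

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # suppress per-request logging under frequent scrapes

    return MetricsHandler


if __name__ == "__main__":
    # Example: 1000 metrics per replica on an assumed port 8080.
    HTTPServer(("0.0.0.0", 8080), make_handler(1000)).serve_forever()
```

Rendering the body once keeps the server cheap, so the measured cost stays on the collector side rather than in the load generator.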

We published 40, 160, 400, and 1000 metrics from each server, ran 25, 50, 100, 250, and 500 replicas of the server, and measured resource usage, export rate (samples per second), and dropped vs. exported metric samples. The scraper was configured with a 15-second scrape interval, which is more aggressive than what we expect our users to use. Scrape frequency only became a bottleneck when 1000 metrics were exported from 50 or more replicas.
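The test matrix described above has 4 x 5 = 20 cells. A small sketch like the following computes the load each cell implies, assuming every scrape succeeds and each published metric is exactly one series. `drop_ratio` is a hypothetical helper for the dropped vs. exported measurement. Under these assumptions, the 1000-metric, 50-replica cell corresponds to 50,000 active series, or about 3,333 samples/s at a 15-second interval.

```python
"""Sketch of the benchmark matrix and the ingest rate each cell implies."""
from itertools import product

METRICS_PER_SERVER = [40, 160, 400, 1000]
REPLICAS = [25, 50, 100, 250, 500]
SCRAPE_INTERVAL_S = 15.0


def expected_samples_per_second(metrics_per_server: int, replicas: int,
                                scrape_interval_s: float = SCRAPE_INTERVAL_S) -> float:
    """Each scrape yields one sample per series, so rate = series / interval."""
    active_series = metrics_per_server * replicas
    return active_series / scrape_interval_s


def drop_ratio(exported_samples: int, expected_samples: int) -> float:
    """Fraction of expected samples that were not exported (hypothetical helper)."""
    if expected_samples <= 0:
        raise ValueError("expected_samples must be positive")
    dropped = max(expected_samples - exported_samples, 0)
    return dropped / expected_samples


if __name__ == "__main__":
    for m, r in product(METRICS_PER_SERVER, REPLICAS):
        print(f"{m:>5} metrics x {r:>3} replicas: "
              f"{m * r:>7} series, {expected_samples_per_second(m, r):>10.1f} samples/s")
```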

This work mainly targeted Kubernetes, and results might differ on platforms that use other Prometheus service discovery drivers.

rakyll commented 3 years ago

Issues that will require performance testing:

dashpole commented 3 years ago

Other prior art is the existing collector testbed: https://github.com/open-telemetry/opentelemetry-collector/tree/main/testbed. It looks like it could be extended to support the Prometheus receiver and the Prometheus remote write (PRW) exporter.

rakyll commented 3 years ago

@RichiH may be able to share details of the benchmarking they are considering for Prometheus. We might be interested in comparing the collector to Prometheus and the Grafana Agent. We expect comparisons with Prometheus to be less useful because Prometheus includes storage components, so it will be more useful to compare the collector to the Grafana Agent.

RichiH commented 3 years ago

I couldn't find explicit test results for a recent Grafana Agent, but let's use Prometheus, including storage, rule evaluation, and alerting, so we have some baseline numbers. Two datapoints:

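One internal Prometheus instance is currently handling 1,295,069 active series at a 15-second scrape interval using 1.3-1.6 cores. As a lower bound for Prometheus, this will not be wrong: 1,295,069 * 4 / 60 / 1.45 ≈ 59,543.40 samples/second/core (multiplying by 4 / 60 is the same as dividing by the 15-second scrape interval, and 1.45 is the midpoint of 1.3-1.6 cores).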

The largest single Prometheus instance I am aware of has ~125,000,000 active series at a 60-second scrape interval, which comes out to ~2,000,000 samples/second/instance.
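The arithmetic behind both datapoints reduces to one rule: each active series produces one sample per scrape interval. The sketch below reproduces the two figures; the function names are illustrative, and the 1.45-core value is the midpoint of the quoted 1.3-1.6 core range.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series contributes one sample per scrape interval."""
    return active_series / scrape_interval_s


def samples_per_second_per_core(active_series: int, scrape_interval_s: float,
                                cores: float) -> float:
    """Normalize ingest rate by the CPU cores the instance uses."""
    return samples_per_second(active_series, scrape_interval_s) / cores


# Internal instance: 1,295,069 series @ 15 s on 1.3-1.6 cores (midpoint 1.45).
internal = samples_per_second_per_core(1_295_069, 15, 1.45)
# Largest known instance: ~125,000,000 series @ 60 s.
largest = samples_per_second(125_000_000, 60)

print(f"{internal:,.2f} samples/second/core")     # 59,543.40
print(f"{largest:,.0f} samples/second/instance")  # 2,083,333
```

The second figure rounds to the ~2,000,000 samples/second/instance quoted in the comment.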

Again, all of these numbers include storage, alerting, and querying, so they are not a clean comparison, merely an indication.

alolita commented 3 years ago

We have agreed on the performance benchmarking criteria to implement. Closing this issue; the benchmarking tests themselves still need to be implemented in Phase 2.

open-telemetry / wg-prometheus

Decide on performance benchmarking criteria #19
