open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/
Apache License 2.0

Add continuous benchmarks #4537

Open MadVikingGod opened 11 months ago

MadVikingGod commented 11 months ago

Problem Statement

I want to understand how the Go SDK's performance has changed over time.

Proposed Solution

Use the newly available dedicated environments in the OTel org to run benchmarks automatically.

Tasks to make this happen

[1]: Currently, there are approximately 300 benchmarks with 3 data points each (ns/op, allocs/op, and B/op). The current tools will make a graph for each one, which is overwhelming.

[2]: This can be done via:

git checkout --orphan benchmarks
git commit --allow-empty -m "Init"
git push origin benchmarks

Other Notes

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

dmathieu commented 11 months ago

See also https://github.com/open-telemetry/opentelemetry-go/issues/2791

We already run go test -bench=. in a GitHub Action on pushes to main: https://github.com/open-telemetry/opentelemetry-go/blob/main/.github/workflows/benchmark.yml

However, that run is rather unreliable, as the runs may have noisy neighbors. Switching to a dedicated environment would definitely help.

XSAM commented 2 months ago

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

dmathieu commented 2 months ago

Also note that pushing the benchmark results to GitHub pages seems like outside the acceptable use for GitHub apps/extensions, since bots should only be reading data, not writing to the repository.

https://github.com/open-telemetry/community/blob/main/docs/using-github-extensions.md

MadVikingGod commented 2 months ago

This can't be run against every PR. There isn't a way to securely run against a PR and not have it be open for exploration. Instead, this should be run against releases.

@MadVikingGod, I missed the context here. Could you please elaborate on the security issue of running a benchmark CI against every PR? Thank you

We accept PRs from everyone, and because those PRs can change what commands are run on the dedicated hardware, it is a security risk to run them on every PR. GitHub does not recommend using self-hosted runners with a public repository.

XSAM commented 2 months ago

@MadVikingGod, is it OK if I assign this to myself? I want to push this issue.

XSAM commented 2 months ago

To make the continuous benchmark work, we at least need to solve the following issues:

Proposal:


Other optional goals:

dmathieu commented 2 months ago

Enable self-hosted runner to have a dedicated environment to run benchmarks and gain stable results.

There have been several discussions about this at the community level, aiming to set this up in an automated way where no single person holds the keys. No definitive solution has arisen yet (the closest one would be actuated, but those are not dedicated runners, AFAIK). Related: https://github.com/open-telemetry/community/issues/1162

XSAM commented 2 months ago

IIUC, we have two types of GitHub runners available to use:

Related issue:


I can try actuated's machines first to see whether they produce stable benchmark results. If they don't, I can try a self-hosted runner as a backup.

yurishkuro commented 2 months ago

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

XSAM commented 2 months ago

The OTel collector uses this type of runner for testing

testing, but not benchmarks?

Yes, not benchmarks, only testing. https://github.com/open-telemetry/opentelemetry-collector/blob/227fb823dbf33e6a62b842aef5a70e10ed1db84f/.github/workflows/build-and-test-arm.yml#L27

IIUC, the collector CI is not running any benchmark.

XSAM commented 1 month ago

Update: The actuated runner cannot run for a forked repository https://github.com/XSAM/opentelemetry-rust/actions/runs/9968339009/job/27543447028 (the job will wait forever for a runner to pick it up).

XSAM commented 1 month ago

Tasks

MrAlias commented 1 month ago

@XSAM the resource limits I was thinking about in the SIG meeting were actually just Go flags: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/4301d25c8fbe97a8153eca91acc370b990011af2/.github/workflows/build-and-test-windows.yml#L53-L54

I'm not sure that will have an impact on the consistency of benchmarks; you can likely just disregard it.

XSAM commented 3 weeks ago

benchmark-action does not recognize the Go package name; thus, it treats benchmarks with the same name from different packages as identical. This makes the benchmark comparison in the summary incorrect, as it compares the wrong benchmarks.

For example, BenchmarkBool/AsBool from the https://github.com/open-telemetry/opentelemetry-go/actions/runs/10424262745 action run appears twice:

| Benchmark suite | Current: 2db4ef2c032cd2390d18d01feeb37b71a7ca9469 | Previous: f5872418a6297a9b0b2806b84d3d66707b1a0e37 | Ratio |
| --- | --- | --- | --- |
| BenchmarkBool/AsBool | 0.9276 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 0.93 |
| BenchmarkBool/AsBool | 2.087 ns/op, 0 B/op, 0 allocs/op | 1.001 ns/op, 0 B/op, 0 allocs/op | 2.08 |

It might be worth introducing a group field to the result of benchmark-action.