ordo-one / package-benchmark

Swift benchmark runner with many performance metrics and great CI support
Apache License 2.0
313 stars 25 forks source link

Error enum values not propagated as exit codes for benchmark comparison in CI. #235

Open loonatick-src opened 6 months ago

loonatick-src commented 6 months ago

From the docs:

These baselines can then be checked with:

swift package benchmark baseline check --check-absolute-path /relative/or/absolute/path/to/Thresholds

The absolute check will have an exit code of 0 if the check is exactly equal, but will return with exit code 2 if there were any regressions or exit code 4 if there were only improvements.

But, this does not seem to be the case; this is what I saw.

$ swift package benchmark baseline check main merge_request              
Building for debugging...
[1/1] Write swift-version-6E0725D62189FA0A.txt
Build complete! (0.29s)
Building for debugging...
[1/1] Write swift-version-6E0725D62189FA0A.txt
Build complete! (0.66s)
Build complete!
Building BenchmarkTool in release mode...

=======================================================================================
Threshold deviations for test benchmarks:TestBenchmarks
=======================================================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (wall clock) (μs, %)                │            main │   merge_request │    Difference % │     Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p75                                      │             185 │             170 │              -8 │               5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛

New baseline 'merge_request' is BETTER than the 'main' baseline thresholds.

error: benchmarkThresholdImprovement
$ echo $?
1

The return code for improvement is 1, and from the docs I expected it to be 4 in the CI.

I see that the error enum's value is defined as 4 here, and it is thrown from performCommand.

I suspect that performCommand's caller might not be calling exit with the error enum's raw value as done here.

 Diagnostics.error(String(describing: error))
 exit(1)

This seems to be happening in the example benchmark comparison run. I couldn't completely trace the call across library boundaries. Is this a bug or am I missing something?

Environment

loonatick-src commented 6 months ago

Ah well I just saw that this has already an open issue upstream , if I correctly understand that this is the same problem.

hassila commented 6 months ago

Yeah, it's the same issue. It's possible to work around by running the benchmark tool directly without swiftpm - running the benchmark with --debug you can see the actual command line required to drive the benchmark tool and instead run that in ci, the you should get the correct error code surfaced.

loonatick-src commented 6 months ago

Is this also reflected in the docs? I could not find anything of the sort and was going entirely by the 0/2/4 exit codes for swift package benchmark baseline check.

hassila commented 6 months ago

Thanks for the report, unfortunately away for a couple of weeks and won't have time to check on this until then, sorry for delay.

loonatick-src commented 6 months ago

No worries! Also thank you for pointing out the workaround