ordo-one / package-benchmark

Swift benchmark runner with many performance metrics and great CI support
Apache License 2.0
313 stars · 25 forks

feat(minor): Add Benchmark parameters #274

Closed · CrownedPhoenix closed this 2 days ago

CrownedPhoenix commented 2 weeks ago

Description

Sometimes it is convenient to run the same benchmark multiple times under slightly different scenarios - for example, benchmarking func foo(_ n: Int) for various parameterizations of n. This PR adds support for specifying the parameters of a benchmark, but makes no functional changes beyond the name/identifier of a benchmark now including the parameter list when one is provided.

This unlocks the potential for the benchmark exporters (such as the Influx exporter) to leverage these parameters in their output (by emitting appropriate tags/fields, in the case of Influx).
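For illustration, a sketch of how the proposed API might be used. Note that parameters: and BenchmarkParameter are the names under discussion in this PR (they are renamed to tags later in this thread) rather than released API, and foo is a stand-in workload:

    import Benchmark

    // Stand-in for the `foo(_ n: Int)` mentioned above.
    func foo(_ n: Int) -> Int { (0 ..< n).reduce(0, +) }

    // One Benchmark per parameterization; under this proposal the
    // benchmark's name/identifier also carries the parameter list.
    for n in [10, 100, 1_000] {
        Benchmark(
            "foo",
            configuration: .init(parameters: [BenchmarkParameter("n", n)])
        ) { benchmark in
            for _ in benchmark.scaledIterations {
                blackHole(foo(n))
            }
        }
    }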

Resolves https://github.com/ordo-one/package-benchmark/issues/267

How Has This Been Tested?

Added a new test to verify the output of Benchmark.name. Also ran the remaining tests to make sure no other behavior changed.
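For illustration, a minimal sketch of what such a test could look like. The exact rendering of Benchmark.name is defined by the PR, so the assertions below only check that the parameter surfaces in the identifier (hypothetical, not the test actually added):

    import Benchmark
    import XCTest

    final class BenchmarkNameTests: XCTestCase {
        func testNameIncludesParameterList() throws {
            // `parameters:` / `BenchmarkParameter` as proposed in this PR.
            let benchmark = try XCTUnwrap(
                Benchmark(
                    "foo",
                    configuration: .init(parameters: [BenchmarkParameter("n", 10)])
                ) { _ in }
            )
            XCTAssertTrue(benchmark.name.contains("n"))
            XCTAssertTrue(benchmark.name.contains("10"))
        }
    }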

Minimal checklist:

hassila commented 2 weeks ago

Thanks! It will be a little while before I can review properly - I'm mostly out this week - but one quick comment: do you think it would make sense to support ranges for int/doubles too?

CrownedPhoenix commented 2 weeks ago

do you think it would make sense to support ranges for int/doubles too?

You're thinking something like this?

parameters: ["foo":  0...4, "bar": 1.0...10.0]

I don't see why we couldn't support something like that. In practice, I wouldn't expect the implementer to actually use the stored parameter values directly (like they do for scaledIterations). Rather, they would generate the parameters first and then initialize the benchmark, so they would already have access to the parameters directly via closure capture.

So they could likely get the behavior they want by simply string-encoding the range and using that string as the value.

For example:

for range in [0...1, 1...10, 10...100] {
    Benchmark(
        "testFoo",
        parameters: [
            BenchmarkParameter("range", String(describing: range))
        ]
    ) { /* do something with `range` */ }
}

In summary, I'm certainly not opposed - though I'm not sure about the value proposition yet?

Admittedly, my argument here raises the question of why we bother supporting anything besides String at all. Maybe BenchmarkParameter could just have a generic init over CustomStringConvertible. The only downside is that there are probably instances where knowing whether the value is numeric vs. string/complex is valuable. So replacing the String init with a CustomStringConvertible parameter, while keeping the float and int initializers, might be suitable.
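A minimal sketch of that shape (hypothetical, not the code in this PR): concrete numeric initializers preserve the numeric/string distinction for exporters, while the generic initializer covers everything else via CustomStringConvertible.

    // Hypothetical shape for BenchmarkParameter as discussed above.
    public struct BenchmarkParameter {
        public enum Value {
            case int(Int)
            case double(Double)
            case string(String)
        }

        public let name: String
        public let value: Value

        // Numeric initializers keep the "is this numeric?" information.
        public init(_ name: String, _ value: Int) {
            self.name = name
            self.value = .int(value)
        }

        public init(_ name: String, _ value: Double) {
            self.name = name
            self.value = .double(value)
        }

        // Everything else (ranges, enums, ...) is stored as its description.
        public init<T: CustomStringConvertible>(_ name: String, _ value: T) {
            self.name = name
            self.value = .string(value.description)
        }
    }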

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.67%. Comparing base (75e8622) to head (5c050b5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| Sources/Benchmark/Benchmark.swift | 87.50% | 2 Missing :warning: |

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##             main     #274      +/-   ##
    ==========================================
    + Coverage   65.32%   65.67%   +0.35%
    ==========================================
      Files          33       34       +1
      Lines        3610     3653      +43
    ==========================================
    + Hits         2358     2399      +41
    - Misses       1252     1254       +2

| Files with missing lines | Coverage Δ | |
|---|---|---|
| Sources/Benchmark/BenchmarkParameter.swift | 100.00% <100.00%> (ø) | |
| Tests/BenchmarkTests/BenchmarkTests.swift | 100.00% <100.00%> (ø) | |
| Sources/Benchmark/Benchmark.swift | 69.17% <87.50%> (+2.19%) | :arrow_up: |

Legend: Δ = absolute (impact), ø = not affected, ? = missing data. Last update 75e8622...5c050b5.
hassila commented 1 week ago

Hi, I've started to have a look at the PR now - I'd like to return the discussion to the use site a bit. If this was the old approach to parameterisation:

    let parameterization = (0...5).map { 1 << $0 } // 1, 2, 4, ...

    parameterization.forEach { count in
        Benchmark("ParameterizedWith\(count)") { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

Then, with the additional API, I would expect to make just the minimal change:

    parameterization.forEach { count in
        Benchmark("ParameterizedTest", configuration: .init(parameters: [.init("count", count)])) { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

That is, to basically record the parameters in use in the new parameters array - instead of encoding them into the name as in the first sample above - for use in display and later in the export.

It's not a literal in this case, but a parameter that can vary - just like the currently supported parameterisation (this just formalises the naming of the parameter that can vary, for display/export purposes).

Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?

Also, having written the code above, perhaps it's really a dictionary (of String: String) that makes sense, rather than an array?

E.g.

    parameterization.forEach { count in
        Benchmark("ParameterizedTest", configuration: .init(parameters: ["count" : count.description])) { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

Just to avoid adding multiple entries for the same key. And if we go this way, the value is just a string description of the parameter that was in use - the code will not use it in practice, as the actual parameter value is captured when the benchmark is created, so it can be any type then?

Then a range works too:

    let range = 1...100
    Benchmark("ParameterizedTest", configuration: .init(parameters: ["count" : range.description])) { benchmark in
        for _ in range {
            blackHole(Int.random(in: benchmark.scaledIterations))
        }
    }
CrownedPhoenix commented 1 week ago

I agree with everything you've outlined. I think the conclusion of your point is that you prefer a parameters [String: String] dictionary. I think that's perfectly fine so I will make the appropriate changes.

Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?

This is worth talking about, as it is something that will be crucial for the Influx exporter. In the world of Influx there are two different ways that a particular data point can be annotated: with a field or with a tag. Both of these are a (key: String, value: String | Numeric) pair. I'll define a couple of terms here for the sake of the story: a parameter is finite-range if the set of values it can take on is small and bounded (say, a count drawn from a fixed list), and infinite-range if its values are effectively unbounded (say, a run ID or a timestamp).

Now, the important distinction between tags and fields is in the way they are stored in the Influx database - tags are indexed and fields are not.

The consequence of this distinction is that every distinct tag value adds a series to the index, so infinite-range values stored as tags blow up the index cardinality and degrade the database; they belong in (unindexed) fields, while finite-range values are safe and useful to index as tags.

In summary, whatever the design for parameterization - it's important for at least my use case to have information about whether the parameters are finite or infinite range. I'm totally open to suggestions on how we might designate this better.
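For readers unfamiliar with the format, a made-up line of Influx line protocol showing where tags and fields sit (all values hypothetical):

    # measurement,<tags: indexed>       <fields: not indexed>        timestamp
    wallClock,benchmark=Basic,count=100 value=1.23,runId="c0ffee"    1700000000000000000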

hassila commented 1 week ago

Alright - if we go this way, I think we should actually go back to your original proposal with a tags naming convention. If it is just a way to mark up benchmarks for display/export (any real parameterisation is captured anyway), the actual benchmark code will typically never read or use those tags for anything at runtime, so calling them parameters is just confusing - sorry about the detour.

I don't think we want to leak details of Influx formats though - in that case I would suggest adopting a convention on tags (key naming or similar) that the Influx exporter understands - or consider an external post-processing step that marks up the rows using such a convention/configuration after exporting the benchmark data.

CrownedPhoenix commented 1 week ago

I agree with not leaking Influx related details.

I'll return this to the tags nomenclature.

I'll have to keep thinking about how to handle the disambiguation of tags/fields in a post-processing step or otherwise.

It seems equally awkward to have a special implicit naming convention that only the Influx exporter knows what to do with.

I'll think about it some more.

hassila commented 1 week ago

I would probably do a post processing step in your pipeline to handle that separately.
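For concreteness, a hedged Swift sketch of what such a script step could do (a hypothetical helper, not part of package-benchmark): rewrite each exported line-protocol record, promoting a chosen set of tags into fields. Line-protocol escaping rules are ignored for brevity.

    // Shape of a record: measurement,tag1=a,tag2=b field1=1,field2=2 [timestamp]
    func promoteTagsToFields(_ line: String, promoting tagKeys: Set<String>) -> String {
        let sections = line.split(separator: " ", maxSplits: 2).map(String.init)
        guard sections.count >= 2 else { return line }

        var tagParts = sections[0].split(separator: ",").map(String.init)
        let measurement = tagParts.removeFirst()
        var fieldParts = sections[1].split(separator: ",").map(String.init)
        var keptTags: [String] = []

        for tag in tagParts {
            let keyValue = tag.split(separator: "=", maxSplits: 1).map(String.init)
            if keyValue.count == 2, tagKeys.contains(keyValue[0]) {
                fieldParts.append("\(keyValue[0])=\"\(keyValue[1])\"") // quoted string field
            } else {
                keptTags.append(tag) // stays an indexed tag
            }
        }

        let tagSection = keptTags.isEmpty
            ? measurement
            : measurement + "," + keptTags.joined(separator: ",")
        let timestamp = sections.count == 3 ? " " + sections[2] : ""
        return tagSection + " " + fieldParts.joined(separator: ",") + timestamp
    }

    // promoteTagsToFields("wallClock,count=100,runId=c0ffee value=1.23", promoting: ["runId"])
    // -> "wallClock,count=100 value=1.23,runId=\"c0ffee\""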

CrownedPhoenix commented 3 days ago

Alright, I made the changes we discussed to put the nomenclature back to tag (instead of parameters).

Could you elaborate a bit on what the post-processing step you are suggesting might look like?

1) Are you suggesting I add code to the InfluxExporter in this repo that post-processes BenchmarkResults in order to output the proper Influx data (which would require implicit prefixes on the tags used in the Benchmark)?

2) Or do you mean that I should post-process the output of the InfluxExporter and convert select tags to fields as appropriate?

Neither one of these is ideal. #2 is what I believe you are suggesting, and I think it's doable - it just means some more scripting in my CI pipeline.

I was hoping we could get to a point where the InfluxExporter native to this repo supports Influx output more fully. But maybe there just isn't an appropriate generalization that remains relevant to the other exporters.

What do you think about a mechanism (I'd make a separate MR for this) for injecting exporter-specific information for a Benchmark into the Benchmark initializer?

Something like:

Benchmark(
    /* params */,
    exportConfigs: [ // [ExportConfiguration] where `ExportConfiguration` is a protocol
        // This is where I could specify what `tag`s should be considered as `field`s during export
        InfluxExportConfiguration(...)
    ]
)

OR

Benchmark(
    /* params */,
    exportConfigs: [ 
        // This would better ensure no duplicate configs get passed.
        .influx: InfluxExportConfiguration(...)
    ]
)

This would allow the Benchmark implementer to specify salient information about a benchmark that may only be relevant to one particular exporter. It would also get around your concern (which I think is very valid) about leaking Influx-specific information into the base Benchmark config.
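A hedged sketch of how the second variant might be typed; every name here (ExportConfiguration, ExporterKey, InfluxExportConfiguration, fieldTags) is hypothetical:

    // Marker protocol plus an exporter key, so each exporter reads only
    // its own entry and nothing Influx-specific leaks into the core types.
    public protocol ExportConfiguration {}

    public enum ExporterKey: Hashable {
        case influx
        // other exporters...
    }

    public struct InfluxExportConfiguration: ExportConfiguration {
        /// Tags the Influx exporter should emit as (unindexed) fields.
        public var fieldTags: Set<String>
        public init(fieldTags: Set<String>) { self.fieldTags = fieldTags }
    }

    // Benchmark.Configuration could then carry, for example:
    // public var exportConfigurations: [ExporterKey: any ExportConfiguration]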

hassila commented 2 days ago

I was thinking of option #2, yep - I assume it would be a fairly straightforward script step - but I think we could also consider a separate PR for export configurations as you suggest (the second variant seems to make more sense then), e.g.

Benchmark(
    /* params */,
    exportConfigurations: [ 
        .influx: InfluxExportConfiguration(...)
    ]
)
CrownedPhoenix commented 2 days ago

Awesome - I'll work on a followup PR after this one. I think this is ready for the final review.

There are some CI checks that have been failing, and it's not clear to me how to address them - I'd appreciate any pointers you have on running these checks locally.

hassila commented 2 days ago

BenchmarkTool.swift needs to use the new baseName instead at line 271:

            if try shouldIncludeBenchmark(benchmark.baseName) {

(and then baseName needs to be public).

Otherwise a benchmark using tags will not match the regex filtering properly.

E.g. I tried adding tags to the Basic benchmark in the Benchmarks directory thus:

    Benchmark("Basic",
              configuration: .init(metrics: [.wallClock, .throughput, .instructions], tags: ["a" : "b", "c" : "d"])) { _ in
    }

Then this broke:

swift package benchmark --target Basic --filter "Basic"

and required this:

swift package benchmark --target Basic --filter "Basic.*"

to match.

Changing the regex matching to baseName instead sorts that.
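To make the baseName/name split concrete, an illustrative sketch assuming the tags dictionary discussed earlier (the merged formatting may well differ):

    extension Benchmark {
        // Illustrative only: `baseName` stays the plain identifier used for
        // regex filtering, while the display/export name appends the tags.
        var nameDecoratedWithTags: String {
            guard !configuration.tags.isEmpty else { return baseName }
            let tagList = configuration.tags
                .sorted { $0.key < $1.key }
                .map { "\($0.key): \($0.value)" }
                .joined(separator: ", ")
            return "\(baseName) (\(tagList))"
        }
    }

    // e.g. baseName == "Basic", tags == ["a": "b", "c": "d"]
    // -> "Basic (a: b, c: d)", which the filter "Basic" alone no longer matches.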

hassila commented 2 days ago

SwiftLint should be run locally (you need to kill the .build directory first):

hassila@ice ~/package-benchmark (feat(minor)/benchmark-parameters)> swiftlint .
Linting Swift files at paths .
Linting 'Int+Extensions.swift' (1/24)
Linting 'Benchmark.swift' (2/24)
Linting 'Statistics.swift' (3/24)
Linting 'BenchmarkRunner+ReadWrite.swift' (4/24)
Linting 'ARCStatsProducer.swift' (5/24)
Linting 'ARCStats.swift' (6/24)
Linting 'BenchmarkThresholds.swift' (7/24)
Linting 'BenchmarkRunner.swift' (9/24)
Linting 'Benchmark+ConvenienceInitializers.swift' (8/24)
Linting 'BenchmarkResult.swift' (10/24)
Linting 'BenchmarkInternals.swift' (12/24)
Linting 'BenchmarkMetric+Defaults.swift' (13/24)
Linting 'BenchmarkThresholds+Defaults.swift' (11/24)
Linting 'BenchmarkExecutor+Extensions.swift' (14/24)
Linting 'Blackhole.swift' (15/24)
Linting 'MallocStats.swift' (16/24)
Linting 'MallocStatsProducer+jemalloc.swift' (17/24)
Linting 'MallocStats+jemalloc-support.swift' (18/24)
Linting 'BenchmarkClock.swift' (19/24)
Linting 'OperatingSystemStatsProducer+Darwin.swift' (20/24)
Linting 'OperatingSystemStats.swift' (21/24)
Linting 'OperatingSystemStatsProducer+Linux.swift' (22/24)
Linting 'BenchmarkExecutor.swift' (23/24)
Linting 'BenchmarkMetric.swift' (24/24)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:36: warning: Blanket Disable Command Violation: The disabled 'type_body_length' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:23: warning: Blanket Disable Command Violation: The disabled 'file_length,' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:475:1: warning: File Length Violation: File should contain 400 lines or less: currently contains 475 (file_length)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:20:55: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:24:60: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:28:63: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:32:68: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:52: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:436:37: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:447:36: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
Linting 'P90AbsoluteThresholds.swift' (1/13)
Linting 'DateTime.swift' (2/13)
Linting 'Basic+SetupTeardown.swift' (3/13)
Linting 'BenchmarkRunner+Basic.swift' (4/13)
Linting 'Histogram.swift' (5/13)
Linting 'Package.swift' (6/13)
Linting 'BenchmarkTests.swift' (7/13)
Linting 'OperatingSystemAndMallocTests.swift' (8/13)
Linting 'BenchmarkMetricsTests.swift' (9/13)
Linting 'StatisticsTests.swift' (10/13)
Linting 'AdditionalTests.swift' (11/13)
Linting 'BenchmarkResultTests.swift' (12/13)
Linting 'BenchmarkRunnerTests.swift' (13/13)