ordo-one / package-benchmark

Swift benchmark runner with many performance metrics and great CI support
Apache License 2.0

Feature: Custom benchmark tags #267

Closed CrownedPhoenix closed 1 month ago

CrownedPhoenix commented 2 months ago

I want to test the waters on whether the community would be receptive to a PR that adds the ability to apply custom tags to a benchmark. This specifically targets a use case with Influx where I want to run the same benchmark multiple times under different constraints/scenarios.

For example:

for size in [10, 100, 1000] {
    Benchmark("TestFoo", tags: [ "size": size ]) {
        foo(size)
    }
}

This would allow me to get better dimensionality on a benchmark that, when exported to Influx, is more easily queried etc. This would require some amendments to the Influx format exporter to include these custom tags for the particular benchmark.
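To make the exporter amendment concrete, here is a minimal sketch of how custom tags might end up in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). The function name `influxLine` and its parameters are illustrative assumptions, not the package's actual exporter API:

```swift
// Hypothetical sketch: serializing a benchmark result with custom tags into
// InfluxDB line protocol. Not the package's real exporter; names are assumed.
func influxLine(measurement: String,
                tags: [String: String],
                fields: [String: String],
                timestamp: UInt64) -> String {
    // Tags are appended to the measurement, sorted for deterministic output.
    let tagPart = tags.sorted { $0.key < $1.key }
        .map { ",\($0.key)=\($0.value)" }
        .joined()
    let fieldPart = fields.sorted { $0.key < $1.key }
        .map { "\($0.key)=\($0.value)" }
        .joined(separator: ",")
    return "\(measurement)\(tagPart) \(fieldPart) \(timestamp)"
}

let line = influxLine(measurement: "TestFoo",
                      tags: ["size": "100"],
                      fields: ["wallClock": "42i"],
                      timestamp: 1_700_000_000_000_000_000)
// line == "TestFoo,size=100 wallClock=42i 1700000000000000000"
```

With a custom `size` tag in place, Influx queries could then group or filter the same benchmark name by parameter value.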

We (myself and/or @ORyanHampton) are happy to take this PR on - just don't want to spend time on it if it's not a direction that the community finds appropriate.

hassila commented 2 months ago

I think that is a good direction for packaging of parameterised benchmarks - we currently do it by changing the benchmark description string, e.g.:

let benchmarks = {
  let parameterization = (0...5).map { 1 << $0 } // 1, 2, 4, ...

  parameterization.forEach { count in
    Benchmark("ParameterizedWith\(count)") { benchmark in
      for _ in 0 ..< count {
        blackHole(Int.random(in: benchmark.scaledIterations))
      }
    }
  }
}

but giving an exporter access to better-structured information about the parameterization in use, as you suggest, would be nicer and would avoid string parsing.

Some considerations/questions would be:

- Need to validate behaviour with multiple benchmarks with the same name (including baselines, filtering, etc.) - haven't really tried or considered that - the expectation has been that a benchmark name is unique, so there may be some assumptions there that break things
- Should adopt the standard textual output to also print parameterisation in a nice compact way (perhaps simply TestFoo (size = 100, otherParameter = "test") is ok)
- What are thoughts about supported types and cardinality for such parameterisation?
- Perhaps "parameterization" would be a better term than "tags" to better reflect what it actually is used for (then we can reserve "tags" or "labels" for a future feature that could be used for metadata filtering of benchmarks to run by "tag" or "label"), WDYT?

Overall I think it would be a nice improvement!

CrownedPhoenix commented 2 months ago

> Need to validate behaviour with multiple benchmarks with the same name (including baselines, filtering, etc.) - haven't really tried or considered that - the expectation has been that a benchmark name is unique, so there may be some assumptions there that break things

This makes sense. What are the odds we could just give each Benchmark a default unique UUID? The user wouldn't have to provide one explicitly (or maybe it wouldn't even be parameterized). I assume there would just need to be a mapping back to the string-based name - which, combined with your suggested change to the textual output below, would definitely be useful for disambiguation when outputting a description of the benchmark.

> Should adopt the standard textual output to also print parameterisation in a nice compact way (perhaps simply TestFoo (size = 100, otherParameter = "test") is ok)

Love this.
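A minimal sketch of what that compact rendering could look like. The `ParameterValue` enum and `describe` function are hypothetical names for illustration, not part of the package:

```swift
// Hypothetical sketch of the compact textual form suggested above.
enum ParameterValue {
    case int(Int)
    case string(String)

    // Render strings quoted and integers bare, matching the suggested format.
    var rendered: String {
        switch self {
        case .int(let i): return "\(i)"
        case .string(let s): return "\"\(s)\""
        }
    }
}

func describe(_ name: String, parameters: [String: ParameterValue]) -> String {
    guard !parameters.isEmpty else { return name }
    // Sort keys so the rendered description is deterministic.
    let rendered = parameters.sorted { $0.key < $1.key }
        .map { "\($0.key) = \($0.value.rendered)" }
        .joined(separator: ", ")
    return "\(name) (\(rendered))"
}

let description = describe("TestFoo",
                           parameters: ["size": .int(100),
                                        "otherParameter": .string("test")])
// description == "TestFoo (otherParameter = \"test\", size = 100)"
```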

> What are thoughts about supported types and cardinality for such parameterisation?

It's a good question - it crossed my mind while writing the example code above. I'm thinking about something like this to give the user some flexibility, along the lines of what you have shown above:

enum Value { // Not necessarily the ideal naming
  case int(Int)
  case string(String)
}
extension Value: ExpressibleByIntegerLiteral {
  init(integerLiteral value: Int) { self = .int(value) }
}
extension Value: ExpressibleByStringLiteral {
  init(stringLiteral value: String) { self = .string(value) }
}

// Then this can work pretty transparently:
let parameterization: [String: Value] = ["size": 100, "otherParameter": "test"]

Thoughts?

> Perhaps "parameterization" would be a better term than "tags" to better reflect what it actually is used for (then we can reserve "tags" or "labels" for a future feature that could be used for metadata filtering of benchmarks to run by "tag" or "label"), WDYT?

Sure - this is excellent feedback, as one of my concerns was how this feature would fit within the ecosystems of the other exporters. On that note, in Influx land there are fields in addition to tags, and I'm wondering if there's a generalization that could facilitate both. I believe I'm going to need both for my use case: the tags I anticipate using for static, constrained metadata about a particular test, whereas the fields I will need for more dynamic metadata about the test.

For example, I want to do a bit of randomization in my benchmarking but want to record the results of that randomization along with the test. Specifically, the randomized values would be a double between 0.0 and 1.0 (a percentage). This wouldn't fit well as a tag because there are infinitely many values between 0.0 and 1.0 - which is where a field is more useful. Importantly, this value is not a measurement, so a custom metric wouldn't be appropriate here.

Do you have any recommendations or preferences about a generalization around the notion of "parameterizations" with finite ranges vs. infinite ranges?
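One possible shape for such a generalization, sketched under the assumption that a parameter is either a finite, low-cardinality "tag" or an unbounded "field" (the `BenchmarkParameter` name and cases are hypothetical, not a proposal for the actual API):

```swift
// Hypothetical sketch: distinguishing finite-domain tags from
// unbounded-domain fields in one parameter type.
enum BenchmarkParameter {
    case tag(String)      // finite domain, e.g. size = "100"; indexable by Influx
    case field(Double)    // unbounded domain, e.g. a randomized percentage

    var isTag: Bool {
        if case .tag = self { return true }
        return false
    }
}

let parameters: [String: BenchmarkParameter] = [
    "size": .tag("100"),               // static, constrained metadata
    "randomFraction": .field(0.4271),  // dynamic value recorded per run
]

// An exporter could then route each kind appropriately:
// tags become Influx tags, fields become Influx fields.
let tagKeys = parameters.filter { $0.value.isTag }.keys.sorted()
// tagKeys == ["size"]
```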