Thanks! Will be a little bit of time before being able to review properly - mostly out this week - but one quick comment: do you think it would make sense to support ranges for int/doubles too?
> do you think it would make sense to support ranges for int/doubles too?
You're thinking something like this?
parameters: ["foo": 0...4, "bar": 1.0...10.0]
I don't see why we couldn't support something like that.
In practice, I wouldn't expect the implementer to actually use the stored parameter values directly (like they do for `scaledIterations`). Rather, they would be generating the parameters first and then initializing the benchmark - such that they will already have access to the parameters directly via closure capture.
So they could likely get the behavior they want by just string-encoding the range and using that string as the value.
For example:
```swift
for range in [0...1, 1...10, 10...100] {
    Benchmark(
        "testFoo",
        parameters: [
            BenchmarkParameter("range", String(describing: range))
        ]
    ) { /* do something with `range` */ }
}
```
In summary, I'm certainly not opposed - though I'm not sure about the value proposition yet?
Admittedly, my argument here begs the question of why we bother supporting anything besides `String` at all. Maybe `BenchmarkParameter` could just have a generic init over `CustomStringConvertible`.

The only downside to this is that there are probably some instances where knowing whether the value is numeric vs. string/complex is valuable. So maybe replacing the `String` init with a `CustomStringConvertible` parameter, while maintaining the float and int initializers as well, would be suitable.
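To make that concrete, here's a rough sketch of what that could look like (this is hypothetical API, not what's in the PR):

```swift
// Hypothetical sketch: keep typed Int/Double initializers so exporters can
// tell numeric values apart, and accept any CustomStringConvertible otherwise.
struct BenchmarkParameter {
    enum Value {
        case int(Int)
        case double(Double)
        case string(String)
    }

    let name: String
    let value: Value

    init(_ name: String, _ value: Int) {
        self.name = name
        self.value = .int(value)
    }

    init(_ name: String, _ value: Double) {
        self.name = name
        self.value = .double(value)
    }

    // Fallback for everything else, stored via its textual description.
    init(_ name: String, _ value: some CustomStringConvertible) {
        self.name = name
        self.value = .string(value.description)
    }
}
```

Since `Int` and `Double` overloads are more specific, numeric values would still pick up the typed initializers automatically.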
Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.67%. Comparing base (75e8622) to head (5c050b5).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| Sources/Benchmark/Benchmark.swift | 87.50% | 2 Missing :warning: |
Hi, started to have a look at the PR now - I'd like to return the discussion to the use site a bit. If this was the old approach to parameterization:
```swift
let parameterization = (0...5).map { 1 << $0 } // 1, 2, 4, ...
parameterization.forEach { count in
    Benchmark("ParameterizedWith\(count)") { benchmark in
        for _ in 0 ..< count {
            blackHole(Int.random(in: benchmark.scaledIterations))
        }
    }
}
```
Then with the additional API, I would expect to make just the minimal change:
```swift
parameterization.forEach { count in
    Benchmark("ParameterizedTest", configuration: .init(parameters: [.init("count", count)])) { benchmark in
        for _ in 0 ..< count {
            blackHole(Int.random(in: benchmark.scaledIterations))
        }
    }
}
```
To basically record the parameters in use in the new parameters array - instead of encoding them into the name as in the previous sample - for use in display and later in the export.
It's not a literal in this case, but a parameter that can vary - just as the currently supported parameterisation (just formalising the naming of the parameter that can vary for display/export purposes).
Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?
Also, writing the code above, perhaps it's really a dictionary (of String: String) that makes sense rather than an array?
E.g.
```swift
parameterization.forEach { count in
    Benchmark("ParameterizedTest", configuration: .init(parameters: ["count" : count.description])) { benchmark in
        for _ in 0 ..< count {
            blackHole(Int.random(in: benchmark.scaledIterations))
        }
    }
}
```
Just to avoid adding multiple entries for the same key. And if we go this way, the value is just a string description of the parameter that was in use - as it will not be used in practice by the code; the actual parameter value is captured when the benchmark is created, so it can be any type then?
Then a range works too:

```swift
let range = 1...100
Benchmark("ParameterizedTest", configuration: .init(parameters: ["count" : range.description])) { benchmark in
    for _ in range {
        blackHole(Int.random(in: benchmark.scaledIterations))
    }
}
```
I agree with everything you've outlined.
I think the conclusion of your point is that you prefer a `parameters: [String: String]` dictionary. I think that's perfectly fine, so I will make the appropriate changes.
> Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?
This is worth talking about as it is something that will be crucial for the Influx exporter.
In the world of Influx there are two different ways that a particular data point can be annotated.
A data point can be annotated with either a `field` or a `tag`. Both of these are a `(key: String, value: String | Numeric)` pair.

I'll define a couple of terms here for the sake of the story:

- **name**: The `key`. It's always a single value.
- **range**: The set of `value`s that can occur for that annotation.

Now, the important distinction between `tag`s and `field`s is in the way they are stored in the Influx database - `tag`s are indexed and `field`s are not.

The consequence of this distinction is that:

- An annotation with a finite range should be a `tag` (to leverage the index).
- An annotation with an infinite range should be a `field` (so as not to pollute the index space and cause performance issues related to cardinality - see https://docs.influxdata.com/influxdb/v2/write-data/best-practices/resolve-high-cardinality/ and https://docs.influxdata.com/influxdb/v2/write-data/best-practices/schema-design/#use-fields-for-unique-and-numeric-data).

In summary, whatever the design for parameterization - it's important for at least my use case to have information about whether the parameters have a finite or infinite range. I'm totally open to suggestions on how we might designate this better.
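(For reference - and with made-up measurement/key names - the tag/field distinction shows up directly in Influx line protocol: tags sit before the space and are indexed, fields come after it and are not.)

```
# Influx line protocol (illustrative names):
#   <measurement>,<tag-set> <field-set> <timestamp>
benchmarks,benchmarkName=ParameterizedTest,metric=wallClock count=100i,p50=12345i 1672531200000000000
```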
Alright - I think we should actually go back to your original proposal with a `tags` naming convention then if we go this way - if it is just a way to mark up benchmarks (any real parameterisation is captured anyway) for display/export - the actual benchmark code will typically never read or use those tags for anything at runtime, so calling it parameters is just confusing then - sorry about the detour.
I don't think we want to leak details of Influx formats though - I would probably suggest that you in that case adopt a convention on tags (either key naming or similar) that is used by the Influx exporter - or consider an external post-processing step that marks up the rows using such a convention/configuration after exporting the benchmark data?
I agree with not leaking Influx related details.
I'll return this to the tags nomenclature.
I'll have to keep thinking about how to do the disambiguation of tags/fields in a post processing step or otherwise.
It seems equally awkward to have a special implicit naming convention that only the influx exporter knows what to do with.
I'll think about it some more.
I would probably do a post processing step in your pipeline to handle that separately.
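Something in the spirit of the following could work as that post-processing step - a rough sketch over line-protocol text, where all names are made up and real line protocol would need proper escaping/quoting of promoted string values:

```swift
// Rough sketch: rewrite one line of Influx line protocol, moving selected
// tags out of the (indexed) tag set and into the (unindexed) field set.
func promoteTagsToFields(_ line: String, promote: Set<String>) -> String {
    // Line protocol shape: "measurement,tag=v[,tag=v...] field=v[,field=v...] [timestamp]"
    let parts = line.split(separator: " ", maxSplits: 2).map(String.init)
    guard parts.count >= 2 else { return line }

    var head = parts[0].split(separator: ",").map(String.init)
    let measurement = head.removeFirst()

    var keptTags: [String] = []
    var promoted: [String] = []
    for pair in head {
        let key = pair.split(separator: "=").first.map(String.init) ?? ""
        if promote.contains(key) {
            promoted.append(pair) // becomes a field
        } else {
            keptTags.append(pair) // stays a tag
        }
    }

    let tagPart = ([measurement] + keptTags).joined(separator: ",")
    let fieldPart = (promoted + [parts[1]]).joined(separator: ",")
    let timestamp = parts.count == 3 ? " " + parts[2] : ""
    return tagPart + " " + fieldPart + timestamp
}

// promoteTagsToFields("benchmarks,count=100,host=ci wallClock=12345i", promote: ["count"])
// -> "benchmarks,host=ci count=100,wallClock=12345i"
```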
Alright, I made the changes we discussed to put the nomenclature back to `tag` (instead of `parameters`).
Could you elaborate a bit on what the post-processing step you are suggesting might look like?
1) Are you suggesting I add code to the InfluxExporter in this repo that post-processes `BenchmarkResult`s in order to output the proper Influx data (which would require implicit prefixes on the tags used in the Benchmark)?
2) Or do you mean that I should post-process the output of the InfluxExporter and convert select `tag`s to `field`s as appropriate?
Neither one of these is ideal. #2 is what I believe you are suggesting, and I think it's doable - it just means some more scripting in my CI pipeline.
I was hoping we could get to a point where the InfluxExporter native to this repo more fully supports Influx outputs. But maybe there just really isn't an appropriate generalization that is still relevant to the other exporters.
What do you think about a mechanism (I'd make a separate MR for this) for injecting exporter-specific information for a Benchmark into the Benchmark initializer?
Something like:
```swift
Benchmark(
    /* params */,
    exportConfigs: [ // [ExportConfiguration] where `ExportConfiguration` is a protocol
        // This is where I could specify what `tag`s should be considered as `field`s during export
        InfluxExportConfiguration(...)
    ]
)
```
OR
```swift
Benchmark(
    /* params */,
    exportConfigs: [
        // This would better ensure no duplicate configs get passed.
        .influx: InfluxExportConfiguration(...)
    ]
)
```
This would allow the Benchmark implementer to specify some salient information about a benchmark that may only be relevant to one particular exporter. This would help get around your concern (which I think is very valid) about leaking Influx-specific information into the base Benchmark config.
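For illustration, the second shape could be as simple as this (every name here beyond `Benchmark` is hypothetical, just to show the idea):

```swift
// Hypothetical sketch of exporter-specific configuration, keyed by exporter
// so at most one configuration per export format can be supplied.
protocol ExportConfiguration {}

enum ExportFormat: Hashable {
    case influx
    case jmh
    case histogram
}

struct InfluxExportConfiguration: ExportConfiguration {
    // Names of benchmark tags the Influx exporter should emit as fields.
    let fieldTags: Set<String>
}

// Usage (hypothetical):
// Benchmark("testFoo",
//           exportConfigurations: [.influx: InfluxExportConfiguration(fieldTags: ["count"])]) { _ in }
```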
I was thinking of option #2, yep - I assume it would be a fairly straightforward script step - but I think we could also consider a separate PR for export configurations as you suggest (second one seems to make more sense then), e.g.
```swift
Benchmark(
    /* params */,
    exportConfigurations: [
        .influx: InfluxExportConfiguration(...)
    ]
)
```
Awesome - I'll work on a followup PR after this one. I think this is ready for the final review.
There are some CI checks that have been failing that are not clear to me how to address - I'd appreciate any pointers you have on running these checks locally.
`BenchmarkTool.swift` needs to use the new `baseName` instead at line 271:

```swift
if try shouldIncludeBenchmark(benchmark.baseName) {
```

(and then `baseName` needs to be public).
Otherwise a benchmark using tags will not be matched properly by the regex filtering.
E.g. I tried adding tags to the Basic benchmark in the Benchmarks directory thus:

```swift
Benchmark("Basic",
          configuration: .init(metrics: [.wallClock, .throughput, .instructions], tags: ["a" : "b", "c" : "d"])) { _ in
}
```
Then this broke:

```
swift package benchmark --target Basic --filter "Basic"
```

and required this:

```
swift package benchmark --target Basic --filter "Basic.*"
```

to match. Changing the regex matching to use `baseName` instead sorts that.
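A quick illustration of the failure mode, assuming the filter pattern is anchored against the full (tag-decorated) name - the name format here is illustrative:

```swift
import Foundation

let pattern = "^Basic$"                 // what `--filter "Basic"` effectively matches against
let decoratedName = "Basic (a=b, c=d)"  // illustrative: name including the tag list
let baseName = "Basic"                  // name without the tag decoration

// The decorated name no longer matches an exact-name filter; the base name does.
print(decoratedName.range(of: pattern, options: .regularExpression) != nil) // false
print(baseName.range(of: pattern, options: .regularExpression) != nil)      // true
```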
SwiftLint should be run locally (you need to kill the .build directory first):
```
hassila@ice ~/package-benchmark (feat(minor)/benchmark-parameters)> swiftlint .
Linting Swift files at paths .
Linting 'Int+Extensions.swift' (1/24)
Linting 'Benchmark.swift' (2/24)
Linting 'Statistics.swift' (3/24)
Linting 'BenchmarkRunner+ReadWrite.swift' (4/24)
Linting 'ARCStatsProducer.swift' (5/24)
Linting 'ARCStats.swift' (6/24)
Linting 'BenchmarkThresholds.swift' (7/24)
Linting 'BenchmarkRunner.swift' (9/24)
Linting 'Benchmark+ConvenienceInitializers.swift' (8/24)
Linting 'BenchmarkResult.swift' (10/24)
Linting 'BenchmarkInternals.swift' (12/24)
Linting 'BenchmarkMetric+Defaults.swift' (13/24)
Linting 'BenchmarkThresholds+Defaults.swift' (11/24)
Linting 'BenchmarkExecutor+Extensions.swift' (14/24)
Linting 'Blackhole.swift' (15/24)
Linting 'MallocStats.swift' (16/24)
Linting 'MallocStatsProducer+jemalloc.swift' (17/24)
Linting 'MallocStats+jemalloc-support.swift' (18/24)
Linting 'BenchmarkClock.swift' (19/24)
Linting 'OperatingSystemStatsProducer+Darwin.swift' (20/24)
Linting 'OperatingSystemStats.swift' (21/24)
Linting 'OperatingSystemStatsProducer+Linux.swift' (22/24)
Linting 'BenchmarkExecutor.swift' (23/24)
Linting 'BenchmarkMetric.swift' (24/24)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:36: warning: Blanket Disable Command Violation: The disabled 'type_body_length' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:23: warning: Blanket Disable Command Violation: The disabled 'file_length,' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:475:1: warning: File Length Violation: File should contain 400 lines or less: currently contains 475 (file_length)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:20:55: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:24:60: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:28:63: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:32:68: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:52: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:436:37: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:447:36: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
Linting 'P90AbsoluteThresholds.swift' (1/13)
Linting 'DateTime.swift' (2/13)
Linting 'Basic+SetupTeardown.swift' (3/13)
Linting 'BenchmarkRunner+Basic.swift' (4/13)
Linting 'Histogram.swift' (5/13)
Linting 'Package.swift' (6/13)
Linting 'BenchmarkTests.swift' (7/13)
Linting 'OperatingSystemAndMallocTests.swift' (8/13)
Linting 'BenchmarkMetricsTests.swift' (9/13)
Linting 'StatisticsTests.swift' (10/13)
Linting 'AdditionalTests.swift' (11/13)
Linting 'BenchmarkResultTests.swift' (12/13)
Linting 'BenchmarkRunnerTests.swift' (13/13)
```
## Description
Sometimes it is convenient to run the same benchmark multiple times under slightly different scenarios. For example, benchmarking `func foo(_ n: Int)` for various parameterizations of `n`. This adds support for specifying the parameters of a benchmark but makes no functional changes beyond the name/identifier of a benchmark now including a parameter list when one was provided.

This unlocks the potential for the benchmark exporters (such as the Influx exporter) to leverage these parameters (by emitting appropriate tags/fields in the case of Influx) in their output.
Resolves https://github.com/ordo-one/package-benchmark/issues/267
## How Has This Been Tested?

Added a new test to verify the output of `Benchmark.name`. Also ran the remaining tests to make sure no other behavior changed.

Minimal checklist:

- I have added DocC code-level documentation for any public interfaces exported by the package