Closed: marcrasi closed this issue 4 years ago.
Hi @marcrasi, thanks a lot for the feedback! We were thinking of adding something along those lines recently. One potential design is to expose another overload of `benchmark` where you could explicitly mark the start and end of the benchmarked code:
```swift
benchmark("my bench") { state in
    //
    // set-up goes here
    //
    state.start()
    //
    // we are only measuring this part
    //
    state.end()
    //
    // tear-down goes here
    //
}
```
Do you think this would work for your use case?
Yes!
I encountered this previously as well; the approach I took was to define a lazy var (and use it in the benchmark). That way, if those benchmarks don't run, I don't do very much work (just instantiating the class instance). When they do run, I'm timing both the code I'd like to benchmark and the retrieval from the lazy var. (Because of warm-up runs, the lazy var is forced during the set-up run, so that cost doesn't count.) If the work I'm doing is very small, the overhead of reading the lazy var matters, but in this case the time spent in the closure is so completely dominated by the main work that I don't particularly care.
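The lazy-var pattern described above might be sketched like this (all names here are hypothetical, invented for illustration; the expensive set-up is simulated by building a large array):

```swift
// Sketch of the lazy-var pattern described above (hypothetical names).
final class DatasetBenchmarks {
    // The expensive set-up runs only when the benchmark actually
    // executes; filtered-out benchmarks never force the lazy var.
    lazy var dataset: [Double] = (0..<1_000_000).map { Double($0) }

    func run() -> Double {
        // The first (warm-up) run forces `dataset`; later timed runs
        // only pay the cost of reading the already-initialized property.
        dataset.reduce(0, +)
    }
}
```

After the warm-up run forces `dataset`, each timed call to `run()` only pays the cost of a stored-property read plus the work being measured.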
I liked your idea @shabalind, but had one improvement: if we really care about this kind of performance, I'd recommend we do something like the following:

```swift
benchmark("my super fast operation") { context in
    // set-up goes here
    for _ in context {
        mySuperFastOperation()
    }
    // clean-up here
}
```
That way, we can get rid of the closure dispatch overhead and more. We just might need to make sure the Swift compiler doesn't get too smart and optimize away the loop body...
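One common trick for keeping the optimizer honest (this is an assumption on my part, not part of swift-benchmark's public API) is to funnel every result through an `@inline(never)` sink the compiler cannot see through:

```swift
// An opaque sink: because it can never be inlined, the optimizer must
// assume the argument is observed and cannot delete the computation
// that produced it.
@inline(never)
func blackHole<T>(_ value: T) {
    // Deliberately empty; the opaque call boundary is what matters.
}

// Hypothetical tiny operation standing in for the benchmarked code.
func mySuperFastOperation() -> Int { 1 &+ 1 }

for _ in 0..<1_000 {
    // Without blackHole, the optimizer could prove the loop body has
    // no observable effect and remove it entirely.
    blackHole(mySuperFastOperation())
}
```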
@saeta I think both approaches can co-exist:

- `state.start()` / `state.end()` are useful for expressing set-up and tear-down, but the benchmark loop still runs outside the benchmark closure. This is useful for longer-running benchmarks that want to exclude part of the per-iteration time from being recorded.
- `for _ in state { ... }` is great for moving the benchmark loop into the closure. This suits short-running benchmarks that want to make sure measurements aren't affected by closure and/or protocol virtual-dispatch overhead.
Additionally, @mattt also suggests another alternative in #33: `setUp` and `tearDown` functions. It seems to be complementary to `state.start()` / `state.end()`, because it can't be used in closure-based benchmarks.
> - Extend the benchmark protocol to include `setUp` and `tearDown` functions. It seems to be complementary to `state.start()` / `state.end()`, because it can't be used in closure-based benchmarks.
That's only true of its current implementation. You could add `setUp` and `tearDown` closure arguments to the `benchmark` function (defaulting to `nil` so that they don't have to be specified):

```swift
benchmark("...", setUp: { /* ... */ }, tearDown: { /* ... */ }) {
    /* ... */
}
```
Or, if you wanted to target the forthcoming multiple trailing closures proposal (SE-0279), you'd implement a separate overload of `benchmark` that requires all three closure arguments in a different order:

```swift
benchmark("...") {
    /* ... */
} setUp: {
    /* ... */
} tearDown: {
    /* ... */
}
```
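Under the hood, such an overload might be sketched as follows (the names and signature here are illustrative assumptions, not swift-benchmark's actual API):

```swift
// Hypothetical sketch of a `benchmark` overload with optional set-up
// and tear-down closures, defaulting to nil so callers can omit them.
func benchmark(
    _ name: String,
    setUp: (() -> Void)? = nil,
    tearDown: (() -> Void)? = nil,
    body: () -> Void
) {
    setUp?()      // not timed
    body()        // the framework would measure only this part
    tearDown?()   // not timed
}

// Usage: each phase runs exactly once, in order.
var log: [String] = []
benchmark(
    "demo",
    setUp: { log.append("setUp") },
    tearDown: { log.append("tearDown") },
    body: { log.append("body") }
)
```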
I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()`, or by omitting the `for _ in context` loop.
> `benchmark("...", setUp: { /* ... */ }, tearDown: { /* ... */ }) { ... }`

I find it less elegant than either `start`/`end` or `for _ in state { ... }`, because it works poorly with respect to lexical scoping. For example:
```swift
benchmark(...) { state in
    let resource = Resource()
    resource.expensiveOperation()
    state.start()
    operationToBenchmark(resource)
}
```
Here, due to lexical scoping, everything is defined inside the benchmark. But if we had multiple closures:
```swift
var resource: Resource? = nil
benchmark("...") {
    operationToBenchmark(resource!)
} setUp: {
    resource = Resource()
    resource!.expensiveOperation()
}
```
Here `resource` leaks into the outer scope for no good reason.
> I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()`.
The semantics there would be that start/end are called automatically for you by the framework outside of the benchmark code, so omitting one or both of those is still valid and works as expected.
On the other hand, the equivalent with a full-fledged struct instance and `setUp` and `tearDown` methods is going to be well scoped as well:
```swift
final class MyBenchmark: AnyBenchmark {
    var resource: Resource? = nil
    func setUp() {
        resource = Resource()
        resource!.expensiveOperation()
    }
    func run() {
        operationToBenchmark(resource!)
    }
}
benchmark("...", MyBenchmark())
```
> On the other hand, the equivalent with a full-fledged struct instance and `setUp` and `tearDown` methods is going to be well scoped as well.
The standard caveats for `setUp` and `tearDown` in XCTest apply here. Instance variables would either be declared as implicitly-unwrapped optionals (`var resource: Resource!`), provided an initial value (`var resource = Resource()`), or declared outside the benchmark entirely:

```swift
let resource = Resource()
resource.expensiveOperation()
benchmark {
    nonMutatingOperation(resource)
}
```
A `do` block can reduce concerns about lexical scope:
```swift
do {
    let resource = Resource()
    resource.expensiveOperation()
    benchmark {
        nonMutatingOperation(resource)
    }
}

do {
    let resource = Resource()
    resource.expensiveOperation()
    benchmark {
        anotherNonMutatingOperation(resource)
    }
}
```
> The semantics there would be that start/end are called automatically for you by the framework outside of the benchmark code, so omitting one or both of those is still valid and works as expected.
What should I expect if I were to...

- call `start` multiple times?
- call `end` before `start`?
- call `start` / `end` multiple times in an unbalanced way?

I really think adding `start` / `end` defeats the purpose of blocks, and I hope we can avoid it here.
> I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()` or not including that `for _ in context`.
Hmm, I think this is not so big of an issue, in the sense that the library can define a reasonable thing to do and trap if user code violates the assumptions. The following are some straw-person proposals for what we could do (that I haven't fully thought through, so feedback / counterproposals welcome!):
Option A (start exactly once):

- `.start()` must be called exactly once. If it is not called, or if it is called more than once, that is a `fatalError` and execution halts with a helpful error message.
- `.stop()` may be called once or zero times. In the zero-times case, timing ends when the closure returns.

Option B (start & stop exactly once):

- `.start()` and `.stop()` must each be called exactly once. It is a fatal error for the user's closure to interact with the timer in any other way.

Option C (start & stop, and for loop): the timer object either has:

- a. `.makeIterator()` called upon it, which starts timing. When the iterator completes, it stops timing. If `makeIterator()` is called, it must be fully consumed by the user's closure.
- b. `.start()` called upon it (this must occur exactly once). `.stop()` can be optional, per proposal A, or it can be required, per proposal B.

Since this is benchmark code, the source code is always available and mutable, so I don't think it's a priority to statically prevent invalid use. Instead, I'd rather we optimize for an easy-to-use API. The goals of this (IMO) should be: (1) accurate benchmarks, and (2) ease of authoring and maintaining benchmarks. On that latter point, it's a question of taste / style, but I think striking an appropriate balance between static guarantees and dynamic guarantees is better than all-static or all-dynamic.
Does that make sense? WDYT?
Note: one shortcoming of the above proposals is that they disallow timing two performance-critical operations with a non-performance-critical operation in the middle. I think this is probably okay, but I can be persuaded otherwise if we come up with some good use cases.
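Option A above could be sketched as a small state machine that traps on misuse; `TimerState` and its members are hypothetical names for illustration, not part of swift-benchmark:

```swift
// Straw-man sketch of Option A: `.start()` must be called exactly once,
// `.stop()` at most once; any other usage traps with a helpful message.
final class TimerState {
    private(set) var started = false
    private(set) var stopped = false

    func start() {
        precondition(!started, "start() may only be called once per iteration")
        started = true
    }

    func stop() {
        precondition(started, "stop() called before start()")
        precondition(!stopped, "stop() may only be called once per iteration")
        stopped = true
    }

    // The framework would call this when the user's closure returns:
    // zero stop() calls is valid, and timing simply ends here.
    func finish() {
        precondition(started, "the closure never called start()")
        stopped = true
    }
}
```

With this shape, a closure that calls `start()` and then returns is valid, while double-starting or stopping before starting halts with a descriptive error.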
@saeta Without responding to any of your points individually, my general response is that both `start()`/`stop()` and `for _ in context` introduce complexity without any discernible benefit.
> I think this is not so big of an issue, in the sense that the library can define a reasonable thing to do and trap if user code violates the assumptions.
Good design is about not being able to use something incorrectly. `Option<T>` is better than trapping on `null`-pointer errors. Blocks are better than calling `start` and `end`.
Now that #33 is merged, is there any functionality that's not currently addressed?
Not yet addressed: `for _ in state`; we only have closure-based `state.measure { ... }`. Apart from giving an easy way to express set-up and tear-down while using closures, we can also get better precision because `state.measure`'s closure is non-escaping (and thus possible to optimize away completely).
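A minimal sketch of what a closure-based `measure` could look like; `State` here is a stand-in invented for illustration, not swift-benchmark's actual `BenchmarkState` implementation:

```swift
import Dispatch

// Hypothetical state type accumulating measured time.
struct State {
    var elapsedNanoseconds: UInt64 = 0

    // `body` is non-escaping, so the compiler can specialize the call
    // site and avoid closure-allocation overhead; only the time spent
    // inside `body` is added to the accumulated total.
    mutating func measure(_ body: () -> Void) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        elapsedNanoseconds += DispatchTime.now().uptimeNanoseconds - start
    }
}
```

Set-up and tear-down code placed before and after the `measure` call inside the benchmark closure would then be excluded from the recorded time.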
I would like a way to set up some data before the timed portion of the benchmark runs.
Concrete use case: In https://github.com/borglab/SwiftFusion/pull/61, I add a benchmark that requires a dataset from the internet. I put the code that downloads the dataset outside of the benchmark, but this means that the dataset download happens even when the benchmark that requires it is filtered out.