Closed: marcrasi closed this issue 4 years ago.
Hi @marcrasi, thanks a lot for the feedback! We were thinking of adding something along those lines recently. One potential design is to expose another overload of `benchmark` where you could explicitly mark the start and end of the benchmarked code:
```swift
benchmark("my bench") { state in
    //
    // set-up goes here
    //
    state.start()
    //
    // we are only measuring this part
    //
    state.end()
    //
    // tear-down goes here
    //
}
```
Do you think this would work for your use case?
Yes!
I encountered this previously as well; the approach I took was to define a lazy var (and use it in the benchmark). That way, if those benchmarks don't run, I don't do very much work (just instantiating the class instance). When they do run, I'm timing both the code I'd like to benchmark and the retrieval from the lazy var. (Because of warm-up runs, the lazy var is forced during the set-up run, so that cost doesn't count.) If the work I'm doing is very small, the overhead of reading the lazy var matters, but in this case the time spent in the closure is so completely dominated by the main work that I don't particularly care.
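The lazy-var pattern described above might be sketched like this (all names here are hypothetical, invented for illustration; the expensive set-up is simulated by building a large array):

```swift
// Sketch of the lazy-var pattern described above (hypothetical names).
final class DatasetBenchmarks {
    // The expensive set-up runs only when the benchmark actually
    // executes; filtered-out benchmarks never force the lazy var.
    lazy var dataset: [Double] = (0..<1_000_000).map { Double($0) }

    func run() -> Double {
        // The first (warm-up) run forces `dataset`; later timed runs
        // only pay the cost of reading the already-initialized property.
        dataset.reduce(0, +)
    }
}
```

After the warm-up run forces `dataset`, each timed call to `run()` only pays the cost of a stored-property read plus the work being measured.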
I liked your idea @shabalind, but had one improvement: if we really care about this kind of performance, I'd recommend we do something like the following:

```swift
benchmark("my super fast operation") { context in
    // set-up goes here
    for _ in context {
        mySuperFastOperation()
    }
    // clean-up here
}
```
That way, we can get rid of the closure dispatch overhead and more. We just might need to make sure the Swift compiler doesn't get too smart and optimize away the loop body...
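One common trick for keeping the optimizer honest (this is an assumption on my part, not part of swift-benchmark's public API) is to funnel every result through an `@inline(never)` sink the compiler cannot see through:

```swift
// An opaque sink: because it can never be inlined, the optimizer must
// assume the argument is observed and cannot delete the computation
// that produced it.
@inline(never)
func blackHole<T>(_ value: T) {
    // Deliberately empty; the opaque call boundary is what matters.
}

// Hypothetical tiny operation standing in for the benchmarked code.
func mySuperFastOperation() -> Int { 1 &+ 1 }

for _ in 0..<1_000 {
    // Without blackHole, the optimizer could prove the loop body has
    // no observable effect and remove it entirely.
    blackHole(mySuperFastOperation())
}
```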
@saeta I think both approaches can co-exist:

- `state.start()` / `state.end()` are useful for expressing set-up and tear-down, but the benchmark loop still runs outside the benchmark closure. This is useful for longer-running benchmarks that want to exclude part of the per-iteration time from being recorded.
- `for _ in state { ... }` is great for moving the benchmark loop into the closure. This suits short-running benchmarks that want to make sure measurements aren't affected by closure and/or protocol virtual-dispatch overhead.
Additionally, @mattt also suggests another alternative in #33: `setUp` and `tearDown` functions. It seems to be complementary to `state.start()` / `state.end()`, because it can't be used in closure-based benchmarks.
> - Extend the benchmark protocol to include `setUp` and `tearDown` functions. It seems to be complementary to `state.start()` / `state.end()`, because it can't be used in closure-based benchmarks.
That's only true of its current implementation. You could add `setUp` and `tearDown` closure arguments to the `benchmark` function (defaulting to `nil` so that they don't have to be specified):

```swift
benchmark("...", setUp: { /* ... */ }, tearDown: { /* ... */ }) {
    /* ... */
}
```
Or, if you wanted to target the forthcoming multiple trailing closures proposal (SE-0279), you'd implement a separate overload of `benchmark` that requires all three closure arguments in a different order:

```swift
benchmark("...") {
    /* ... */
} setUp: {
    /* ... */
} tearDown: {
    /* ... */
}
```
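Under the hood, such an overload might be sketched as follows (the names and signature here are illustrative assumptions, not swift-benchmark's actual API):

```swift
// Hypothetical sketch of a `benchmark` overload with optional set-up
// and tear-down closures, defaulting to nil so callers can omit them.
func benchmark(
    _ name: String,
    setUp: (() -> Void)? = nil,
    tearDown: (() -> Void)? = nil,
    body: () -> Void
) {
    setUp?()      // not timed
    body()        // the framework would measure only this part
    tearDown?()   // not timed
}

// Usage: each phase runs exactly once, in order.
var log: [String] = []
benchmark(
    "demo",
    setUp: { log.append("setUp") },
    tearDown: { log.append("tearDown") },
    body: { log.append("body") }
)
```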
I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()`, or by omitting the `for _ in context` loop.
> `benchmark("...", setUp: { /* ... */ }, tearDown: { /* ... */ }) { ... }`

I find it less elegant than either `start`/`end` or `for _ in state { ... }`, because it works poorly with respect to lexical scoping. For example:
```swift
benchmark(...) { state in
    let resource = Resource()
    resource.expensiveOperation()
    state.start()
    operationToBenchmark(resource)
}
```
Here, due to lexical scoping, everything is defined inside the benchmark. But if we had multiple closures:
```swift
var resource: Resource? = nil
benchmark("...") {
    operationToBenchmark(resource!)
} setUp: {
    resource = Resource()
    resource!.expensiveOperation()
}
```
Here `resource` leaks into the outer scope for no good reason.
> I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()`.
The semantics there would be that start/end are called automatically for you by the framework outside of the benchmark code, so omitting one or both of those is still valid and works as expected.
On the other hand, the equivalent with a full-fledged struct instance and `setUp` and `tearDown` methods is going to be well scoped as well:
```swift
final class MyBenchmark: AnyBenchmark {
    var resource: Resource? = nil
    func setUp() {
        resource = Resource()
        resource!.expensiveOperation()
    }
    func run() {
        operationToBenchmark(resource!)
    }
}
benchmark("...", MyBenchmark())
```
> On the other hand, the equivalent with a full-fledged struct instance and `setUp` and `tearDown` methods is going to be well scoped as well.
The standard caveats for `setUp` and `tearDown` in XCTest apply here. Instance variables would either be declared as implicitly-unwrapped optionals (`var resource: Resource!`), provided an initial value (`var resource = Resource()`), or declared outside the benchmark entirely:

```swift
let resource = Resource()
resource.expensiveOperation()
benchmark {
    nonMutatingOperation(resource)
}
```
A `do` block can reduce concerns about lexical scope:
```swift
do {
    let resource = Resource()
    resource.expensiveOperation()
    benchmark {
        nonMutatingOperation(resource)
    }
}

do {
    let resource = Resource()
    resource.expensiveOperation()
    benchmark {
        anotherNonMutatingOperation(resource)
    }
}
```
> The semantics there would be that start/end are called automatically for you by the framework outside of the benchmark code, so omitting one or both of those is still valid and works as expected.
What should I expect if I were to...

- call `start` multiple times?
- call `end` before `start`?
- call `start` / `end` multiple times in an unbalanced way?

I really think adding `start` / `end` defeats the purpose of blocks, and I hope we can avoid it here.
> I'd strongly discourage the adoption of the other patterns proposed here, because they make it possible to misuse the API, such as by calling `start()` without `end()` or not including that `for _ in context`.
Hmm, I think this is not so big of an issue, in the sense that the library can define a reasonable thing to do and trap if user code violates the assumptions. The following are some straw-person proposals for what we could do (that I haven't fully thought through, so feedback / counterproposals welcome!):
Option A (start exactly once):

- `.start()` must be called exactly once. If it is not called, or if it is called more than once, that is a `fatalError` and execution halts with a helpful error message.
- `.stop()` may be called once or zero times. In the zero-times case, timing ends when the closure returns.

Option B (start & stop exactly once):

- `.start()` and `.stop()` must each be called exactly once. It is a fatal error for the user's closure to interact with the timer in any other way.

Option C (start & stop, and for loop): the timer object either has:

- a. `.makeIterator()` called upon it, which starts timing. When the iterator completes, it stops timing. If `makeIterator()` is called, it must be fully consumed by the user's closure.
- b. `.start()` called upon it (this must occur exactly once). `.stop()` can be optional, per proposal A, or it can be required, per proposal B.

Since this is benchmark code, the source code is always available and mutable, so I don't think it's a priority to statically prevent invalid use. Instead, I'd rather we optimize for an easy-to-use API. The goals of this (IMO) should be: (1) accurate benchmarks, and (2) ease of authoring and maintaining benchmarks. On that latter point, it's a question of taste / style, but I think striking an appropriate balance between static guarantees and dynamic guarantees is better than all-static or all-dynamic.
Does that make sense? WDYT?
Note: one shortcoming of the above proposals is that they disallow timing two performance-critical operations with a non-performance-critical operation in the middle. I think this is probably okay, but I can be persuaded otherwise if we come up with some good use cases.
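Option A above could be sketched as a small state machine that traps on misuse; `TimerState` and its members are hypothetical names for illustration, not part of swift-benchmark:

```swift
// Straw-man sketch of Option A: `.start()` must be called exactly once,
// `.stop()` at most once; any other usage traps with a helpful message.
final class TimerState {
    private(set) var started = false
    private(set) var stopped = false

    func start() {
        precondition(!started, "start() may only be called once per iteration")
        started = true
    }

    func stop() {
        precondition(started, "stop() called before start()")
        precondition(!stopped, "stop() may only be called once per iteration")
        stopped = true
    }

    // The framework would call this when the user's closure returns:
    // zero stop() calls is valid, and timing simply ends here.
    func finish() {
        precondition(started, "the closure never called start()")
        stopped = true
    }
}
```

With this shape, a closure that calls `start()` and then returns is valid, while double-starting or stopping before starting halts with a descriptive error.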
@saeta Without responding to any of your points individually, my general response is that both `start()`/`stop()` and `for _ in context` introduce complexity without any discernible benefit.
> I think this is not so big of an issue, in the sense that the library can define a reasonable thing to do and trap if user code violates the assumptions.
Good design is about not being able to use something incorrectly. `Option<T>` is better than trapping on `null`-pointer errors. Blocks are better than calling `start` and `end`.
Now that #33 is merged, is there any functionality that's not currently addressed?
Not yet addressed: `for _ in state`; we only have closure-based `state.measure { ... }`. Apart from giving an easy way to express set-up and tear-down while using closures, we can also get better precision because `state.measure`'s closure is non-escaping (and thus possible to optimize away completely).
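A minimal sketch of what a closure-based `measure` could look like; `State` here is a stand-in invented for illustration, not swift-benchmark's actual `BenchmarkState` implementation:

```swift
import Dispatch

// Hypothetical state type accumulating measured time.
struct State {
    var elapsedNanoseconds: UInt64 = 0

    // `body` is non-escaping, so the compiler can specialize the call
    // site and avoid closure-allocation overhead; only the time spent
    // inside `body` is added to the accumulated total.
    mutating func measure(_ body: () -> Void) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        elapsedNanoseconds += DispatchTime.now().uptimeNanoseconds - start
    }
}
```

Set-up and tear-down code placed before and after the `measure` call inside the benchmark closure would then be excluded from the recorded time.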
I would like a way to set up some data before the timed portion of the benchmark runs.
Concrete use case: In https://github.com/borglab/SwiftFusion/pull/61, I add a benchmark that requires a dataset from the internet. I put the code that downloads the dataset outside of the benchmark, but this means that the dataset download happens even when the benchmark that requires it is filtered out.