While working on an observable collections framework, I noticed there are surprising performance differences between the various ways Swift allows us to represent deferred method invocations.
In particular, creating and calling a partially applied method invocation is much slower than creating and calling a closure that calls the same method. For example, in the code below, loop (2) is about 70% faster than loop (1):
class Observer {
    @inline(never) func receive(_ value: Int) {}
}

class Signal {
    var observers: [(Int) -> ()] = []

    func subscribe(_ observer: @escaping (Int) -> ()) {
        observers.append(observer)
    }

    func send(_ value: Int) {
        for observer in observers {
            observer(value)
        }
    }
}

let signal = Signal()
let observer = Observer()

for _ in 0 ..< 100_000 { // (1)
    signal.subscribe(observer.receive)
}

for _ in 0 ..< 100_000 { // (2)
    signal.subscribe({ i in observer.receive(i) })
}
The partially applied method forwarder is a less abstract construct than a closure, so I expected the former to be measurably faster. (Or at least that the two would be equally slow.)
However, it looks like the method forwarder may be doing an extra allocation compared to the closure (which is already doing one, for its capture context). This seems unnecessary; for partially applied methods, the reference to self could be stored directly in place of the context reference.
In fact, the same optimization could be performed for closures with small constant captures. This is demonstrated by protocol existentials, which are able to store small values without boxing them:
class Observer {
    @inline(never) func receive(_ value: Int) {}
}

protocol Sink {
    func receive(_ value: Int)
}

struct Forwarder: Sink {
    let object: Observer

    func receive(_ value: Int) {
        object.receive(value)
    }
}

class Signal {
    var observers: [Sink] = []

    func subscribe(_ sink: Sink) {
        observers.append(sink)
    }

    func send(_ value: Int) {
        for observer in observers {
            observer.receive(value)
        }
    }
}

let signal = Signal()
let observer = Observer()

for _ in 0 ..< 100_000 {
    signal.subscribe(Forwarder(object: observer))
}
The code above is functionally equivalent to the previous version, but it does no per-subscription allocation, so it's twice as fast as even the closure-based version. The protocol existential has enough space to store the entire Forwarder struct inline, with no boxing.
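The inline-buffer behavior can be observed directly with `MemoryLayout`. This small check is illustrative, not part of the original report, and the exact sizes assume a typical 64-bit platform:

```swift
protocol Sink {
    func receive(_ value: Int)
}

class Observer {
    func receive(_ value: Int) {}
}

struct Forwarder: Sink {
    let object: Observer  // a single word, so it fits the inline buffer
    func receive(_ value: Int) { object.receive(value) }
}

// An existential container is five words on 64-bit platforms: a three-word
// inline buffer plus pointers to the type metadata and the witness table.
print(MemoryLayout<Sink>.size)       // typically 40 on 64-bit
print(MemoryLayout<Forwarder>.size)  // 8: one word, stored inline unboxed
```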
Closures in escaping contexts could use a similar optimization to store small constant capture contexts directly inside the function value. Currently there is only enough space there for single-word contents, but making use of even that would be helpful. If the size of function types could be expanded by a couple of words, escaping closures could avoid heap allocation in even more use cases.
Additional Detail from JIRA
| | |
|------------------|-----------------|
| Votes | 0 |
| Component/s | Compiler |
| Labels | Improvement, Performance |
| Assignee | None |
| Priority | Medium |

md5: 9d1e7ffd8fd83f57c84cbbcb3653a8c8