Description
I saw this with autodiff-produced code, but I believe the issue is generic and not autodiff-related; it was just easier to notice there.
Consider the reproduction code below. The hardyCrossSimpleLoop function computes a gradient on every iteration of its loop. If we comment out the let (f, Df) = valueWithGradient(at: flow, of: subsystem.totalDeltaP) line and instead uncomment the following line, which does essentially the same thing through the @inline(never) helper valueWithGradientWrapper, the code is optimized down to a simple sequence of arithmetic operations. The original code, however, cannot be optimized this way, and the runtime differs by roughly three orders of magnitude. The loop body looks as follows:
So, essentially, we form a closure over the subsystem value and then execute the autodiff code. Note that all values involved are explicitly destroyed at the end of the loop body. Differentiation does not change much: essentially, the first apply is just replaced by an autodiff curry thunk.
What happens is that ClosureLifetimeFixup creates a bunch of optionals that extend the lifetime of all closures until the end of the next loop iteration:
It looks like these optionals inhibit all kinds of optimizations, including inlining and specialization: the closures and differentiable functions now have multiple uses, so the necessary peepholes (which, naturally, expect a single use) cannot fire. So, after all optimizations, we end up with:
(Note that %53 also has a use in %99, which essentially saves the function into an optional until the next iteration.)
None of this happens when we control the lifetime by simply outlining the call into a separate function.
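The lifetime extension described above can be pictured with a small source-level model in plain Swift (no _Differentiation needed). This is an assumption about the shape of the pattern, not the SIL that ClosureLifetimeFixup actually emits: the Optional variable slot stands in for the optional stack slot, and EventLog, Tracker and runSlotModel are hypothetical helpers that only exist to log when a captured value is released. Each iteration parks its closure in slot, so the closure created in iteration i stays alive until iteration i + 1 overwrites the slot, or until the loop ends.

```swift
/// Hypothetical helper: shared event log for observing releases.
final class EventLog {
    var events: [String] = []
}

/// Hypothetical helper: records a "release" event when it is deinitialized.
final class Tracker {
    let id: Int
    let log: EventLog
    init(id: Int, log: EventLog) {
        self.id = id
        self.log = log
    }
    deinit { log.events.append("release \(id)") }
}

/// Source-level model (an assumption, not compiler output) of a loop whose
/// per-iteration closure is parked in an Optional until the next iteration.
func runSlotModel(iterations: Int, log: EventLog) {
    // Stands in for the Optional stack slot that holds the closure.
    var slot: (() -> Int)? = nil
    for i in 0..<iterations {
        log.events.append("begin \(i)")
        let captured = Tracker(id: i, log: log)
        // Overwriting the slot destroys the closure stored by the previous
        // iteration, which in turn releases that iteration's Tracker.
        slot = { captured.id }
        log.events.append("use \(slot!())")
    }
    // The closure from the last iteration is only released after the loop.
    slot = nil
}

let eventLog = EventLog()
runSlotModel(iterations: 3, log: eventLog)
print(eventLog.events)
```

In an unoptimized (-Onone) build we would expect "release 0" to be logged only after "begin 1", i.e. one iteration late; an optimized build may move releases, so the exact interleaving should not be relied upon. The extra use of each closure through the slot is what the report identifies as blocking the single-use peepholes.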
Is there something that could be done differently on the autodiff side to "help" ClosureLifetimeFixup, if the pass itself cannot be fixed to handle this particular case better?
Reproduction
```swift
import _Differentiation

// MARK: Boiler
private typealias Scalar = Float

private struct Boiler {
    @differentiable(reverse)
    func inlineResistanceFluid(_ x: Scalar) -> Scalar {
        100000.0
    }

    @differentiable(reverse)
    func deltaP(_ x: Scalar) -> Scalar {
        -inlineResistanceFluid(x) * x * x
    }
}

// MARK: Pump
private struct Pump {
    private let coefConst: Scalar = 54162.06
    private let coefLin: Scalar = -51815.13
    private let coefQuad: Scalar = 10052.31

    @differentiable(reverse)
    //@_silgen_name("Pump_deltaP")
    func deltaP(_ x: Scalar) -> Scalar {
        coefConst + coefLin * x + coefQuad * x * x
    }
}

// MARK: RadiantSlab
private struct RadiantSlab {
    init() { }

    //@_silgen_name("RadiantSlab_inlineResistanceFluid")
    @differentiable(reverse)
    func inlineResistanceFluid(_ x: Scalar) -> Scalar {
        let dynamicViscosity: Scalar = 8.9e-4
        let nLoop = Scalar(3)
        let lengthTotal: Scalar = 163.98
        let innerDia: Scalar = 0.01905
        let lengthLoop = lengthTotal / nLoop
        let areaFlow: Scalar = Scalar.pi * 0.25 * innerDia * innerDia
        let reNr: Scalar = x * innerDia / (dynamicViscosity * areaFlow) // f(x)
        let frictionFactor: Scalar = 64.0 / (reNr + 0.01) //h(g(f(x)))
        let flowCoeff: Scalar = frictionFactor * lengthLoop / innerDia //i(h(g(f(x))))
        return flowCoeff / (2.0 * 995.0 * areaFlow * areaFlow) //j(i(h(g(f(x)))))
        // j(i(h(g(f(x)))))' = j'(i(h(g(f(x)))))i'(h(g(f(x)))h'(g(f(x)))g'(f(x))f'(x)
    }

    //@_silgen_name("RadiantSlab_deltaP")
    @differentiable(reverse)
    func deltaP(_ x: Scalar) -> Scalar {
        -inlineResistanceFluid(x) * x * x
    }
}

// MARK: SubSystem
private struct Subsystem { // Simple loop like Basic Load Matching
    private let boiler = Boiler()
    private let pump = Pump()
    private let load = RadiantSlab()

    init() {
    }

    //@_silgen_name("Subsystem_totalDeltaP")
    @differentiable(reverse)
    func totalDeltaP(_ flow: Scalar) -> Scalar {
        boiler.deltaP(flow) + pump.deltaP(flow) + load.deltaP(flow)
    }
}

@inline(never)
private func hardyCrossSimpleLoop(initialFlow: Scalar, subsystem: Subsystem) -> Scalar {
    let maxIters = 20
    let balanceTol: Scalar = 1e-6
    let stepTol: Scalar = 1e-6
    // let delta: Scalar = 1e-4
    var flow = initialFlow

    @inline(never)
    func valueWithGradientWrapper(at: Scalar) -> (Scalar, Scalar) {
        return valueWithGradient(at: at, of: subsystem.totalDeltaP)
    }

    for _ in 0 ..< maxIters {
        // Auto diff
        let (f, Df) = valueWithGradient(at: flow, of: subsystem.totalDeltaP)
        //let (f, Df) = valueWithGradientWrapper(at: flow)
        let step = -f / Df
        flow += step
        guard abs(step) > stepTol, abs(f) > balanceTol else { break }
    }
    return flow
}

private let subsystem = Subsystem()
private let nominalFlow: Scalar = 0.25 // Pre-solve guess at flow
private var flow: Scalar = 0
flow += hardyCrossSimpleLoop(initialFlow: nominalFlow, subsystem: subsystem)
print(flow)
```
Expected behavior
The code should be optimized down to a simple sequence of arithmetic operations in both cases.
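For a sense of how small that arithmetic is, here is a hypothetical hand-written baseline (not part of the original report) that repeats the Newton-style Hardy Cross iteration of hardyCrossSimpleLoop but evaluates totalDeltaP and its derivative in closed form, without _Differentiation. The constants are copied from Boiler, Pump and RadiantSlab; the RadiantSlab resistance is rewritten as k / (a * x + 0.01), and the derivative was worked out by hand, so treat this as a sketch and a possible timing reference rather than what the optimizer actually emits. All names below (totalDeltaPWithDerivative, hardyCrossHandWritten, reynoldsPerFlow, slabResistanceNumerator) are made up for illustration.

```swift
typealias Scalar = Float

// Constants copied from the reproduction's Boiler, Pump and RadiantSlab.
let boilerResistance: Scalar = 100000.0
let coefConst: Scalar = 54162.06
let coefLin: Scalar = -51815.13
let coefQuad: Scalar = 10052.31
let innerDia: Scalar = 0.01905
let lengthLoop: Scalar = 163.98 / 3
let areaFlow: Scalar = Scalar.pi * 0.25 * innerDia * innerDia
// reNr == reynoldsPerFlow * x, so inlineResistanceFluid(x) == k / (a * x + 0.01).
let reynoldsPerFlow: Scalar = innerDia / (8.9e-4 * areaFlow)
let slabResistanceNumerator: Scalar =
    64.0 * lengthLoop / innerDia / (2.0 * 995.0 * areaFlow * areaFlow)

/// Returns totalDeltaP(x) and its derivative using only scalar arithmetic.
func totalDeltaPWithDerivative(_ x: Scalar) -> (value: Scalar, derivative: Scalar) {
    // Boiler: -R * x^2 with constant R.
    let boiler = -boilerResistance * x * x
    let dBoiler = -2 * boilerResistance * x

    // Pump: quadratic pump curve.
    let pump = coefConst + coefLin * x + coefQuad * x * x
    let dPump = coefLin + 2 * coefQuad * x

    // RadiantSlab: -R(x) * x^2 with R(x) = k / (a * x + 0.01).
    let denom = reynoldsPerFlow * x + 0.01
    let resistance = slabResistanceNumerator / denom
    let dResistance = -slabResistanceNumerator * reynoldsPerFlow / (denom * denom)
    let slab = -resistance * x * x
    let dSlab = -(dResistance * x * x + 2 * resistance * x)

    return (boiler + pump + slab, dBoiler + dPump + dSlab)
}

/// Same loop structure and tolerances as hardyCrossSimpleLoop.
func hardyCrossHandWritten(initialFlow: Scalar) -> Scalar {
    let maxIters = 20
    let balanceTol: Scalar = 1e-6
    let stepTol: Scalar = 1e-6
    var flow = initialFlow
    for _ in 0..<maxIters {
        let (f, df) = totalDeltaPWithDerivative(flow)
        let step = -f / df
        flow += step
        guard abs(step) > stepTol, abs(f) > balanceTol else { break }
    }
    return flow
}

print(hardyCrossHandWritten(initialFlow: 0.25))
```

Comparing the runtime of this loop with both variants of hardyCrossSimpleLoop could help quantify how far each one is from straight-line arithmetic.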
Environment
Swift version 6.1-dev (LLVM fcc20a24e57c484, Swift f802b67fc06447f)
Target: arm64-apple-macosx13.0