dan-zheng commented 4 years ago

Compilation time (swift build) is very slow:

$ time swift build # clean build
[61/61] Linking libTensorFlow.dylib
swift build  506.63s user 5.63s system 253% cpu 3:22.07 total

$ echo "// Test." >>  Sources/TensorFlow/Layer.swift # trivial change
$ time swift build # incremental build
[3/3] Linking libTensorFlow.dylib
swift build  80.78s user 0.42s system 99% cpu 1:21.25 total

I'm not sure when exactly it got so bad. Let's try to improve this!

Action items

[ ] Identify compilation hot spots via profiling: pprof or Xcode Instruments.

This document describes Swift compiler performance tips.

Identifying hot spots using profiling tools like pprof or Xcode Instruments seems like a great first step. @marcrasi previously used pprof to generate TensorFlow module compilation flamegraphs: perhaps that work can be polished and open-sourced in this repository.

Type-checking is one big source of slowdown. Here are some sorted results from swift build -Xswiftc -Xfrontend -Xswiftc -debug-time-function-bodies (from Gist):

# Worst offenders, time in milliseconds.
(1767, "global function 'gelu'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1225:13')
(1767, "global function 'gelu'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1225:13')
(1767, "global function 'gelu'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1225:13')
(1866, "instance method 'sha512()'", '/Users/danielzheng/swift-apis/Sources/Tensor/TensorUtilities.swift:149:10')
(2220, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:152:17')
(2220, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:152:17')
(2220, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:152:17')
(2304, "global function 'root'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1375:13')
(2304, "global function 'root'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1375:13')
(2304, "global function 'root'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1375:13')
(2528, "global function 'hingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:109:13')
(2528, "global function 'hingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:109:13')
(2528, "global function 'hingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:109:13')
(3796, "global function 'categoricalHingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:139:13')
(3796, "global function 'categoricalHingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:139:13')
(3796, "global function 'categoricalHingeLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:139:13')
(4102, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:429:17')
(4102, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:429:17')
(4102, "instance method 'update(_:along:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Optimizers/MomentumBased.swift:429:17')
(4128, "global function 'cosineSimilarity'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1492:13')
(4128, "global function 'cosineSimilarity'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1492:13')
(4128, "global function 'cosineSimilarity'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Operators/Math.swift:1492:13')
(15152, "global function 'logCoshLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:157:13')
(15152, "global function 'logCoshLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:157:13')
(15152, "global function 'logCoshLoss(predicted:expected:reduction:)'", '/Users/danielzheng/swift-apis/Sources/TensorFlow/Loss.swift:157:13')

Idea from @allevato: provide a contextual type for literal expressions to help the type checker.

It works:

import TensorFlow

@differentiable
public func logCoshLoss<Scalar: TensorFlowFloatingPoint>(
    predicted: Tensor<Scalar>,
    expected: Tensor<Scalar>,
    reduction: @differentiable (Tensor<Scalar>) -> Tensor<Scalar> = _mean
) -> Tensor<Scalar> {
    let x = predicted - expected
    // Original code.
    return reduction(x + softplus(Tensor(-2) * x) - log(Tensor(2)))
}

@differentiable
public func logCoshLossTest<Scalar: TensorFlowFloatingPoint>(
    predicted: Tensor<Scalar>,
    expected: Tensor<Scalar>,
    reduction: @differentiable (Tensor<Scalar>) -> Tensor<Scalar> = _mean
) -> Tensor<Scalar> {
    let x = predicted - expected
    // Tony's suggestion: provide contextual type for literals.
    return reduction(x + softplus(Tensor(-2 as Scalar) * x) - log(Tensor(2 as Scalar)))
}

$ swift -Xfrontend -debug-time-function-bodies timing.swift
timing.swift:4:13: warning: global function 'logCoshLoss(predicted:expected:reduction:)' took 7400ms to type-check (limit: 1ms)
public func logCoshLoss<Scalar: TensorFlowFloatingPoint>(
            ^
timing.swift:15:13: warning: global function 'logCoshLossTest(predicted:expected:reduction:)' took 134ms to type-check (limit: 1ms)
public func logCoshLossTest<Scalar: TensorFlowFloatingPoint>(
            ^
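
To rank offenders straight from compiler output like the above, here is a minimal sketch. It assumes each warning line has the form shown in this transcript ("... took <N>ms to type-check ..."); the helper names `typeCheckMilliseconds` and `slowestFirst` are hypothetical and not part of any existing tool.

```swift
import Foundation

// Assumed line format, taken from the transcript above:
//   "<location>: warning: <decl> took <N>ms to type-check (limit: <M>ms)"
// Returns N, or nil for lines that do not match (source echoes, carets, etc.).
func typeCheckMilliseconds(in line: String) -> Int? {
    guard let prefix = line.range(of: " took "),
          let suffix = line.range(of: "ms to type-check",
                                  range: prefix.upperBound..<line.endIndex)
    else { return nil }
    return Int(line[prefix.upperBound..<suffix.lowerBound])
}

// Keeps only the timed lines and sorts them from slowest to fastest.
func slowestFirst(_ lines: [String]) -> [(milliseconds: Int, line: String)] {
    var timed: [(milliseconds: Int, line: String)] = []
    for line in lines {
        if let milliseconds = typeCheckMilliseconds(in: line) {
            timed.append((milliseconds: milliseconds, line: line))
        }
    }
    return timed.sorted { $0.milliseconds > $1.milliseconds }
}

// Sample input copied from the transcript, deliberately out of order.
let sample = [
    "timing.swift:15:13: warning: global function 'logCoshLossTest(predicted:expected:reduction:)' took 134ms to type-check (limit: 1ms)",
    "public func logCoshLossTest<Scalar: TensorFlowFloatingPoint>(",
    "timing.swift:4:13: warning: global function 'logCoshLoss(predicted:expected:reduction:)' took 7400ms to type-check (limit: 1ms)",
]
for entry in slowestFirst(sample) {
    print(entry.milliseconds, entry.line)
}
```

Lines that carry no timing (the echoed source line and the caret) are skipped rather than treated as errors, so raw compiler output can be fed in unfiltered.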

I believe this should not be necessary, and may be a deficiency in the Swift type checker, specifically in bidirectional type-checking and constraint solving. Whenever a contextual type exists, constraints should propagate from the outside in, so Scalar is the only possible type for the literals 2 and -2 in logCoshLoss. If we instead start constraint solving from the type variables for 2 and -2 (which have many possible types), a huge disjunction constraint may be generated, leading to a big slowdown.
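
As a possible starting point for a reproducer that does not depend on TensorFlow, here is a minimal sketch. `Box` is a hypothetical stand-in for `Tensor<Scalar>`, and `inferred` / `annotated` are illustrative names. This only shows the two ways of writing the literals; whether it reproduces the slowdown is untested, and a real reproducer would likely need more operator overloads in scope.

```swift
// Hypothetical stand-in for Tensor<Scalar>; this is not the TensorFlow type.
struct Box<Scalar: FloatingPoint> {
    var value: Scalar
    init(_ value: Scalar) { self.value = value }

    static func + (lhs: Box, rhs: Box) -> Box { Box(lhs.value + rhs.value) }
    static func - (lhs: Box, rhs: Box) -> Box { Box(lhs.value - rhs.value) }
    static func * (lhs: Box, rhs: Box) -> Box { Box(lhs.value * rhs.value) }
}

// Literal types are left for the constraint solver to work out,
// as in the original logCoshLoss.
func inferred<Scalar: FloatingPoint>(_ x: Box<Scalar>) -> Box<Scalar> {
    return x + Box(-2) * x - Box(2)
}

// Literal types are stated up front with `as Scalar`,
// as in the logCoshLossTest variant.
func annotated<Scalar: FloatingPoint>(_ x: Box<Scalar>) -> Box<Scalar> {
    return x + Box(-2 as Scalar) * x - Box(2 as Scalar)
}

// Both spellings compute the same value; only the type-checking work differs.
let input = Box(3.0)
print(inferred(input).value, annotated(input).value)
```

Compiling a file like this with -Xfrontend -debug-time-function-bodies would show whether the two functions differ in type-checking time on a given toolchain.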

It would be nice to post a Swift Forums question with a minimal reproducer of similarly bad type-checking performance for literals with a contextual type.

Suggestion from @rxwei: splitting large files into several smaller files can help multithreaded compilation (swift build), since one thread can be spawned per file. Some files like Sources/TensorFlow/Operators/Math.swift (currently 2835 lines) are huge and can be split.

Note that we have one huge generated Swift file for TensorFlow bindings: Sources/TensorFlow/Bindings/RawOpsGenerated.swift (currently 36743 lines). That probably takes a while to compile.

Shashi456 commented 4 years ago

@dan-zheng Math.swift can definitely be broken down, but if that is the long-term solution, isn't there a chance it might lead to over-granular files?

dan-zheng commented 4 years ago

> @dan-zheng Math.swift can definitely be broken down, but if that is the long-term solution, isn't there a chance it might lead to over-granular files?

I think we can find a reasonable granularity, based on common sense. One file per function is obviously too fine-grained. One file per category of functions (e.g. ReductionOperations.swift) seems reasonable. If the goal is to improve compilation time, changes should be backed by benchmark results.

If anyone pursues splitting large files into smaller ones to improve compilation time, please include some benchmark results, like in the PR description.

tensorflow / swift-apis

Improve compilation time #618
