google / swift

The Swift Programming Language
https://swift.org/
Importing third-party packages before TensorFlow causes a runtime error #4

Open zachgrayio opened 6 years ago

zachgrayio commented 6 years ago

Continuing our discussion from the group here.

Full background - I've just copied my comment directly from the group:

I've had some success using third-party SPM packages by building them into a dynamic library and linking it when launching the REPL. However, the import order of TensorFlow relative to other packages seems to matter: importing the third-party library first causes a C++ runtime error in TensorFlow.

Here are some snippets:

Package.swift

// swift-tools-version:4.0
import PackageDescription

let package = Package(
    name: "TFExample",
    products: [
        .library(
            name: "TFExample",
            type: .dynamic,    // allow use of this package and its deps from the REPL
            targets: ["TFExample"]
        )
    ],
    dependencies: [
        .package(url: "https://github.com/ReactiveX/RxSwift.git", "4.0.0" ..< "5.0.0")
    ],
    targets: [
        .target(
            name: "TFExample",
            dependencies: ["RxSwift"]),
        .testTarget(
            name: "TFExampleTests",
            dependencies: ["TFExample"]),
    ]
)

... then we just fetch dependencies and build with vanilla commands, then invoke the REPL:

Invocation

swift -I/usr/lib/swift/clang/include -I/usr/src/TFExample/.build/debug -L/usr/src/TFExample/.build/debug -lTFExample

At this point, I'm able to import RxSwift and TensorFlow in the REPL without errors in any order; however, when I actually interact with the packages, the incorrect import order does result in a runtime error:

Scenario 1 (OK)

  1> import TensorFlow
  2> import RxSwift
  3>  _ = Observable.from([1,2]).subscribe(onNext: { print($0) })
1
2
  4> var x = Tensor([[1, 2], [3, 4]])
2018-04-27 17:13:12.514107: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
x: TensorFlow.Tensor<Double> = [[1.0, 2.0], [3.0, 4.0]]

Scenario 2 (runtime error)

  1> import RxSwift
  2> import TensorFlow
  3> _ = Observable.from([1,2]).subscribe(onNext: { print($0) })
1
2
  4> var x = Tensor([[1, 2], [3, 4]])
x: TensorFlow.Tensor<Double> =terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

The full process is outlined here if more detail is necessary: https://github.com/zachgrayio/swift-tensorflow/blob/example/package/README.md#run-with-dependencies-advanced

dan-zheng commented 6 years ago

Thanks for providing so much detail! I'm looking into this now.

dan-zheng commented 6 years ago

I was able to replicate the issue:

$ docker run --rm --privileged --cap-add sys_ptrace -it -v ${PWD}:/usr/src zachgray/swift-tensorflow:4.2 swift -I/usr/lib/swift/clang/include -I/usr/src/TFExample/.build/debug -L/usr/src/TFExample/.build/debug -lTFExample
Welcome to Swift version 4.2-dev (LLVM 04bdb56f3d, Clang b44dbbdf44). Type :help for assistance.
  1> import TensorFlow
  2> import RxSwift
  3> Tensor(1)
error: Couldn't lookup symbols:
  protocol witness table for Swift.Double : TensorFlow.AccelerableByTensorFlow in TensorFlow
  _swift_tfc_StartTensorComputation
  _swift_tfc_FinishTensorComputation
  direct field offset for TensorFlow.TensorHandle.cTensorHandle : Swift.OpaquePointer
  type metadata accessor for TensorFlow.TensorHandle

  3> var x = Tensor([[1, 2], [3, 4]])
x: TensorFlow.Tensor<Double> =terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

The solution is to add an extra -lswiftTensorFlow flag:

$ docker run --rm --privileged --cap-add sys_ptrace -it -v ${PWD}:/usr/src zachgray/swift-tensorflow:4.2 swift -I/usr/lib/swift/clang/include -I/usr/src/TFExample/.build/debug -L/usr/src/TFExample/.build/debug -lTFExample -lswiftTensorFlow
Welcome to Swift version 4.2-dev (LLVM 04bdb56f3d, Clang b44dbbdf44). Type :help for assistance.
  1> import RxSwift
  2> import TensorFlow
  3> _ = Observable.from([1,2]).subscribe(onNext: { print($0) })
1
2
  4> var x = Tensor([[1, 2], [3, 4]])
2018-04-27 23:07:35.467557: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
x: TensorFlow.Tensor<Double> = [[1.0, 2.0], [3.0, 4.0]]
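
For convenience, the workaround can be wrapped in a small script. This is only a sketch: it reuses the paths and library names from the invocation above, and the variable names (BUILD_DIR, LIBS, REPL_CMD) are made up for illustration. It prints the command instead of running it, so it can be checked first.

```shell
#!/bin/sh
# Sketch: assemble the REPL command line with explicit -l flags.
# BUILD_DIR mirrors the path used in this thread; adjust it for your setup.
BUILD_DIR="/usr/src/TFExample/.build/debug"

# Libraries to link explicitly. The REPL did not pick up libswiftTensorFlow
# on its own in the runs above, so it is listed next to the package library.
LIBS="TFExample swiftTensorFlow"

REPL_CMD="swift -I/usr/lib/swift/clang/include -I${BUILD_DIR} -L${BUILD_DIR}"
for lib in $LIBS; do
    REPL_CMD="${REPL_CMD} -l${lib}"
done

# Print the command rather than executing it.
echo "${REPL_CMD}"
```

Adding swiftPython to LIBS should cover the Python module in the same way.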

I tested the Swift interpreter by putting the code into test.swift, then running:

$ docker run --rm --privileged --cap-add sys_ptrace -it -v ${PWD}:/usr/src zachgray/swift-tensorflow:4.2 swift -I/usr/lib/swift/clang/include -I/usr/src/TFExample/.build/debug -L/usr/src/TFExample/.build/debug -lTFExample -O /usr/src/test.swift

This worked without specifying -lswiftTensorFlow, suggesting the problem is probably REPL-specific and involves linker flags.

On Linux, the Swift shared runtime library path is found at <path_to_toolchain>/usr/lib/swift/linux. It contains shared libraries like libswiftCore.so, libswiftTensorFlow.so, libswiftPython.so, etc.
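
To see which runtime libraries a given toolchain ships, something like the following should work. It is a rough sketch: list_swift_runtime_libs is a hypothetical helper name, and TOOLCHAIN stands in for <path_to_toolchain> (it defaults to / here, which I believe matches the Docker image used in this thread, where the toolchain is installed under /usr).

```shell
#!/bin/sh
# Sketch: list the Swift runtime shared libraries shipped with a toolchain.
list_swift_runtime_libs() {
    # $1: toolchain root; prints one library file name per line.
    for lib in "$1"/usr/lib/swift/linux/libswift*.so; do
        # If nothing matches, the glob stays literal and this check skips it.
        [ -e "$lib" ] && basename "$lib"
    done
    return 0
}

list_swift_runtime_libs "${TOOLCHAIN:-/}"
```

If libswiftTensorFlow.so shows up in the list, it can be linked with -lswiftTensorFlow without an extra -L flag only when the tool already adds this directory to its library search path.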

In lib/Driver/Toolchains.cpp (used by the interpreter/compiler), toolchains::GenericUnix::constructInvocation automatically adds flags that add the Swift shared runtime library path and link libswiftCore.so. Presumably there is separate logic for handling the other libraries in the same path (like libswiftPython.so), but I couldn't find it.

The REPL uses entirely separate logic for linking libraries (somewhere in google/swift-lldb). I'll do some digging and try to fix this.

This linking issue is probably related to #5.

zachgrayio commented 6 years ago

@dan-zheng - nice work man. This is exactly what I was missing. See the following:

docker run --rm --privileged --cap-add sys_ptrace -itv ${PWD}:/usr/src \
    zachgray/swift-tensorflow:4.2 \
    swift \
    -I/usr/lib/swift/clang/include \
    -I/usr/src/TFExample/.build/debug \
    -L/usr/src/TFExample/.build/debug \
    -lswiftPython \
    -lswiftTensorFlow \
    -lTFExample

Welcome to Swift version 4.2-dev (LLVM 04bdb56f3d, Clang b44dbbdf44). Type :help for assistance.
  1> import RxSwift
  2> import Python
  3> import TensorFlow
  4> var x = Tensor([[1, 2], [3, 4]])
2018-04-28 00:11:10.828554: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
x: TensorFlow.Tensor<Double> = [[1.0, 2.0], [3.0, 4.0]]
  5> _ = Observable.from([1,2]).subscribe(onNext: { print($0) })
1
2
  6> var x: PyValue = [1, "hello", 3.14]
x: Python.PyValue = [1, 'hello', 3.14]
  7> :exit

** edited formatting

dan-zheng commented 6 years ago

I'm working on a simple fix now. Regarding import order: I didn't notice errors when importing Python before TensorFlow so that's the order I'll use.

dan-zheng commented 6 years ago

I believe this is fixed in 1969380862d0db8ab090325e878e1ca2969ed2d6. You can try the pre-built packages from 05-10 to verify.

pschuh commented 5 years ago

This should be solvable with -module-link-name, which avoids this hack in the compiler.

pschuh commented 5 years ago

I've reproduced linking this way outside of the Swift compiler. This is also how Foundation and XCTest work. It avoids linking these libs into every binary whether or not they are needed.
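
A rough sketch of what that looks like on the command line, for anyone following along. The module name Greeter and the file names are hypothetical, and the helper link_name_cmds only prints the two swiftc commands; it is not taken from the TensorFlow build. The point is that -module-link-name records the library name inside the emitted module, so anything that imports the module links the library automatically.

```shell
#!/bin/sh
# Sketch: print the commands for building a library module that records its
# own link name, and for building a client against it.
link_name_cmds() {
    module="$1"
    # -module-link-name stores the library name in the emitted .swiftmodule,
    # so code that imports the module auto-links lib<module>.
    echo "swiftc -emit-module -emit-library -module-name ${module} -module-link-name ${module} ${module}.swift"
    # The client passes only search paths (-I for the module, -L for the
    # library) and no explicit -l flag.
    echo "swiftc -I. -L. main.swift"
}

link_name_cmds Greeter
```

With this approach the library is linked only into binaries that actually import the module.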