tensorflow / swift

Swift for TensorFlow
https://tensorflow.org/swift
Apache License 2.0

Ubuntu Shared Libraries Not Found at Runtime #504

Closed by xanderdunn 4 years ago

xanderdunn commented 4 years ago

Ubuntu 18.04.4
Swift Toolchain release 0.10 for Ubuntu 18.04
CUDA 10.2
cuDNN 7

How I installed and use the Swift for TensorFlow toolchain:

cd
mkdir swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04
tar -xzf swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04.tar.gz --directory ~/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/
export PYTHON_LIBRARY=/home/xander/anaconda3/lib/libpython3.so
export PATH=/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/bin:$PATH
export LD_LIBRARY_PATH=/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/linux:$LD_LIBRARY_PATH
ldconfig

As you can see, I installed it into a folder in my home directory rather than into the root directory.

$ which swift
/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/bin/swift
$ swift --version
Swift version 5.3-dev (LLVM 55d27a5828, Swift 6a5d84ec08)
Target: x86_64-unknown-linux-gnu

I am attempting to run the Model training walkthrough. I have it successfully running on my Mac, but I am failing to get the shared libraries to link at runtime on my Ubuntu instance.

Both swift build and swiftc Sources/MySwiftProject/*.swift -o main.exe build successfully:

$ swift build
Fetching https://github.com/tensorflow/swift.git
Fetching https://github.com/pvieito/PythonKit.git
Fetching https://github.com/pvieito/LoggerKit.git
Fetching https://github.com/onevcat/Rainbow
Fetching https://github.com/apple/swift-argument-parser
Cloning https://github.com/apple/swift-argument-parser
Resolving https://github.com/apple/swift-argument-parser at 0.0.6
Cloning https://github.com/onevcat/Rainbow
Resolving https://github.com/onevcat/Rainbow at 3.1.5
Cloning https://github.com/tensorflow/swift.git
Resolving https://github.com/tensorflow/swift.git at master
Cloning https://github.com/pvieito/LoggerKit.git
Resolving https://github.com/pvieito/LoggerKit.git at master
Cloning https://github.com/pvieito/PythonKit.git
Resolving https://github.com/pvieito/PythonKit.git at master
warning: dependency 'Metaprogramming' is not used by any target
/home/xander/dev/swift-learning/Sources/MySwiftProject/TutorialDatasetCSVAPI.swift:43:11: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
extension Dataset where Element == IrisBatch {
          ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/TutorialDatasetCSVAPI.swift:73:14: warning: 'init(elements:)' is deprecated
        self.init(elements: IrisBatch(features: featuresTensor, labels: labelsTensor))
             ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:52:19: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
let trainDataset: Dataset<IrisBatch> = Dataset(
                  ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:52:40: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
let trainDataset: Dataset<IrisBatch> = Dataset(
                                       ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/TutorialDatasetCSVAPI.swift:43:11: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
extension Dataset where Element == IrisBatch {
          ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:55:3: warning: 'batched' is deprecated
).batched(batchSize)
  ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:167:18: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
let testDataset: Dataset<IrisBatch> = Dataset(
                 ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:167:39: warning: 'Dataset' is deprecated: Datasets will be removed in S4TF v0.10. Please use the new Batches API instead.
let testDataset: Dataset<IrisBatch> = Dataset(
                                      ^
/home/xander/dev/swift-learning/Sources/MySwiftProject/main.swift:170:3: warning: 'batched' is deprecated
).batched(batchSize)
  ^
[11/11] Linking MySwiftProject

But running either ./main.exe or swift run produces a runtime error saying that FoundationNetworking must be linked or loaded:

./main.exe
Fatal error: You must link or load module FoundationNetworking to load non-file: URL content using String(contentsOf:…), Data(contentsOf:…), etc.: file /swift-base/swift-corelibs-foundation/Sources/Foundation/NSSwiftRuntime.swift, line 401
Current stack trace:
0    libswiftCore.so                    0x00007f6d69017140 swift_reportError + 50
1    libswiftCore.so                    0x00007f6d6908cab0 _swift_stdlib_reportFatalErrorInFile + 115
2    libswiftCore.so                    0x00007f6d68d284a2 <unavailable> + 1533090
3    libswiftCore.so                    0x00007f6d68d280e6 <unavailable> + 1532134
4    libswiftCore.so                    0x00007f6d68d28685 <unavailable> + 1533573
5    libswiftCore.so                    0x00007f6d68d26b00 _assertionFailure(_:_:file:line:flags:) + 528
6    libFoundation.so                   0x00007f6d671decf4 <unavailable> + 6954228
7    libFoundation.so                   0x00007f6d670634d7 <unavailable> + 5399767
8    libFoundation.so                   0x00007f6d67062db0 NSData.init(contentsOf:options:) + 970
9    libFoundation.so                   0x00007f6d67062d20 NSData.__allocating_init(contentsOf:options:) + 63
10   libFoundation.so                   0x00007f6d66e03160 Data.init(contentsOf:options:) + 122
11   main.exe                           0x0000561f83d2427f <unavailable> + 78463
12   main.exe                           0x0000561f83d1c9e9 <unavailable> + 47593
13   libc.so.6                          0x00007f6d1f33aab0 __libc_start_main + 231
14   main.exe                           0x0000561f83d1a53a <unavailable> + 38202
Illegal instruction (core dumped)

FoundationNetworking is being referenced because of the download function to get the Iris data.

I do see the library is in the correct location. /home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/linux/libFoundationNetworking.so is there. You can see in the above setup commands that I included this directory in LD_LIBRARY_PATH.
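One way to double-check this is to walk the directories listed in LD_LIBRARY_PATH and report which of them contain the library file. The following is a minimal sketch: the helper name `directoriesContaining` is hypothetical and not part of any toolchain, and it only tests that the file is present, not that the dynamic loader will actually load it.

```swift
import Foundation

/// Hypothetical helper: returns the directories in a colon-separated
/// search path (the LD_LIBRARY_PATH format) that contain `fileName`.
func directoriesContaining(_ fileName: String, inSearchPath searchPath: String) -> [String] {
    return searchPath
        .split(separator: ":")          // empty entries are dropped by default
        .map(String.init)
        .filter { directory in
            let candidate = URL(fileURLWithPath: directory)
                .appendingPathComponent(fileName)
                .path
            return FileManager.default.fileExists(atPath: candidate)
        }
}

// Read the search path from the environment; an unset variable is treated as empty.
let searchPath = ProcessInfo.processInfo.environment["LD_LIBRARY_PATH"] ?? ""
let matches = directoriesContaining("libFoundationNetworking.so", inSearchPath: searchPath)
print(matches.isEmpty ? "not found on LD_LIBRARY_PATH" : "found in: \(matches)")
```

A hit here only shows that the file exists on the search path; it says nothing about whether the program actually imports or links the module.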

As done in apple/swift-docker#75, I tried creating a file /etc/ld.so.conf.d/swift.conf with this in it:

/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/linux
/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/clang/lib/linux
/home/xander/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/pm

Then I executed sudo ldconfig again. Same runtime error.

On my Mac where it's working I also installed the toolchain to my home directory and solved the shared library linking by setting the correct path with DYLD_LIBRARY_PATH. This same strategy is not working on Ubuntu.

I'm sure this is something very minor, but I haven't been able to find it. Any ideas why Swift can't find FoundationNetworking at runtime?

BradLarson commented 4 years ago

Do you hit the same error if you clone swift-models and run

swift run -c release LeNet-MNIST

within it? That should call into a code path that also uses FoundationNetworking, and works well on my Ubuntu systems.

xanderdunn commented 4 years ago

Thanks @BradLarson, that does indeed work.

It appears all I was missing was an import FoundationNetworking in my main.swift. With that, it runs fine on Ubuntu. I'm new to Swift, but I'm surprised that wasn't caught at compile time.

Now I have to wonder why it works on Mac without the import FoundationNetworking. Apparently there is a difference in libraries?

$ swift --version
Swift version 5.3-dev (LLVM 55d27a5828, Swift 6a5d84ec08)
Target: x86_64-apple-darwin19.5.0
$ ls -lah /Library/Developer/Toolchains/swift-tensorflow-RELEASE-0.10.xctoolchain/usr/lib/swift/macosx/
[Screenshot: directory listing, Screen Shot 2020-07-17 at 19 53 30]

A FoundationNetworking library does not exist in the macOS toolchain, but libFoundationNetworking.so does exist in the Ubuntu toolchain:

$ swift --version
Swift version 5.3-dev (LLVM 55d27a5828, Swift 6a5d84ec08)
Target: x86_64-unknown-linux-gnu
$ ls -lah ~/swift-tensorflow-RELEASE-0.10-cuda10.2-cudnn7-ubuntu18.04/usr/lib/swift/linux/
[Screenshot: directory listing, Screen Shot 2020-07-17 at 19 53 09]

Why is FoundationNetworking a library in one OS toolchain but not the other?

However, the LeNet-MNIST swift-models run works on my Mac as well, so I am missing some kind of inter-OS operability.

It looks like this is how it's handled:

#if canImport(FoundationNetworking)
    import FoundationNetworking
#endif

BradLarson commented 4 years ago

Foundation on macOS contains the networking functionality that is split out into FoundationNetworking on Linux and elsewhere. That's why you can get away with not having it there (and why you need to check for FoundationNetworking's availability to account for macOS, because it doesn't exist there). I think this was the pitch that led to FoundationNetworking getting split off.
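To make the split concrete, here is a minimal sketch of a file that compiles on both platforms. It relies on the fact that `URLRequest` is declared in Foundation on macOS and in FoundationNetworking on Linux; the URL is a placeholder chosen for illustration and no request is actually sent.

```swift
import Foundation
#if canImport(FoundationNetworking)
// Linux and other non-Apple platforms: networking types live in a separate module.
import FoundationNetworking
#endif

// On Linux, removing the conditional import above turns this line into a
// compile-time error, because the URLRequest type itself would be missing.
var request = URLRequest(url: URL(string: "https://example.com/iris.csv")!)  // placeholder URL
request.httpMethod = "GET"

// Only inspects the request; nothing is fetched over the network.
print(request.httpMethod ?? "nil", request.url?.absoluteString ?? "nil")
```

Because the import is guarded by `canImport`, the same source builds on macOS, where the module does not exist.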

I agree, the difference between macOS and everywhere else can be confusing. #if canImport(FoundationNetworking) is the recommended way to handle this. As to why that wasn't caught at compile time: URLs can represent both local files and remote resources, and Foundation can handle the former without the networking module. Therefore, no specific types were missing without the module import (something that would be caught at compile time); what was missing were behind-the-scenes capabilities that only made themselves known at runtime. This is a bit of a special case, and sorry it was so confusing.
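As a rough illustration of the local-file case, the sketch below reads a file: URL with `Data(contentsOf:)`, the same call that appears in the stack trace earlier in this thread, without importing FoundationNetworking. The file name and the sample row are made up for the example, and the remote URL is a placeholder that is never fetched.

```swift
import Foundation
// Deliberately no `import FoundationNetworking` here.

// Write a small file into the temporary directory, then read it back via a file: URL.
let fileURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("iris-sample-\(UUID().uuidString).csv")  // hypothetical file name
let sample = "5.1,3.5,1.4,0.2,0\n"
try sample.write(to: fileURL, atomically: true, encoding: .utf8)

// For a file: URL, Foundation can load the contents on its own.
let data = try Data(contentsOf: fileURL)
let roundTripped = String(decoding: data, as: UTF8.self)
print(fileURL.isFileURL, roundTripped == sample)

// A remote URL has exactly the same static type (URL), so the compiler has
// nothing to distinguish the two cases; only the scheme differs at runtime.
let remoteURL = URL(string: "https://example.com/iris.csv")!  // placeholder, never fetched
print(remoteURL.isFileURL)
```

Both values are plain `URL`s as far as the type checker is concerned, which is why the missing module only surfaces when a non-file URL is actually loaded.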

xanderdunn commented 4 years ago

Thanks @BradLarson, great explanation