tensorflow / swift-apis

Swift for TensorFlow Deep Learning Library
Apache License 2.0

Expose all devices. #1059

Open texasmichelle opened 4 years ago

texasmichelle commented 4 years ago

On a machine with a GPU or TPU, I get a segfault when I try to use a Device of kind .CPU on the XLA backend, e.g.:

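import TensorFlow

// Explicitly request the CPU on the XLA backend.
let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let t1 = Tensor([1, 1, 0], on: device)  // the trace below points at this tensor initializer
let t2 = Tensor([1, 1, 0], on: device)
t1 + t2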
2020-08-10 15:43:18.077050: E tensorflow/compiler/xla/xla_client/tf_logging.cc:23] Check failed: it != device_contexts_.end() 
*** Begin stack trace ***

    copyTensor

    $sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
    $s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
    $s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC

*** End stack trace ***
No such device: CPU:0
2020-08-10 15:43:18.077121: F tensorflow/compiler/xla/xla_client/tf_logging.cc:26] tensorflow/compiler/tf2xla/xla_tensor/tensor.cpp:419 : Check failed: it != device_contexts_.end() 
*** Begin stack trace ***

    copyTensor

    $sSa23withUnsafeBufferPointeryqd__qd__SRyxGKXEKlF
    $s10TensorFlow9XLATensorV4make__2onACSRyxG_SaySiGAA6DeviceVtAA13XLAScalarTypeRzlFZ
    $s10TensorFlow0A0V5shape7scalars2onACyxGAA0A5ShapeV_SRyxGAA6DeviceVtcfC

*** End stack trace ***
No such device: CPU:0
Current stack trace:
    frame #21: 0x00007fb3999eb113 $__lldb_expr218`main at <Cell 28>:2

A workaround is to set the XRT_DEVICE_MAP environment variable, but all device and backend combinations should be accessible without this.

See swift-models/#654.

BradLarson commented 4 years ago

These mappings are defined at the command line. For example, here's how you would expose both the CPU and GPU as selectable devices (assuming a single CPU and a single GPU):

export XRT_DEVICE_MAP='CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0|GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0'

and here's how you would expose two GPUs (not exposing the CPU):

export XRT_DEVICE_MAP='GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:0/device:XLA_GPU:1'

Currently, only one default device is found and exposed. If you want anything other than the default, you need to manually specify the XLA -> S4TF mapping for every device you want to use. The devices are parsed from the XRT_DEVICE_MAP environment variable within ParseEnvDevices. That may be the place to add CPU support on GPU-default systems, because we can safely assume the CPU is present there.
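
Since each entry in the two examples above follows the same pattern (S4TF device name, a semicolon, then the XLA device path, with entries joined by `|`), the value can be generated rather than typed by hand. The following is a minimal sketch of that idea; `build_xrt_device_map` is a hypothetical helper, not part of swift-apis, and it assumes the `/job:localservice/replica:0/task:0` prefix used in the examples above applies to every device:

```shell
# Hypothetical helper (not part of swift-apis): prints an XRT_DEVICE_MAP value.
# Usage: build_xrt_device_map <cpu_count> <gpu_count>
# Assumes every device lives under the same localservice job/replica/task
# prefix that appears in the examples in this thread.
build_xrt_device_map() {
  cpu_count=$1
  gpu_count=$2
  prefix='/job:localservice/replica:0/task:0/device'
  map=''
  for kind in CPU GPU; do
    if [ "$kind" = CPU ]; then count=$cpu_count; else count=$gpu_count; fi
    i=0
    while [ "$i" -lt "$count" ]; do
      # One entry: "<S4TF name>;<XLA device path>"
      entry="${kind}:${i};${prefix}:XLA_${kind}:${i}"
      # Entries are separated by '|'
      if [ -n "$map" ]; then map="${map}|${entry}"; else map="$entry"; fi
      i=$((i + 1))
    done
  done
  printf '%s\n' "$map"
}

# Expose one CPU and one GPU, matching the first example above.
export XRT_DEVICE_MAP="$(build_xrt_device_map 1 1)"
```

Calling `build_xrt_device_map 0 2` produces the two-GPU mapping from the second example. The variable needs to be exported before the Swift process starts so that it is already in the environment when the devices are parsed.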