tensorflow / swift

Swift for TensorFlow
https://tensorflow.org/swift
Apache License 2.0

Eager tensors always report being on CPU Device despite documentation #524

Open garymm opened 4 years ago

garymm commented 4 years ago

I'm playing with https://www.tensorflow.org/swift/tutorials/introducing_x10. Both locally and on Colab, the eager tensor shows up on the CPU. The text says: "If you are running this notebook on a GPU-enabled instance, you should see that hardware reflected in the device description above."

Even if I try to force it to the GPU, it seems to stay on the CPU:

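```swift
import TensorFlow

// Explicitly request GPU 0 on the eager (TF_EAGER) backend.
let eagerGPU = Device(kind: .GPU, ordinal: 0, backend: .TF_EAGER)
let eagerTensor1 = Tensor([0.0, 1.0, 2.0], on: eagerGPU)
let eagerTensor2 = Tensor([1.5, 2.5, 3.5], on: eagerGPU)
let eagerTensorSum = eagerTensor1 + eagerTensor2
// As the last expression in a notebook cell, this displays the reported device.
eagerTensor1.device
```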

Output:

```
▿ Device(kind: .CPU, ordinal: 0, backend: .TF_EAGER)
  - kind : TensorFlow.Device.Kind.CPU
  - ordinal : 0
  - backend : TensorFlow.Device.Backend.TF_EAGER
```

So I'd say there may be 2 bugs here:

  1. Either the documentation is wrong and eager tensors can only use the CPU, or the documentation is right and the code is buggy because it doesn't use the GPU; and
  2. If the documentation is wrong, creating a tensor on an eager GPU device should fail rather than silently run on the CPU.
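The behavior proposed in point 2 could look roughly like a placement check that throws instead of silently falling back. This is a minimal sketch only: `DeviceKind`, `PlacementError`, and `checkPlacement(requested:actual:)` are hypothetical names, not part of swift-apis, and they are defined locally so the code compiles without the TensorFlow module.

```swift
// Hypothetical stand-in for swift-apis' Device.Kind; defined locally so
// this sketch compiles without importing TensorFlow.
enum DeviceKind {
    case cpu, gpu
}

// Hypothetical error describing a placement the backend did not honor.
struct PlacementError: Error, Equatable {
    let requested: DeviceKind
    let actual: DeviceKind
}

// Throws when the tensor ended up somewhere other than the requested
// device, instead of silently continuing on the fallback device.
func checkPlacement(requested: DeviceKind, actual: DeviceKind) throws {
    guard requested == actual else {
        throw PlacementError(requested: requested, actual: actual)
    }
}
```

With a check like this, requesting `.gpu` and getting `.cpu` surfaces as an error at tensor creation time rather than as a surprise in a later device dump.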

BradLarson commented 4 years ago

I believe this is due to a bug in the way eager tensors report their device location. Eager tensors have their operations dispatched to the default accelerator, but they always report themselves as being located on the CPU. If you run operations on them on your local machine, you can verify that they're running on the GPU by monitoring GPU activity with nvidia-smi or similar tools.

Likewise, eager tensors currently ignore the device you specify for them, so if you tell them to run on the CPU when there's a GPU available, they'll still run on the GPU.

X10 tensors, by contrast, accurately report which device they're attached to and respect manual device placement. Only eager tensors have this problem.

texasmichelle commented 4 years ago

It looks like this line is always returning the CPU device. I'll figure out how to surface the actual device being used.

texasmichelle commented 3 years ago

Once swift-apis#1156 is merged, TFE_TensorHandleDeviceType and TFE_TensorHandleDeviceID will be available, making this a straightforward fix.
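Once those two functions are available, the fix mostly comes down to translating a handle's device type string and ID into a device description. The following is a minimal sketch under some assumptions. It assumes `TFE_TensorHandleDeviceType` yields a type string such as "CPU" or "GPU" and that `TFE_TensorHandleDeviceID` yields the device ordinal, converted to `Int`. `DeviceKind`, `ReportedDevice`, and `reportedDevice(typeString:deviceID:)` are hypothetical stand-ins for swift-apis' `Device` types. The C calls themselves are left out so the code runs without TensorFlow.

```swift
// Hypothetical stand-ins for swift-apis' Device.Kind and Device, defined
// locally so this sketch compiles without importing TensorFlow.
enum DeviceKind {
    case cpu, gpu, tpu
}

struct ReportedDevice: Equatable {
    var kind: DeviceKind
    var ordinal: Int
}

// Assumption: `typeString` and `deviceID` are the values obtained from
// TFE_TensorHandleDeviceType and TFE_TensorHandleDeviceID for one handle.
// Returns nil for a type string this sketch does not recognize, rather
// than defaulting to the CPU.
func reportedDevice(typeString: String, deviceID: Int) -> ReportedDevice? {
    let kind: DeviceKind
    switch typeString {
    case "CPU": kind = .cpu
    case "GPU": kind = .gpu
    case "TPU": kind = .tpu
    default: return nil
    }
    return ReportedDevice(kind: kind, ordinal: deviceID)
}
```

Returning `nil` for an unknown type lets the caller decide how to handle it, instead of hard-coding a CPU device.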

texasmichelle commented 3 years ago

Tentative changes here.

texasmichelle commented 3 years ago

I ran into a problem adding eager/c_api_experimental.h because the TFE_CustomDevice struct uses C++ syntax, a default member initializer, which isn't valid C:

```
/home/michellecasbon/repos/out/libtensorflow-prefix/src/libtensorflow/tensorflow/c/eager/c_api_experimental.h:446:14: error: expected ';' at end of declaration list
  int version = TFE_CUSTOM_DEVICE_VERSION;
             ^
```

It's unclear how to get around this without pursuing custom import rules or modifying upstream.