garymm opened 4 years ago
I believe this is due to a bug in how eager tensors report their device location. Eager tensors have their operations dispatched on the default accelerator, but they always report themselves as being located on the CPU. If you run operations with them on your local machine, you can verify that they're actually running on the GPU by monitoring GPU activity with nvidia-smi or similar tools.
Likewise, eager tensors currently ignore the device you specify for them, so if you tell them to run on the CPU when there's a GPU available, they'll still run on the GPU.
X10 tensors, by contrast, accurately report the device they're attached to and respect manual device placement; only eager tensors have this problem.
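The contrast can be seen with a short check along the lines of the X10 tutorial. This is a sketch assuming the Swift for TensorFlow `Tensor`/`Device` API; on a GPU machine, the eager tensor's `device` would (per this bug) still print as CPU, while the X10 tensor's would be accurate.

```swift
import TensorFlow

// Eager tensor: ops are dispatched on the default accelerator,
// but .device misreports the placement as CPU (this issue).
let eagerTensor = Tensor([0.0, 1.0, 2.0])
print(eagerTensor.device)

// X10 tensor: .device reflects the actual device, and manual
// placement via `on:` is respected.
let x10Tensor = Tensor([0.0, 1.0, 2.0], on: Device.defaultXLA)
print(x10Tensor.device)
```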
It looks like this line is always returning the CPU device. I'll figure out how to surface the actual device being used.
Once swift-apis#1156 is merged, TFE_TensorHandleDeviceType and TFE_TensorHandleDeviceID will be available, making this a straightforward fix.
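Once those two C API functions are surfaced, the fix could look roughly like the sketch below: query the handle's real device type and ordinal instead of unconditionally returning the CPU device. The function and helper names on the Swift side are illustrative, not the exact swift-apis internals; `handle` is assumed to be an `OpaquePointer` to a `TFE_TensorHandle`.

```swift
import CTensorFlow  // C bindings for the TensorFlow eager API

// Hypothetical helper: report the actual device of an eager tensor
// handle, e.g. "GPU:0", using the functions exposed by swift-apis#1156.
func deviceDescription(of handle: OpaquePointer) -> String? {
  let status = TF_NewStatus()
  defer { TF_DeleteStatus(status) }

  // Device type as a C string, e.g. "CPU" or "GPU".
  guard let type = TFE_TensorHandleDeviceType(handle, status),
        TF_GetCode(status) == TF_OK else { return nil }

  // Device ordinal within that type.
  let id = TFE_TensorHandleDeviceID(handle, status)
  guard TF_GetCode(status) == TF_OK else { return nil }

  return "\(String(cString: type)):\(id)"
}
```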
Tentative changes here.
I ran into a problem adding eager/c_api_experimental.h, since it contains C++ syntax in the initialization of the TFE_CustomDevice struct:
```
/home/michellecasbon/repos/out/libtensorflow-prefix/src/libtensorflow/tensorflow/c/eager/c_api_experimental.h:446:14: error: expected ';' at end of declaration list
  int version = TFE_CUSTOM_DEVICE_VERSION;
              ^
```
It's unclear how to get around this without pursuing custom import rules or modifying upstream.
I'm playing with https://www.tensorflow.org/swift/tutorials/introducing_x10. Both locally and on Colab, the eager tensor shows up on the CPU. The text says
> If you are running this notebook on a GPU-enabled instance, you should see that hardware reflected in the device description above.
Even if I try to force it to the GPU, it seems to stay on the CPU:
Output:
So I'd say there may be 2 bugs here: the eager tensor reports itself as being on the CPU even when a GPU is available, and explicitly placing it on a device appears to be ignored.