rstudio / tensorflow

TensorFlow for R
https://tensorflow.rstudio.com
Apache License 2.0

Colocation Issue with ResourceStridedSliceAssign #592

Closed: jonbry closed this issue 3 months ago

jonbry commented 4 months ago

I am working through an example from Deep Learning with R and I get a colocation error when I try to assign a value to a subset of a TensorFlow variable (which I believe is of dtype = float32):

v[1, 1]$assign(3)

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
<...truncated...>6]
  device='CPU'; T in [DT_FLOAT]...

I'm using an M1 Mac and I saw on the TensorFlow website for R that I may be able to resolve the issue by running on the CPU using with(tf$device("CPU"), ...), but I am getting the same error.

Is there any way to fix or work around this issue? I am using TensorFlow 2.15, Python 3.10.4, and R 4.3.2, if that helps. Let me know if there is any additional information I can provide to help troubleshoot the issue.

Thank you!

t-kalinowski commented 3 months ago

I think this is an issue specific to the tensorflow-macos package for M1 Macs. Fortunately, with TensorFlow release 2.16, we no longer have to use the tensorflow-macos package.

The R package function tensorflow::install_tensorflow() has not yet been updated to install 2.16, but you can install the latest release (2.16.1 as of today) manually:

reticulate::virtualenv_create(
  "r-tensorflow", force = TRUE, python = ">=3.9,<=3.11", 
  packages = c("tensorflow", "tensorflow-metal")
)

~Then this succeeds without error:~

library(tensorflow)
## -------------------------------------------------------------------------
v <- tf$Variable(initial_value = tf$random$normal(shape(3, 1)))
v

## -------------------------------------------------------------------------
v$assign(tf$ones(shape(3, 1)))

## -------------------------------------------------------------------------
v[1, 1]$assign(3)

t-kalinowski commented 3 months ago

Actually, I spoke too soon. I still reproduce the error on the latest version if the Mac GPU is initialized and available to the TensorFlow session.

The underlying problem is that the tensorflow-metal package doesn't implement a GPU kernel for a required Op: ResourceStridedSliceAssign.

We'll have to wait until tensorflow-metal is updated (the latest release as of today is 1.1.0): https://pypi.org/project/tensorflow-metal/

In the interim, you can avoid this issue on M1 Macs by disabling the GPU or by installing a CPU-only version of tensorflow:

Disable the GPU:

library(tensorflow)
tf$config$list_physical_devices("CPU") |> tf$config$set_visible_devices()
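A minimal sketch of how the GPU-disabling line might fit into the original example, assuming it runs at the very top of a fresh R session. TensorFlow only lets visible devices be changed before its runtime has initialized them, so the call needs to come before any tensor or variable is created. The check with `tf$config$get_visible_devices()` is an addition for illustration, not part of the suggestion above.

```r
library(tensorflow)

# Hide the GPU before any tensors or variables exist; TensorFlow raises an
# error if visible devices are modified after they have been initialized.
tf$config$list_physical_devices("CPU") |> tf$config$set_visible_devices()

# Sanity check (illustrative): no GPU should be visible to this session now.
print(tf$config$get_visible_devices("GPU"))

# The slice assignment from the original report, now expected to run on the CPU.
v <- tf$Variable(initial_value = tf$random$normal(shape(3, 1)))
v[1, 1]$assign(3)
print(v)
```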

Install a CPU-only version:

reticulate::virtualenv_create(
  "r-tensorflow", force = TRUE, python = ">=3.9,<=3.11", 
  packages = c("tensorflow") # don't install "tensorflow-metal"
)
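A sketch of using the CPU-only environment afterwards, assuming it was created under the name "r-tensorflow" as above. The tensorflow R package may pick up an environment with that name on its own; calling `reticulate::use_virtualenv()` before TensorFlow is first loaded just makes the choice explicit.

```r
# Select the CPU-only environment before Python is initialized in this session.
# required = TRUE makes reticulate stop with an error if it can't be used,
# instead of silently falling back to a different Python installation.
reticulate::use_virtualenv("r-tensorflow", required = TRUE)

library(tensorflow)

# Without the tensorflow-metal plugin, TensorFlow should report no GPU devices.
print(tf$config$list_physical_devices("GPU"))

# Re-run the failing assignment from the original report.
v <- tf$Variable(initial_value = tf$random$normal(shape(3, 1)))
v[1, 1]$assign(3)
```

The trade-off is the same as disabling the GPU: everything runs on the CPU, which may be slower for larger models, until a tensorflow-metal release adds a GPU kernel for ResourceStridedSliceAssign.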

jonbry commented 3 months ago

Ok, I'll switch over to the CPU. Thanks for your help!