Open Dbenjamy opened 2 years ago
Thanks for bringing this to our attention! We'll look into a way to avoid this default behavior for pluggable devices and follow up here with any updates.
@Dbenjamy I managed to disable the use of CuDNN with the sample code on the same page but under "Using CuDNN kernels when available", changing the line
model = build_model(allow_cudnn_kernel=True)
into
model = build_model(allow_cudnn_kernel=False)
Unfortunately, at the subsequent call of model.fit(...) the program dies after the first iteration with another error:
PS Z:\Projects\Tensorflow-test\mnist-rnn> .\main.py
2022-09-16 17:42:24.808496: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\-\AppData\Roaming\Python\Python310\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-09-16 17:42:24.809742: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-09-16 17:42:24.813393: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-09-16 17:42:24.967021: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2022-09-16 17:42:26.720466: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-16 17:42:26.721149: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Intel(R) HD Graphics 520)
2022-09-16 17:42:26.807836: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-09-16 17:42:26.809218: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-16 17:42:26.809510: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-09-16 17:42:26.810456: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6997 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-09-16 17:42:26.933651: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-16 17:42:26.933969: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6997 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
2022-09-16 17:42:26.936880: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-09-16 17:42:28.883714: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
1/938 [..............................] - ETA: 42:12 - loss: 2.7705 - accuracy: 0.0469Traceback (most recent call last):
File "Z:\Projects\Tensorflow-test\mnist-rnn\main.py", line 51, in <module>
model.fit(
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Program Files\Python310\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'gradient_tape/sequential/rnn/while/gradients/sequential/rnn/while/lstm_cell/mul_2_grad/BroadcastGradientArgs' defined at (most recent call last):
File "Z:\Projects\Tensorflow-test\mnist-rnn\main.py", line 51, in <module>
model.fit(
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 1051, in train_function
return step_function(self, iterator)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 1030, in run_step
outputs = model.train_step(data)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 537, in minimize
grads_and_vars = self._compute_gradients(
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 590, in _compute_gradients
grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
File "C:\Users\-\AppData\Roaming\Python\Python310\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 471, in _get_gradients
grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradient_tape/sequential/rnn/while/gradients/sequential/rnn/while/lstm_cell/mul_2_grad/BroadcastGradientArgs'
Incompatible shapes: [0,0] vs. [64,64]
[[{{node gradient_tape/sequential/rnn/while/gradients/sequential/rnn/while/lstm_cell/mul_2_grad/BroadcastGradientArgs}}]] [Op:__inference_train_function_1986]
PS Z:\Projects\Tensorflow-test\mnist-rnn>
On the other hand, after uninstalling tensorflow-directml-plugin everything works fine, with or without allow_cudnn_kernel=False (obviously, as I'm then running TensorFlow on the CPU).
Note: my GPU is an Intel HD Graphics 520.
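For readers who have not seen the sample, here is a minimal sketch of the kind of switch the comment above toggles. It is modelled on the build_model function in the TensorFlow RNN guide, but the sizes (28-pixel rows, 64 units, 10 classes) and the exact layer stack are assumptions, not a copy of the guide. Note that, as reported above, the wrapped-cell variant still failed during training on an Intel HD Graphics 520 with the DirectML plugin, so treat this as an illustration of the switch rather than a confirmed fix.

```python
from tensorflow import keras

# Assumed sizes: each MNIST image is fed as a sequence of 28 rows of 28 pixels.
input_dim = 28
units = 64
output_size = 10  # digits 0-9


def build_model(allow_cudnn_kernel=True):
    if allow_cudnn_kernel:
        # Built-in layer: may select the cuDNN kernel when its defaults are kept.
        lstm_layer = keras.layers.LSTM(units)
    else:
        # Wrapped cell: runs the generic (non-cuDNN) implementation.
        lstm_layer = keras.layers.RNN(keras.layers.LSTMCell(units))
    return keras.Sequential([
        keras.Input(shape=(None, input_dim)),
        lstm_layer,
        keras.layers.BatchNormalization(),
        keras.layers.Dense(output_size),
    ])


model = build_model(allow_cudnn_kernel=False)
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
```

Both branches produce a model with the same input and output shapes, so the rest of a training script should not need to change when the flag is flipped.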
Same problem here. But if you uninstall tensorflow-directml-plugin, doesn't that defeat the purpose of using DirectML, since you end up using only the CPU? Not sure if it relates to @Dbenjamy's AMD GPU, as we've been using Intel HD Graphics GPUs.
I experienced the same problem. I am trying to run keras-ocr through the directml plugin, but it fails with the same error. Any updates on the issue?
Hi, this issue is on our radar but we don't have an ETA for the fix yet. Please refer to the workarounds suggested by @Dbenjamy for the time being, and we will post here when we do have updates. Thanks for your patience!
@maggie1059 Any update on this issue? Thanks for the work you are doing.
It is rather unfortunate that there is still no fix for this, because finding the workaround mentioned in this issue takes quite some time...
We need to fix this, I'm not gonna buy an NVIDIA.
Temporary workaround:
Just came across this comment on the TensorFlow repo, in which a user suggests using tf.compat.v1.keras.layers rather than tf.keras.layers. In my case:
I was having the same issue. I needed to use tf.compat.v1.keras.layers.GRU(128, return_sequences=True) rather than keras.layers.GRU(128, return_sequences=True) when defining my model:

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=32, output_dim=64, input_length=32),
        tf.keras.layers.Bidirectional(tf.compat.v1.keras.layers.GRU(128, return_sequences=True)),
        tf.keras.layers.Dense(32, activation='softmax')
    ])

From my understanding, this uses the layer from TensorFlow v1, which is why I call it temporary. Please correct me as needed. I plan to try this fix on my AMD GPU in a week or so and will update this then.
Still working as of this date. I tried the method from your post on both LSTM and GRU, and both still work on TensorFlow 2.10 with an AMD GPU.
When running the example RNN notebook from tensorflow, I got the following error:
After looking around, I found that Microsoft's installation tutorial mentions this in a note in step 5:
The notebook example says the built-in layers like keras.layers.LSTM(*args) use CuDNN kernels by default, so I used wrapped-cell layers instead, keras.layers.RNN(keras.layers.LSTMCell(*args), *args), which worked just fine and uses my GPU.
It looks like the built-in layers are looking for a compatible CuDNN GPU. Since I have an AMD GPU, it fails. When I use the wrapped-cell layers, it doesn't make that assumption and the tensorflow-directml-plugin is able to work as intended.
I was able to get the built-in layer to use the generic GPU kernel by making it fail the requirements for CuDNN (i.e. I set activation='sigmoid' instead of tanh), and it was able to use my GPU through the directml-plugin. That seems like a strange workaround, though, since the point of built-in layers is to be simple and I might like the default configuration. I wasn't able to find an option to make the built-in layers use a generic GPU kernel, aside from making them fail the criteria. If there is a better way, feel free to let me know.
Do you know if there are plans to update the built-in layers or add a workaround, or is the plan to use wrapped-cell layers as the main workaround?
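To make the "fail the criteria" trick above more concrete, here is a small sketch of the constructor-level conditions that the Keras LSTM documentation lists for choosing the cuDNN implementation. The helper name cudnn_blockers and the CUDNN_LSTM_DEFAULTS table are hypothetical and written only for illustration. The helper compares arguments and nothing else: it does not check the device, masking or padding, or eager execution, which the documentation also lists as conditions.

```python
# Constructor arguments that must keep these values for the built-in Keras
# LSTM layer to be eligible for its cuDNN implementation (per the Keras docs).
CUDNN_LSTM_DEFAULTS = {
    "activation": "tanh",
    "recurrent_activation": "sigmoid",
    "recurrent_dropout": 0,
    "unroll": False,
    "use_bias": True,
}


def cudnn_blockers(**layer_kwargs):
    """Return the names of arguments that would rule out the cuDNN kernel.

    Hypothetical helper: it only inspects the keyword arguments passed in,
    filling in the defaults above for anything not specified.
    """
    config = {**CUDNN_LSTM_DEFAULTS, **layer_kwargs}
    return sorted(
        name
        for name, required in CUDNN_LSTM_DEFAULTS.items()
        if config[name] != required
    )


print(cudnn_blockers())                      # default arguments
print(cudnn_blockers(activation="sigmoid"))  # the workaround described above
```

With default arguments nothing blocks the cuDNN path, which matches the behavior described in this issue. Changing activation is one of several ways to make the layer fall back to the generic kernel, at the cost of no longer using the default configuration.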