microsoft / tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2
Apache License 2.0

Using the plugin with Tensorflow Recommenders. #288

Closed Abdelrahman-7gab closed 1 year ago

Abdelrahman-7gab commented 1 year ago

Hello, I'm trying to use DirectML with the TFRS (TensorFlow Recommenders) library. The model trains without issues on TensorFlow-CPU and with CUDA; however, it fails with this DirectML plugin on an AMD GPU.
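Roughly, the setup looks like this (a simplified sketch, not my exact script; `itemId`, the vocabularies, and the sizes are illustrative, though `customerId` and the Adagrad optimizer match the log below):

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Toy stand-in data; the real dataset has more features (temp, humidity, etc.)
train = tf.data.Dataset.from_tensor_slices({
    "customerId": ["c1", "c2", "c3", "c1"],
    "itemId": ["i1", "i2", "i3", "i1"],
}).batch(2)
cached_train = train.cache()

user_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=["c1", "c2", "c3"]),
    tf.keras.layers.Embedding(4, 32),  # vocab size + 1 OOV token
])
item_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=["i1", "i2", "i3"]),
    tf.keras.layers.Embedding(4, 32),
])

class RetrievalModel(tfrs.Model):
    def __init__(self):
        super().__init__()
        self.user_model = user_model
        self.item_model = item_model
        self.task = tfrs.tasks.Retrieval()

    def compute_loss(self, features, training=False):
        return self.task(
            self.user_model(features["customerId"]),
            self.item_model(features["itemId"]),
        )

model = RetrievalModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(cached_train, epochs=200)  # this is the call that fails on DML
```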

I get the following error:

2022-09-23 17:08:11.821401: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-09-23 17:08:11.822066: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll 
2022-09-23 17:08:11.823965: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-09-23 17:08:11.970227: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
2022-09-23 17:08:21.035934: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-23 17:08:21.036666: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (AMD Radeon RX 6600)
2022-09-23 17:08:21.100537: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-09-23 17:08:21.103005: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-23 17:08:21.103449: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-09-23 17:08:21.103645: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6935 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)
unique items 130
2022-09-23 17:08:31.666967: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:31.703858: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.110711: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.146635: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.526733: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-09-23 17:08:32.554012: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
Epoch 1/200
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. Received: inputs={'temp': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=int32>, 'humidity': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=int32>, 'timeStamp': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=float32>, 'customerId': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>, 'hourCategory': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>}. Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. Received: inputs={'temp': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=int32>, 'humidity': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=int32>, 'timeStamp': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=float32>, 'customerId': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>, 'hourCategory': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>}. Consider rewriting this model with the Functional API.
Traceback (most recent call last):
  File "f:/tensorflow_recommender/ContextRetrievalTrain.py", line 238, in <module>
    trainContextRetrieval()
  File "f:/tensorflow_recommender/ContextRetrievalTrain.py", line 212, in trainContextRetrieval
    model.fit(cached_train, epochs=200)
  File "C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\7gab\anaconda3\envs\tfRecommenders\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation sequential_6/user_model/sequential_1/embedding_1/embedding_lookup: Could not satisfy explicit device specification '' 
because the node {{colocation_node sequential_6/user_model/sequential_1/embedding_1/embedding_lookup}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
StridedSlice: CPU
Unique: GPU CPU
Shape: GPU CPU
_Arg: GPU CPU
ResourceGather: GPU CPU
Identity: GPU CPU
Const: GPU CPU
UnsortedSegmentSum: CPU
ResourceSparseApplyAdagradV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_6_user_model_sequential_1_embedding_1_embedding_lookup_5809 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential_6/user_model/sequential_1/embedding_1/embedding_lookup (ResourceGather)
  sequential_6/user_model/sequential_1/embedding_1/embedding_lookup/Identity (Identity)
  Adagrad/Adagrad/update/Unique (Unique) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/Shape (Shape) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_1 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice/stack_2 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/strided_slice (StridedSlice) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/UnsortedSegmentSum (UnsortedSegmentSum) /job:localhost/replica:0/task:0/device:GPU:0
  Adagrad/Adagrad/update/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node sequential_6/user_model/sequential_1/embedding_1/embedding_lookup}}]] [Op:__inference_train_function_6193]

This happens while trying to train the retrieval model.

I know this plugin isn't stable yet, but I'd like to know whether this will work / be supported in the future, or whether I should just stick with TensorFlow-CPU or an Nvidia GPU.
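In the meantime, I assume a workaround would be to create the embedding variables under a CPU device scope, so the CPU-only ops in the colocation group (UnsortedSegmentSum, ResourceSparseApplyAdagradV2) can be placed together with them. A rough sketch (untested with the plugin; sizes are illustrative):

```python
import tensorflow as tf

# Build the embedding variables on the CPU so the CPU-only optimizer ops
# colocated with them get a valid placement.
vocab_size, embedding_dim = 131, 32  # illustrative sizes
with tf.device("/CPU:0"):
    embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    embedding.build(input_shape=(None,))  # force variable creation now, on CPU
```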

PatriceVignola commented 1 year ago

We're working on those ops, stay tuned!

PatriceVignola commented 1 year ago

Hi @Abdelrahman-7gab,

We just released version 0.1.0.dev220928, which should fix this issue.
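After upgrading (e.g. `pip install --upgrade tensorflow-directml-plugin==0.1.0.dev220928`), a quick way to confirm the DML adapter is still picked up is to list the visible devices:

```python
import tensorflow as tf

# The DirectML adapter should be enumerated as a GPU PluggableDevice,
# e.g. PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU').
print(tf.config.list_physical_devices("GPU"))
```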

Abdelrahman-7gab commented 1 year ago

@PatriceVignola I tried the newest version and training now works flawlessly! Thank you for your help and your time!