microsoft / tensorflow-directml

Fork of TensorFlow accelerated by DirectML
Apache License 2.0

Cannot assign a device for operation embedding/embeddings/Initializer/random_uniform/ #379

Open KaganSenturk opened 2 years ago

KaganSenturk commented 2 years ago

System Information

Hi, I have an AMD GPU on my local machine and I want to train an LSTM model with TensorFlow. With tensorflow-directml installed, TensorFlow detects the GPU in the system. The code and its output are below:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456 locality { } incarnation: 5162271997438626014, name: "/device:DML:0" device_type: "DML" memory_limit: 6797208279 locality { } incarnation: 12883817374713471833 _physical_device_desc: "{\"name\": \"AMD Radeon Pro V520 MxGPU\", \"vendor_id\": 4098, \"device_id\": 29538, \"driverversion\": \"27.20.11025.4019\"}"]

No problems so far. But is there a step needed to activate the GPU when training the model? I am getting the error below. Without the GPU, the model starts training and I can see the epochs progress, but the model is fairly complex, so it takes a long time to get a result. TensorFlow detects the GPU, yet a device placement problem occurs during training. Do you know what the problem could be?

InvalidArgumentError: Cannot assign a device for operation embedding/embeddings/Initializer/random_uniform/sub: Could not satisfy explicit device specification '' because the node node embedding/embeddings/Initializer/random_uniform/sub (defined at C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) placed on device Device assignments active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation: with tf.device(None): <C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1535> was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:DML:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:DML:0].
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_nameindex=1 requested_devicename='/job:localhost/replica:0/task:0/device:DML:0' assigned_devicename='/job:localhost/replica:0/task:0/device:DML:0' resource_devicename='/job:localhost/replica:0/task:0/device:DML:0' supported_devicetypes=[CPU] possibledevices=[]
Add: DML CPU
Const: DML CPU
RandomUniform: DML CPU
Sub: DML CPU
Mul: DML CPU
Sqrt: DML CPU
VarHandleOp: DML CPU
AssignVariableOp: DML CPU
VarIsInitializedOp: DML CPU
ReadVariableOp: DML CPU
ResourceGather: DML CPU
Identity: DML CPU
ResourceScatterAdd: DML CPU
Fill: DML CPU
Shape: DML CPU
Unique: DML CPU
StridedSlice: DML CPU
UnsortedSegmentSum: CPU
AddV2: DML CPU
RealDiv: DML CPU
AssignSubVariableOp: DML CPU
NoOp: DML CPU

Colocation members, user-requested devices, and framework assigned devices, if any: embedding/embeddings/Initializer/random_uniform/shape (Const) embedding/embeddings/Initializer/random_uniform/min (Const) embedding/embeddings/Initializer/random_uniform/max (Const) embedding/embeddings/Initializer/random_uniform/RandomUniform (RandomUniform) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embeddings/Initializer/random_uniform/sub (Sub) embedding/embeddings/Initializer/random_uniform/mul (Mul) embedding/embeddings/Initializer/random_uniform (Add) embedding/embeddings (VarHandleOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embeddings/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embeddings/Assign (AssignVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embeddings/Read/ReadVariableOp (ReadVariableOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embedding_lookup (ResourceGather) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 embedding/embedding_lookup/Identity (Identity) VarIsInitializedOp (VarIsInitializedOp) framework assigned device=/job:localhost/replica:0/task:0/device:DML:0 training/Adam/embedding/embeddings/m/Initializer/zeros/shape_as_tensor (Const) training/Adam/embedding/embeddings/m/Initializer/zeros/Const (Const) training/Adam/embedding/embeddings/m/Initializer/zeros (Fill) training/Adam/embedding/embeddings/m (VarHandleOp) training/Adam/embedding/embeddings/m/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) training/Adam/embedding/embeddings/m/Assign (AssignVariableOp) training/Adam/embedding/embeddings/m/Read/ReadVariableOp (ReadVariableOp) training/Adam/embedding/embeddings/v/Initializer/zeros/shape_as_tensor (Const) training/Adam/embedding/embeddings/v/Initializer/zeros/Const (Const) 
training/Adam/embedding/embeddings/v/Initializer/zeros (Fill) training/Adam/embedding/embeddings/v (VarHandleOp) training/Adam/embedding/embeddings/v/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) training/Adam/embedding/embeddings/v/Assign (AssignVariableOp) training/Adam/embedding/embeddings/v/Read/ReadVariableOp (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/Unique (Unique) training/Adam/Adam/update_embedding/embeddings/Shape (Shape) training/Adam/Adam/update_embedding/embeddings/strided_slice/stack (Const) training/Adam/Adam/update_embedding/embeddings/strided_slice/stack_1 (Const) training/Adam/Adam/update_embedding/embeddings/strided_slice/stack_2 (Const) training/Adam/Adam/update_embedding/embeddings/strided_slice (StridedSlice) training/Adam/Adam/update_embedding/embeddings/UnsortedSegmentSum (UnsortedSegmentSum) training/Adam/Adam/update_embedding/embeddings/mul (Mul) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/mul_1 (Mul) training/Adam/Adam/update_embedding/embeddings/AssignVariableOp (AssignVariableOp) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_1 (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/ResourceScatterAdd (ResourceScatterAdd) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_2 (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/mul_2 (Mul) training/Adam/Adam/update_embedding/embeddings/mul_3 (Mul) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_3 (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/mul_4 (Mul) training/Adam/Adam/update_embedding/embeddings/AssignVariableOp_1 (AssignVariableOp) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_4 (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/ResourceScatterAdd_1 (ResourceScatterAdd) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_5 (ReadVariableOp) 
training/Adam/Adam/update_embedding/embeddings/Sqrt (Sqrt) training/Adam/Adam/update_embedding/embeddings/mul_5 (Mul) training/Adam/Adam/update_embedding/embeddings/add (AddV2) training/Adam/Adam/update_embedding/embeddings/truediv (RealDiv) training/Adam/Adam/update_embedding/embeddings/AssignSubVariableOp (AssignSubVariableOp) training/Adam/Adam/update_embedding/embeddings/ReadVariableOp_6 (ReadVariableOp) training/Adam/Adam/update_embedding/embeddings/group_deps (NoOp) VarIsInitializedOp_19 (VarIsInitializedOp) VarIsInitializedOp_37 (VarIsInitializedOp)

 [[node embedding/embeddings/Initializer/random_uniform/sub (defined at C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]

Additional information about colocations: No node-device colocations were active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation.

Device assignments active during op 'embedding/embeddings/Initializer/random_uniform/sub' creation: with tf.device(None): <C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1535>

Original stack trace for 'embedding/embeddings/Initializer/random_uniform/sub': File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel_launcher.py", line 16, in app.launch_new_instance() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\traitlets\config\application.py", line 664, in launch_instance app.start() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelapp.py", line 612, in start self.io_loop.start() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\platform\asyncio.py", line 199, in start self.asyncio_loop.run_forever() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\base_events.py", line 442, in run_forever self._run_once() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\base_events.py", line 1462, in _run_once handle._run() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\asyncio\events.py", line 145, in _run self._callback(self._args) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\ioloop.py", line 688, in lambda f: self._run_callback(functools.partial(callback, future)) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\ioloop.py", line 741, in _run_callback ret = callback() File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 814, in inner self.ctx_run(self.run) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run return f(args, kw) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 775, in run yielded = self.gen.send(value) File 
"C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 365, in process_one yield gen.maybe_future(dispatch(args)) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper yielded = ctx_run(next, result) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run return f(args, kw) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 268, in dispatch_shell yield gen.maybe_future(handler(stream, idents, msg)) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper yielded = ctx_run(next, result) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run return f(*args, kw) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\kernelbase.py", line 545, in execute_request user_expressions, allow_stdin, File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 234, in wrapper yielded = ctx_run(next, result) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tornado\gen.py", line 162, in _fake_ctx_run return f(*args, *kw) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\ipkernel.py", line 306, in do_execute res = shell.run_cell(code, store_history=store_history, silent=silent) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\ipykernel\zmqshell.py", line 536, in run_cell return super(ZMQInteractiveShell, self).run_cell(args, kwargs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 2867, in run_cell raw_cell, store_history, silent, shell_futures) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 2895, in _run_cell 
return runner(coro) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\async_helpers.py", line 68, in _pseudo_sync_runner coro.send(None) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3072, in run_cell_async interactivity=interactivity, compiler=compiler, result=result) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3263, in run_ast_nodes if (await self.runcode(code, result, async=asy)): File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 1, in concat_lstm = get_model1(tf_idf_train,X_meta_train, results,embedding_dimensions) File "", line 17, in get_model1 mask_zero=True)(tf_idf_input) # Use masking to handle the variable sequence lengths File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 824, in call self._maybe_build(inputs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 2146, in _maybe_build self.build(input_shapes) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\utils\tf_utils.py", line 306, in wrapper output_shape = fn(instance, input_shape) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\layers\embeddings.py", line 146, in build constraint=self.embeddings_constraint) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 529, in add_weight aggregation=aggregation) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\training\tracking\base.py", line 712, in _add_variable_with_custom_getter 
kwargs_for_getter) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py", line 139, in make_variable shape=variable_shape if variable_shape else None) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 258, in call return cls._variable_v1_call(*args, kwargs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 219, in _variable_v1_call shape=shape) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 197, in previous_getter = lambda kwargs: default_variable_creator(None, *kwargs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 2503, in default_variable_creator shape=shape) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\variables.py", line 262, in call return super(VariableMetaclass, cls).call(args, kwargs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py", line 1406, in init distribute_strategy=distribute_strategy) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py", line 1537, in _init_from_args initial_value() if init_from_fn else initial_value, File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\keras\engine\base_layer_utils.py", line 119, in init_val = lambda: initializer(shape, dtype=dtype) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\init_ops.py", line 283, in call shape, self.minval, self.maxval, dtype, seed=self.seed) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\random_ops.py", 
line 246, in random_uniform result = math_ops.add(rnd (maxval - minval), minval, name=name) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 899, in binary_op_wrapper return func(x, y, name=name) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 11926, in sub "Sub", x=x, y=y, name=name) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper op_def=op_def) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func return func(args, **kwargs) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3371, in create_op attrs, op_def, compute_device) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3440, in _create_op_internal op_def=op_def) File "C:\Users\kagan.senturk\Anaconda3\envs\tfradeon\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1762, in init self._traceback = tf_stack.extract_stack()

PatriceVignola commented 2 years ago

The problem is that UnsortedSegmentSum hasn't been enabled for DML devices. We actually have an implementation already there, but we noticed that it was using way too much memory so we disabled it to revisit it at a later time:

https://github.com/microsoft/tensorflow-directml/blob/a654e7cdb9bedcbf0611c1268d03910be0d095d2/tensorflow/core/kernels/dml_segment_reduction_ops.cc#L215
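For anyone reading a similar error, the op responsible can be found by scanning the "Colocation Debug Info" listing for an op type whose supported devices do not include DML. The following is a minimal sketch of that check in plain Python. The function name `ops_missing_device` and the `DEBUG_INFO` excerpt are illustrative assumptions, not part of TensorFlow; the excerpt copies a few entries from the error message posted above.

```python
import re

# Excerpt of the "Colocation Debug Info" listing from the error in this issue.
DEBUG_INFO = """
Add: DML CPU
RandomUniform: DML CPU
Unique: DML CPU
StridedSlice: DML CPU
UnsortedSegmentSum: CPU
AddV2: DML CPU
"""

# Matches entries of the form "OpType: DML CPU" or "OpType: CPU".
ENTRY = re.compile(r"(\w+): ((?:DML|CPU)\b(?: (?:DML|CPU)\b)*)")


def ops_missing_device(debug_info, device="DML"):
    """Return the op types whose supported-device list lacks `device`."""
    missing = []
    for match in ENTRY.finditer(debug_info):
        op_type, devices = match.group(1), match.group(2).split()
        if device not in devices:
            missing.append(op_type)
    return missing


print(ops_missing_device(DEBUG_INFO))
```

Because every op in a colocation group must run on the same device, a single CPU-only op in the listing is enough to make the whole group incompatible with a variable that was already assigned to DML:0.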

A solution here would be to explicitly place RandomUniform on the CPU instead of letting it go to DML. Random ops are not usually executed that often during one epoch, so it shouldn't affect performance too much.
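One way to apply this suggestion, sketched under assumptions: wrap the creation of the embedding layer in `tf.device("/CPU:0")` so that its variable and initializer (including RandomUniform) are created on the CPU, while the rest of the model is left to default placement. The original `get_model1` function was not posted, so `build_model`, its parameters, and the layer sizes below are hypothetical stand-ins, and this has not been verified against tensorflow-directml itself.

```python
import tensorflow as tf


def build_model(vocab_size=1000, embedding_dim=16, seq_len=20):
    """Hypothetical stand-in for the unposted get_model1 from this issue."""
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")

    # Create the embedding variable and its initializer ops on the CPU, so the
    # colocation group containing the CPU-only op is not pinned to DML:0.
    with tf.device("/CPU:0"):
        x = tf.keras.layers.Embedding(
            vocab_size, embedding_dim, mask_zero=True)(inputs)

    # Remaining layers are left to TensorFlow's default device placement.
    x = tf.keras.layers.LSTM(8)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


model = build_model()
```

Only the embedding lookup and its optimizer updates would run on the CPU in this arrangement; whether that cost is acceptable depends on the size of the embedding table.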

Saipavan790 commented 2 years ago

I am also facing the same issue. Were you able to resolve it?

PatriceVignola commented 2 years ago

@Saipavan790 We have a few solutions in mind that seem to be working internally, but if you or @KaganSenturk have a sample model that we can run to test accuracy and performance, it would help us make sure that your specific scenarios are covered.

KaganSenturk commented 2 years ago

I will switch to NVIDIA drivers. That's my solution.

PatriceVignola commented 2 years ago

The latest release now has support for the UnsortedSegment* ops. We're currently working on optimizing them, so they should get faster in the next version. For the time being, at least, there shouldn't be device placement errors anymore.

Note that most of the latest developments are happening over at the tensorflow-directml-plugin fork and its corresponding pypi package, which are for TensorFlow >= 2.10.