tensorflow / recommenders

TensorFlow Recommenders is a library for building recommender system models using TensorFlow.

InvalidArgumentError: Cannot assign a device for operation #269

Open dgoldenberg-audiomack opened 3 years ago

dgoldenberg-audiomack commented 3 years ago

I'm running the code below on a GPU-enabled machine and getting the error shown further down. Any idea what might be causing it and/or how to get around it?

Versions: TF 2.3.1, TFRS 0.3.2, TF I/O 0.16.0

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    tf.config.set_soft_device_placement(True)
    user_model = ...
    item_model = ...
    model = TfrsModel(user_model, item_model, task, cached_train_event_ds, cached_test_event_ds)
    model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
    model.fit(model.cached_train_event_ds, epochs=num_epochs)

The exception plus more of the log:

>> TF built with Cuda? - True
>> TF built with GPU support? - True
2021-03-27 17:18:44.446952: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-03-27 17:18:45.218565: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:18:45.219549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-27 17:18:45.219590: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-27 17:18:45.221375: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-03-27 17:18:45.222991: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-03-27 17:18:45.223305: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-03-27 17:18:45.225184: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-03-27 17:18:45.226397: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-03-27 17:18:45.231122: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-27 17:18:45.231236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:18:45.232195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:18:45.233092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
>> GPU physical devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2021-03-27T17:18:45.767+0000: [GC (Metadata GC Threshold) [PSYoungGen: 168929K->14357K(676864K)] 197891K->43327K(1438720K), 0.0108785 secs] [Times: user=0.03 sys=0.01, real=0.02 secs] 
2021-03-27T17:18:45.778+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 14357K->0K(676864K)] [ParOldGen: 28969K->31265K(1068032K)] 43327K->31265K(1744896K), [Metaspace: 93441K->93436K(1134592K)], 0.0677424 secs] [Times: user=0.14 sys=0.00, real=0.06 secs] 
2021-03-27T17:18:48.537+0000: [GC (Allocation Failure) [PSYoungGen: 654848K->21494K(670208K)] 686113K->56793K(1738240K), 0.0162617 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 

2021-03-27 17:25:57.200066: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-27 17:25:57.225955: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300020000 Hz
2021-03-27 17:25:57.226422: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x46fd6a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-27 17:25:57.226452: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-03-27 17:25:57.322558: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:57.323604: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2dfa0b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-27 17:25:57.323637: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2021-03-27 17:25:57.323847: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:57.324787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-27 17:25:57.324849: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-27 17:25:57.324896: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-03-27 17:25:57.324933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-03-27 17:25:57.324959: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-03-27 17:25:57.324983: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-03-27 17:25:57.325008: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-03-27 17:25:57.325032: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-27 17:25:57.325109: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:57.326098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:57.326994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-03-27 17:25:57.327046: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-27 17:25:58.116489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-27 17:25:58.116545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-03-27 17:25:58.116558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-03-27 17:25:58.116766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:58.117766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-27 17:25:58.118726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14730 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)

>> Strategy: <tensorflow.python.distribute.mirrored_strategy.MirroredStrategy object at 0x7fefca532890>
>> Number of devices: 1
>> 2021-03-27 12:26:45 : >> Training the model...
Epoch 1/3
WARNING:tensorflow:From /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
Traceback (most recent call last):
  File "/mnt/tmp/spark-7d6effdf-14d4-40d3-b911-e0bd33db82ab/my-deps.zip/my-proto/my_tfrs_model.py", line 151, in train_and_evaluate
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation sequential/embedding/embedding_lookup/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/embedding/embedding_lookup/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU XLA_CPU XLA_GPU 
Cast: GPU CPU XLA_CPU XLA_GPU 
Const: GPU CPU XLA_CPU XLA_GPU 
ResourceSparseApplyAdagradV2: CPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_embedding_embedding_lookup_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_1_update_0_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/embedding/embedding_lookup/ReadVariableOp (ReadVariableOp) 
  sequential/embedding/embedding_lookup/axis (Const) 
  sequential/embedding/embedding_lookup (GatherV2) 
  gradient_tape/sequential/embedding/embedding_lookup/Shape (Const) 
  gradient_tape/sequential/embedding/embedding_lookup/Cast (Cast) 
  Adagrad/Adagrad/update_1/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

     [[{{node sequential/embedding/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_1654]
maciejkula commented 3 years ago

This is because the sparse Adagrad update used for embedding variables (ResourceSparseApplyAdagradV2 in the colocation dump above) only has a CPU kernel, so it can't be placed on the GPU.
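
For anyone hitting this before a fix lands, one possible workaround is to switch to an optimizer whose sparse update is built from ops that do have GPU kernels, such as Adam. This is only a sketch, reusing the placeholder names from the snippet above (TfrsModel, user_model, item_model, task, the cached datasets, and num_epochs are assumptions carried over from the original post, not library API):

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = TfrsModel(user_model, item_model, task, cached_train_event_ds, cached_test_event_ds)
    # Adam's sparse update path is composed of scatter/assign ops that have GPU kernels,
    # so it avoids the CPU-only ResourceSparseApplyAdagradV2 colocation conflict.
    # The learning rate will likely need retuning when moving away from Adagrad.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))
    model.fit(model.cached_train_event_ds, epochs=num_epochs)

Alternatively, keeping Adagrad but building the embedding layers under with tf.device('/cpu:0') may keep the variables and their sparse updates on the host, at the cost of extra host-device copies.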

maciejkula commented 3 years ago

I am talking to the TensorFlow folks to address this, but unfortunately there are a lot of competing priorities and it's not clear when this will be fixed.

dgoldenberg-audiomack commented 3 years ago

Hi @maciejkula, appreciate you looking into this. Understood re: priorities. This isn't a blocker for us; we can work around it for now. Thanks.

sanjoy commented 3 years ago

@dgoldenberg-audiomack Does this problem reproduce in tf-nightly? NVIDIA added a GPU kernel for ResourceSparseApplyAdagradV2 in January, so it is possible that this just works now.
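
A quick (hedged) way to try this, assuming a disposable environment where the nightly build can be installed alongside a compatible TFRS release:

# pip install --upgrade tf-nightly
import tensorflow as tf
print(tf.__version__)                          # expect a "-dev" nightly version string
print(tf.config.list_physical_devices("GPU"))  # confirm the GPU is still visible
# ...then rerun the original Adagrad training unchanged to see whether the
# colocation error is gone.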

HurtaClaudio commented 2 years ago

Any news on this bug?