tensorflow / tensorflow


Aborted (core dumped) in `tf.raw_ops.ResourceScatterNd*` ops #76729

Open x0w3n opened 1 month ago

x0w3n commented 1 month ago

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.18.0-dev20240925

Custom code

Yes

OS platform and distribution

Linux Ubuntu 22.04.3 LTS (x86_64)

Mobile device

No response

Python version

3.9.13

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

When the dtype of the variable referenced by `resource_handle` does not match the dtype of `updates`, the `tf.raw_ops.ResourceScatterNd*` ops abort the process with a failed check instead of raising a Python error. The following ops are affected:

tf.raw_ops.ResourceScatterNdUpdate
tf.raw_ops.ResourceScatterNdAdd
tf.raw_ops.ResourceScatterNdSub
tf.raw_ops.ResourceScatterNdMax
tf.raw_ops.ResourceScatterNdMin

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

resource_var = tf.Variable(initial_value=tf.zeros([2, 2], dtype=tf.int32), trainable=False)
resource_handle = resource_var.handle

indices = np.array([[2, 1], [1, 2]], dtype=np.int32)
updates = np.array([10, 20], dtype=np.float32)
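# NOTE: `updates` is float32 while the variable behind `resource_handle` is int32;
# this dtype mismatch is what triggers the check failure and abort below.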
tf.raw_ops.ResourceScatterNdUpdate(  # crash
    ref=resource_handle,
    indices=indices,
    updates=updates,
    use_locking=True
)

tf.raw_ops.ResourceScatterNdAdd(  # crash
    ref=resource_handle,
    indices=indices,
    updates=updates,
    use_locking=True
)
tf.raw_ops.ResourceScatterNdSub(  # crash
    ref=resource_handle,
    indices=indices,
    updates=updates,
    use_locking=True
)
tf.raw_ops.ResourceScatterNdMax(  # crash
    ref=resource_handle,
    indices=indices,
    updates=updates,
    use_locking=True
)
tf.raw_ops.ResourceScatterNdMin(  # crash
    ref=resource_handle,
    indices=indices,
    updates=updates,
    use_locking=True
)
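
For contrast, here is a minimal sketch (not part of the original report, just standard ResourceScatterNd usage) in which the updates dtype matches the variable dtype; this call completes without aborting:

import tensorflow as tf
import numpy as np

# Variable and updates share the same dtype (float32), and the indices fit the [4] shape.
var = tf.Variable(initial_value=tf.zeros([4], dtype=tf.float32), trainable=False)
indices = np.array([[0], [2]], dtype=np.int32)
updates = np.array([10.0, 20.0], dtype=np.float32)

tf.raw_ops.ResourceScatterNdUpdate(
    ref=var.handle,
    indices=indices,
    updates=updates,
    use_locking=True
)
print(var.numpy())  # expected: [10.  0. 20.  0.]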

Relevant log output

2024-09-28 21:06:23.445185: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-28 21:06:23.508056: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-28 21:06:23.583640: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-28 21:06:23.607538: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-28 21:06:23.664877: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-28 21:06:31.527466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3114 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:1f:00.0, compute capability: 8.9
2024-09-28 21:06:31.527985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 1724 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:d4:00.0, compute capability: 8.9
2024-09-28 21:06:31.782114: F tensorflow/core/framework/tensor.cc:844] Check failed: dtype() == expected_dtype (3 vs. 1) float expected, got int32
Aborted (core dumped)
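
The failed check compares DataType enum values (3 = DT_INT32 from the variable tensor, 1 = DT_FLOAT expected by the kernel for the float32 updates). Until the op validates dtypes up front, a hypothetical Python-side guard can turn the mismatch into a catchable error instead of a process abort; the helper name `safe_scatter_nd_update` below is my own sketch, not a TensorFlow API:

import tensorflow as tf

def safe_scatter_nd_update(var, indices, updates, use_locking=True):
    """Reject mismatched dtypes in Python before the raw op reaches the C++ kernel."""
    updates = tf.convert_to_tensor(updates)
    if updates.dtype != var.dtype:
        raise ValueError(
            f"updates dtype {updates.dtype} does not match variable dtype {var.dtype}")
    return tf.raw_ops.ResourceScatterNdUpdate(
        ref=var.handle,
        indices=indices,
        updates=updates,
        use_locking=use_locking)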
Venkat6871 commented 1 month ago

I tried running your code on Colab using TensorFlow v2.17.0 and the nightly version, and I faced the same issue. Please find the gist here for reference. Thank you!