google / qkeras

QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0

When I use QKeras: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device [Op:Abs] #104

Closed laumecha closed 1 year ago

laumecha commented 1 year ago

Hello,

I am trying to load a QKeras model, but when I do so I get the following error:

```
2022-11-09 17:35:41.945057: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-11-09 17:35:51.418181: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-09 17:35:51.422273: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-11-09 17:35:51.533605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:81:00.0 name: A100-PCIE-40GB computeCapability: 8.0 coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
...
2022-11-09 17:35:52.242327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-11-09 17:35:52.255910: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-09 17:35:52.264507: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-09 17:35:52.268725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:81:00.0 name: A100-PCIE-40GB computeCapability: 8.0 coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2022-11-09 17:35:52.269218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library ...
2022-11-09 17:35:52.278901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-11-09 17:35:52.280058: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-11-09 17:41:37.635546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-11-09 17:41:37.636234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2022-11-09 17:41:37.637247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2022-11-09 17:41:37.649442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: A100-PCIE-40GB, pci bus id: 0000:81:00.0, compute capability: 8.0)
2022-11-09 17:41:37.871970: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
1 Physical GPUs, 1 Logical GPUs
[info]Starting test!
Traceback (most recent call last):
  File "./conda-qkeras/similarity_study/01_get_data_model/01_test.py", line 38, in <module>
    model = qkeras_utils.load_qmodel(model_dir)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/utils.py", line 928, in load_qmodel
    qmodel = tf.keras.models.load_model(filepath, custom_objects=custom_objects,
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 183, in load_model_from_hdf5
    model = model_config_lib.model_from_config(model_config,
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/model_config.py", line 64, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/serialization.py", line 173, in deserialize
    return generic_utils.deserialize_keras_object(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 354, in deserialize_keras_object
    return cls.from_config(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 668, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1285, in reconstruct_from_config
    process_node(layer, node_data)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1233, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 951, in __call__
    return self._functional_construction_call(inputs, args, kwargs,
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1090, in _functional_construction_call
    outputs = self._keras_tensor_symbolic_call(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
    self._maybe_build(inputs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2710, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/convolutional.py", line 198, in build
    self.kernel = self.add_weight(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 623, in add_weight
    variable = self._add_variable_with_custom_getter(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/training/tracking/base.py", line 805, in _add_variable_with_custom_getter
    new_variable = getter(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 130, in make_variable
    return tf_variables.VariableV1(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 260, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 206, in _variable_v1_call
    return previous_getter(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 199, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variable_scope.py", line 2604, in default_variable_creator
    return resource_variable_ops.ResourceVariable(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1574, in __init__
    self._init_from_args(
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1712, in _init_from_args
    initial_value = initial_value()
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/qlayers.py", line 105, in __call__
    max_x = np.max(abs(x))
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/math_ops.py", line 401, in abs
    return gen_math_ops._abs(x, name=name)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/gen_math_ops.py", line 46, in _abs
    _ops.raise_from_not_ok_status(e, name)
  File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device [Op:Abs]
```

This is something that I only experience with QKeras. If I load the same model with plain Keras I get an "I don't recognize QConv layer" error instead, but not this one. If I try to load a Keras-compatible model, no error occurs. That is why I am assuming that the error comes from QKeras.
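For what it's worth, QKeras only triggers the failing op indirectly (the traceback shows `qlayers.py` calling `abs(x)` while building the quantized initializers), so the same error should be reproducible without QKeras by running any elementwise op on the GPU. A minimal sketch, assuming a single visible GPU:

```python
import tensorflow as tf

# If the installed TensorFlow/CUDA build has no kernels for this GPU's
# compute capability, this raises the same CUDA_ERROR_NO_BINARY_FOR_GPU
# "no kernel image is available" error, independent of QKeras.
with tf.device('/GPU:0'):
    print(tf.abs(tf.constant([-1.0, 2.0, -3.0])))
```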

I am using the following code:

```python
import os
import sys

import tensorflow as tf
from tensorflow import keras
from qkeras import *
from qkeras import utils as qkeras_utils

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

enable_gpu = 1

if enable_gpu:
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
        try:
            tf.config.set_logical_device_configuration(
                gpus[0],
                [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Virtual devices must be set before GPUs have been initialized
            print(e)

if len(sys.argv) < 3:
    print("Model file dir needed and num samples! Exiting")
    sys.exit()

model_dir = sys.argv[1]
num_samples = sys.argv[2]
print("[info]Starting test!")

# Loading model
model = qkeras_utils.load_qmodel(model_dir)
# model = keras.models.load_model(model_dir)

model.summary()

# Perform inference
exit()

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert and pre-processing
num_classes = 10  # CIFAR-10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Run inference one sample at a time (keep the batch dimension)
for i in range(int(num_samples)):
    predictions = model.predict(x_test[i:i + 1])

print("End")
```
print("End") ` My versions are the following: cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
... tensorflow 2.4.1 gpu_py39h8236f22_0
tensorflow-base 2.4.1 gpu_py39h29c2da4_0
tensorflow-datasets 4.6.0 pypi_0 pypi tensorflow-estimator 2.6.0 pyh7b7c402_0
tensorflow-gpu 2.4.1 h30adc30_0
tensorflow-metadata 1.9.0 pypi_0 pypi tensorflow-model-optimization 0.7.3 pypi_0 pypi
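As a side note, the mismatch can be confirmed by comparing the CUDA version the TensorFlow binary was built against with the GPU's compute capability. A minimal sketch (assuming a TF 2.4-era install; the exact dictionary keys can differ between releases):

```python
import tensorflow as tf

# CUDA/cuDNN versions this TensorFlow binary was compiled against
# (e.g. a 'cuda_version' of 10.1 cannot target an sm_80 GPU).
print(tf.sysconfig.get_build_info())

# Compute capability of each visible GPU; an A100 reports (8, 0).
for gpu in tf.config.list_physical_devices('GPU'):
    print(gpu.name, tf.config.experimental.get_device_details(gpu))
```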

laumecha commented 1 year ago

Solved. It was because of the CUDA version: CUDA 10.1 does not include kernels for compute capability 8.0 (the A100), so the GPU op had no binary to run.
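For anyone who hits this before fixing the CUDA toolkit, a possible workaround (my own sketch, not something confirmed in this thread) is to hide the GPU before calling `load_qmodel`, so the `abs()` call in the quantized initializers runs on the CPU:

```python
import tensorflow as tf
from qkeras import utils as qkeras_utils

# Workaround sketch (assumption): hide all GPUs so the model is deserialized
# on the CPU. Must run before any op has touched the GPU.
tf.config.set_visible_devices([], 'GPU')

model = qkeras_utils.load_qmodel('model.h5')  # 'model.h5' is a placeholder path
```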
