QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0
When I use QKeras: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device [Op:Abs] #104
I am trying to load a QKeras model, but when I do, I get the following error:
`
2022-11-09 17:35:41.945057: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-11-09 17:35:51.418181: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-09 17:35:51.422273: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-11-09 17:35:51.533605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:81:00.0 name: A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
...
2022-11-09 17:35:52.242327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-11-09 17:35:52.255910: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-09 17:35:52.264507: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-09 17:35:52.268725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:81:00.0 name: A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2022-11-09 17:35:52.269218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library ...
2022-11-09 17:35:52.278901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-11-09 17:35:52.280058: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-11-09 17:41:37.635546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-11-09 17:41:37.636234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2022-11-09 17:41:37.637247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2022-11-09 17:41:37.649442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1024 MB memory) -> physical GPU (device: 0, name: A100-PCIE-40GB, pci bus id: 0000:81:00.0, compute capability: 8.0)
2022-11-09 17:41:37.871970: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
1 Physical GPUs, 1 Logical GPUs
[info]Starting test!
Traceback (most recent call last):
File "./conda-qkeras/similarity_study/01_get_data_model/01_test.py", line 38, in <module>
model = qkeras_utils.load_qmodel(model_dir)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/utils.py", line 928, in load_qmodel
qmodel = tf.keras.models.load_model(filepath, custom_objects=custom_objects,
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 183, in load_model_from_hdf5
model = model_config_lib.model_from_config(model_config,
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/model_config.py", line 64, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/serialization.py", line 173, in deserialize
return generic_utils.deserialize_keras_object(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 354, in deserialize_keras_object
return cls.from_config(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 668, in from_config
input_tensors, output_tensors, created_layers = reconstruct_from_config(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1285, in reconstruct_from_config
process_node(layer, node_data)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1233, in process_node
output_tensors = layer(input_tensors, **kwargs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 951, in __call__
return self._functional_construction_call(inputs, args, kwargs,
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1090, in _functional_construction_call
outputs = self._keras_tensor_symbolic_call(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2710, in _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/convolutional.py", line 198, in build
self.kernel = self.add_weight(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 623, in add_weight
variable = self._add_variable_with_custom_getter(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/training/tracking/base.py", line 805, in _add_variable_with_custom_getter
new_variable = getter(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 130, in make_variable
return tf_variables.VariableV1(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 260, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 206, in _variable_v1_call
return previous_getter(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 199, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variable_scope.py", line 2604, in default_variable_creator
return resource_variable_ops.ResourceVariable(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1574, in __init__
self._init_from_args(
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1712, in _init_from_args
initial_value = initial_value()
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/qlayers.py", line 105, in __call__
max_x = np.max(abs(x))
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/math_ops.py", line 401, in abs
return gen_math_ops._abs(x, name=name)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/gen_math_ops.py", line 46, in _abs
_ops.raise_from_not_ok_status(e, name)
File "./miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device [Op:Abs]
`
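Two details in this log seem most relevant to the error: the CUDA runtime that TensorFlow loaded (`libcudart.so.10.1`) and the compute capability of the GPU (8.0). A minimal sketch for pulling both out of a saved log, using only the standard library. `summarize_gpu_log` is a hypothetical helper name, and the two sample lines are copied from the log above:

```python
import re


def summarize_gpu_log(log_text):
    """Collect CUDA runtime versions and compute capabilities named in a TensorFlow log."""
    runtimes = set(re.findall(r"libcudart\.so\.(\d+(?:\.\d+)*)", log_text))
    capabilities = set(re.findall(r"computeCapability: (\d+\.\d+)", log_text))
    return {
        "cudart": sorted(runtimes),
        "compute_capability": sorted(capabilities),
    }


# Two lines taken verbatim from the log in this report.
sample = (
    "2022-11-09 17:35:41.945057: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1\n"
    "pciBusID: 0000:81:00.0 name: A100-PCIE-40GB computeCapability: 8.0\n"
)

print(summarize_gpu_log(sample))
```

Having the two values side by side makes it easier to compare them against the CUDA versions the installed TensorFlow build was compiled for.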
I only see this with QKeras. If I load the same model with plain Keras, I get an "I don't recognize QConv layer" error instead, but not this one. If I load a Keras-compatible model, no error occurs. That is why I assume the error comes from QKeras.
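One way to test that assumption is to run the failing op without QKeras at all. The traceback ends in `[Op:Abs]`, reached through `abs(x)` in a QKeras initializer, so the sketch below calls `tf.abs` directly on a chosen device. It assumes TensorFlow is importable in the same environment; `abs_runs_on` is a hypothetical helper name, not part of TensorFlow or QKeras:

```python
def abs_runs_on(device_name):
    """Return True if tf.abs executes on the given device, False on an InternalError."""
    # Imported lazily so the helper can be defined even where TensorFlow is absent.
    import tensorflow as tf

    try:
        with tf.device(device_name):
            result = tf.abs(tf.constant([-1.0, 2.0]))
        return result.numpy().tolist() == [1.0, 2.0]
    except tf.errors.InternalError as err:
        # The error reported in this issue is raised as an InternalError.
        print(f"{device_name}: {err}")
        return False
```

Calling `abs_runs_on("/CPU:0")` and `abs_runs_on("/GPU:0")` in turn would show whether the same CUBIN error appears for a plain TensorFlow op on the GPU. If it does, the problem is more likely in the TensorFlow/CUDA installation than in QKeras.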
I am using the following code:
`
import tensorflow as tf
from tensorflow import keras
from qkeras import *
from qkeras import utils as qkeras_utils
import os
import sys

# Set after TensorFlow is imported, which is likely too late to silence its startup logs
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

enable_gpu = 1
if enable_gpu:
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
        try:
            tf.config.set_logical_device_configuration(
                gpus[0],
                [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Virtual devices must be set before GPUs have been initialized
            print(e)

# The script takes exactly two arguments: the model path and the number of samples
if len(sys.argv) != 3:
    print("Model file dir needed and num samples! Exiting")
    sys.exit()
model_dir = sys.argv[1]
num_samples = int(sys.argv[2])
print("[info]Starting test!")

# Loading model
model = qkeras_utils.load_qmodel(model_dir)
# model = keras.models.load_model(model_dir)  # plain Keras loader, for comparison
model.summary()

# Perform inference (not reached for now: the script exits right after the summary)
exit()
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert and pre-processing
num_classes = 10  # CIFAR-10 has 10 classes
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

for i in range(num_samples):
    # Slice instead of index so the batch dimension is kept
    predictions = model.predict(x_test[i:i + 1])

print("End")
`
My versions are the following:
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
...
tensorflow 2.4.1 gpu_py39h8236f22_0
tensorflow-base 2.4.1 gpu_py39h29c2da4_0
tensorflow-datasets 4.6.0 pypi_0 pypi
tensorflow-estimator 2.6.0 pyh7b7c402_0
tensorflow-gpu 2.4.1 h30adc30_0
tensorflow-metadata 1.9.0 pypi_0 pypi
tensorflow-model-optimization 0.7.3 pypi_0 pypi
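The versions above pair cudatoolkit 10.1 with a GPU that the log reports as compute capability 8.0. A small sketch of how that pairing could be checked follows. The table of highest supported compute capabilities is an assumption written from memory for illustration and should be verified against NVIDIA's documentation; `MAX_COMPUTE_CAPABILITY`, `parse_version`, and `toolkit_supports` are hypothetical names:

```python
# Highest compute capability each CUDA toolkit can compile for natively.
# Assumed values for illustration only; verify against NVIDIA's documentation.
MAX_COMPUTE_CAPABILITY = {
    (10, 1): (7, 5),
    (10, 2): (7, 5),
    (11, 0): (8, 0),
    (11, 1): (8, 6),
}


def parse_version(text):
    """Turn a string such as '10.1.243' into (10, 1), keeping major and minor only."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(toolkit_version, compute_capability):
    """Return True/False if the toolkit is in the table, or None if it is unknown."""
    limit = MAX_COMPUTE_CAPABILITY.get(parse_version(toolkit_version))
    if limit is None:
        return None
    return parse_version(compute_capability) <= limit


print(toolkit_supports("10.1.243", "8.0"))
print(toolkit_supports("11.0", "8.0"))
```

A `False` result does not by itself mean that nothing can run: kernels shipped as PTX can still be JIT-compiled by the driver for a newer GPU. Kernels shipped only as precompiled binaries cannot, which would be consistent with `CUDA_ERROR_NO_BINARY_FOR_GPU` appearing for a single op.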