google / qkeras

QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0

Very low accuracy following AutoQKeras notebook and CUDA error #102

Closed · laumecha closed this 1 year ago

laumecha commented 1 year ago

I am following this notebook: https://github.com/google/qkeras/blob/master/notebook/AutoQKeras.ipynb. The notebook says that the unquantized model for the mnist dataset should reach 99% accuracy, but after 10 epochs I am getting 11.38%. I have tried more epochs, but I always get the same accuracy.

> For mnist, we should get 99% validation accuracy, and for fashion_mnist, around 86% validation accuracy. Let's get a metric for a high-level estimation of the energy of this model.

My output at epoch 10:

```
Epoch 10/100 29/29 [==============================] - 0s 11ms/step - loss: 2.3022 - acc: 0.1115 - val_loss: nan - val_acc: 0.1139
```

I have modified several things in the test in order to use a single GPU.

Also, when I try to execute `autoqk = AutoQKeras(model, metrics=["acc"], custom_objects=custom_objects, **run_config)`, I observe the following error:

```
Total energy: 3.30 uJ
quantizing layers: ['conv2d_0', 'bn_0', 'act_0', 'drop_0', 'conv2d_1', 'bn_1', 'act_1', 'drop_1', 'conv2d_2', 'bn_2', 'act_2', 'drop_2', 'conv2d_3', 'bn_3', 'act_3', 'drop_3', 'conv2d_4', 'bn_4', 'act_4', 'drop_4', 'flatten', 'dense']
Limit configuration: {"Dense": [8, 8, 4], "Conv2D": [4, 8, 4], "DepthwiseConv2D": [4, 8, 4], "Activation": [4], "BatchNormalization": [], "^conv2d_0$": [["binary", "ternary", "quantized_bits(2,1,1,alpha=1.0)"], 8, 4], "^conv2d_[1234]$": [4, 8, 4], "^act_[0123]$": [4], "^act_4$": [8], "^dense$": [8, 8, 4]}
2022-11-03 13:28:03.656399: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:89 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/keras_tuner/engine/tuner.py", line 158, in _try_build
    model = self._build_hypermodel(hp)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/keras_tuner/engine/tuner.py", line 146, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 571, in build
    q_model, _ = self.quantize_model(hp)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/autoqkeras/autoqkeras_internal.py", line 557, in quantize_model
    q_model = model_quantize(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/utils.py", line 805, in model_quantize
    qmodel = quantized_model_from_json(json.dumps(jm), custom_objects)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/utils.py", line 894, in quantized_model_from_json
    qmodel = model_from_json(json_string, custom_objects=custom_objects)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/saving/model_config.py", line 131, in model_from_json
    return deserialize(config, custom_objects=custom_objects)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/serialization.py", line 173, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 354, in deserialize_keras_object
    return cls.from_config(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 668, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1285, in reconstruct_from_config
    process_node(layer, node_data)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/functional.py", line 1233, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 951, in __call__
    return self._functional_construction_call(inputs, args, kwargs,
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1090, in _functional_construction_call
    outputs = self._keras_tensor_symbolic_call(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
    self._maybe_build(inputs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2710, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/layers/convolutional.py", line 198, in build
    self.kernel = self.add_weight(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 623, in add_weight
    variable = self._add_variable_with_custom_getter(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/training/tracking/base.py", line 805, in _add_variable_with_custom_getter
    new_variable = getter(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 130, in make_variable
    return tf_variables.VariableV1(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 260, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 206, in _variable_v1_call
    return previous_getter(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 199, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variable_scope.py", line 2604, in default_variable_creator
    return resource_variable_ops.ResourceVariable(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1574, in __init__
    self._init_from_args(
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1712, in _init_from_args
    initial_value = initial_value()
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/qkeras/qlayers.py", line 105, in __call__
    max_x = np.max(abs(x))
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/math_ops.py", line 401, in abs
    return gen_math_ops._abs(x, name=name)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/ops/gen_math_ops.py", line 46, in _abs
    _ops.raise_from_not_ok_status(e, name)
  File "/mnt/beegfs/gap/laumecha/miniconda3/envs/tf_yolo4/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device [Op:Abs]
2022-11-03 13:28:03.943061: F tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:404] Check failed: kernel.Arity() == args.number_of_arguments() (0 vs. 10)
/var/spool/slurm/d/job07739/slurm_script: line 5: 36902 Aborted (core dumped) python3 05_au*
```

My code is the following:


```python
#python 3
import sys
print(sys.version)

import warnings
warnings.filterwarnings("ignore")

import json
import pprint
import numpy as np
import six
import tempfile
import tensorflow.compat.v2 as tf
# V2 Behavior is necessary to use TF2 APIs before TF2 is default TF version internally.
tf.enable_v2_behavior()
from tensorflow.keras.optimizers import *

from qkeras.autoqkeras import *
from qkeras import *
from qkeras.utils import model_quantize
from qkeras.qtools import run_qtools
from qkeras.qtools import settings as qtools_settings

from tensorflow.keras.utils import to_categorical
import tensorflow_datasets as tfds

#AutoQKeras has some examples on how to run with mnist, fashion_mnist, cifar10 and cifar100.

print("using tensorflow", tf.__version__)

# GPU
enable_gpu=1

if enable_gpu:
  gpus = tf.config.list_physical_devices('GPU')
  if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
      tf.config.set_logical_device_configuration(
          gpus[0],
          [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
      logical_gpus = tf.config.list_logical_devices('GPU')
      print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
      # Virtual devices must be set before GPUs have been initialized
      print(e)

    #print("GPUs: ", len(tf.config.experimental.list_physical_devices('GPU')))
    #os.environ['CUDA_VISIBLE_DEVICES']='0, 1, 2'
    #os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
    #config = tf.compat.v1.ConfigProto()
    #session = tf.compat.v1.Session(config=config)
####################################################

def get_data(dataset_name, fast=False):
  """Returns dataset from tfds."""
  ds_train = tfds.load(name=dataset_name, split="train", batch_size=-1)
  ds_test = tfds.load(name=dataset_name, split="test", batch_size=-1)

  dataset = tfds.as_numpy(ds_train)
  x_train, y_train = dataset["image"].astype(np.float32), dataset["label"]

  dataset = tfds.as_numpy(ds_test)
  x_test, y_test = dataset["image"].astype(np.float32), dataset["label"]

  if len(x_train.shape) == 3:
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

  x_train /= 256.0
  x_test /= 256.0

  x_mean = np.mean(x_train, axis=0)

  x_train -= x_mean
  x_test -= x_mean

  nb_classes = np.max(y_train) + 1
  y_train = to_categorical(y_train, nb_classes)
  y_test = to_categorical(y_test, nb_classes)

  print(x_train.shape[0], "train samples")
  print(x_test.shape[0], "test samples")
  return (x_train, y_train), (x_test, y_test)

#MODEL
from tensorflow.keras.initializers import *
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import *

class ConvBlockNetwork(object):
  """Creates Convolutional block type of network."""

  def __init__(
      self,
      shape,
      nb_classes,
      kernel_size,
      filters,
      dropout_rate=0.0,
      with_maxpooling=True,
      with_batchnorm=True,
      kernel_initializer="he_normal",
      bias_initializer="zeros",
      use_separable=False,
      use_xnornet_trick=False,
      all_conv=False
  ):
    """Creates class.

    Args:
      shape: shape of inputs.
      nb_classes: number of output classes.
      kernel_size: kernel_size of network.
      filters: sizes of filters (if entry is a list, we create a block).
      dropout_rate: dropout rate if > 0.
      with_maxpooling: if true, use maxpooling.
      with_batchnorm: with BatchNormalization.
      kernel_initializer: kernel_initializer.
      bias_initializer: bias and beta initializer.
      use_separable: if "dsp", do conv's 1x3 + 3x1. If "mobilenet",
        use MobileNet separable convolution. If False or "none", perform single
        conv layer.
      use_xnornet_trick: if true, apply bn+act after max pooling to enable
        binary/ternary networks and avoid saturation to the largest value.
      all_conv: if true, implements all convolutional network.
    """
    self.shape = shape
    self.nb_classes = nb_classes
    self.kernel_size = kernel_size
    self.filters = filters
    self.dropout_rate = dropout_rate
    self.with_maxpooling = with_maxpooling
    self.with_batchnorm = with_batchnorm
    self.kernel_initializer = kernel_initializer
    self.bias_initializer = bias_initializer
    self.use_separable = use_separable
    self.use_xnornet_trick = use_xnornet_trick
    self.all_conv = all_conv

  def build(self):
    """Builds model."""
    x = x_in = Input(self.shape, name="input")
    for i in range(len(self.filters)):
      if len(self.filters) > 1:
        name_suffix_list = [str(i)]
      else:
        name_suffix_list = []
      if not isinstance(self.filters[i], list):
        filters = [self.filters[i]]
      else:
        filters = self.filters[i]
      for j in range(len(filters)):
        if len(filters) > 1:
          name_suffix = "_".join(name_suffix_list + [str(j)])
        else:
          name_suffix = "_".join(name_suffix_list)
        if self.use_separable == "dsp":
          kernels = [(1, self.kernel_size), (self.kernel_size, 1)]
        else:
          kernels = [(self.kernel_size, self.kernel_size)]
        for k, kernel in enumerate(kernels):
          strides = 1
          if (
              not self.with_maxpooling and j == len(filters)-1 and
              k == len(kernels)-1
          ):
            strides = 2
          if self.use_separable == "dsp":
            kernel_suffix = (
                "".join([str(k) for k in kernel]) + "_" + name_suffix)
          elif self.use_separable == "mobilenet":
            depth_suffix = (
                "".join([str(k) for k in kernel]) + "_" + name_suffix)
            kernel_suffix = "11_" + name_suffix
          else:
            kernel_suffix = name_suffix
          if self.use_separable == "mobilenet":
            x = DepthwiseConv2D(
                kernel,
                padding="same", strides=strides,
                use_bias=False,
                name="conv2d_dw_" + depth_suffix)(x)
            if self.with_batchnorm:
              x = BatchNormalization(name="conv2d_dw_bn_" + depth_suffix)(x)
            x = Activation("relu", name="conv2d_dw_act_" + depth_suffix)(x)
            kernel = (1, 1)
            strides = 1
          x = Conv2D(
              filters[j], kernel,
              strides=strides, use_bias=not self.with_batchnorm,
              padding="same",
              kernel_initializer=self.kernel_initializer,
              bias_initializer=self.bias_initializer,
              name="conv2d_" + kernel_suffix)(x)
          if not (
              self.with_maxpooling and self.use_xnornet_trick and
              j == len(filters)-1 and k == len(kernels)-1
          ):
            if self.with_batchnorm:
              x = BatchNormalization(
                  beta_initializer=self.bias_initializer,
                  name="bn_" + kernel_suffix)(x)
            x = Activation("relu", name="act_" + kernel_suffix)(x)
      if self.with_maxpooling:
        x = MaxPooling2D(2, 2, name="mp_" + name_suffix)(x)
        # this is a trick from xnornet to enable fully binary or ternary
        # networks after maxpooling.
        if self.use_xnornet_trick:
          x = BatchNormalization(
              beta_initializer=self.bias_initializer,
              name="mp_bn_" + name_suffix)(x)
          x = Activation("relu", name="mp_act_" + name_suffix)(x)
      if self.dropout_rate > 0:
        x = Dropout(self.dropout_rate, name="drop_" + name_suffix)(x)

    if not self.all_conv:
      x = Flatten(name="flatten")(x)
      x = Dense(
          self.nb_classes,
          kernel_initializer=self.kernel_initializer,
          bias_initializer=self.bias_initializer,
          name="dense")(x)
      x = Activation("softmax", name="softmax")(x)
    else:
      x = Conv2D(
          self.nb_classes, 1, strides=1, padding="same",
          kernel_initializer=self.kernel_initializer,
          bias_initializer=self.bias_initializer,
          name="dense")(x)
      x = Activation("softmax", name="softmax")(x)
      x = Flatten(name="flatten")(x)

    model = Model(inputs=[x_in], outputs=[x])

    return model

def get_model(dataset):
  """Returns a model for the demo of AutoQKeras."""
  if dataset == "mnist":
    model = ConvBlockNetwork(
        shape=(28, 28, 1),
        nb_classes=10,
        kernel_size=3,
        filters=[16, 32, 48, 64, 128],
        dropout_rate=0.2,
        with_maxpooling=False,
        with_batchnorm=True,
        kernel_initializer="he_uniform",
        bias_initializer="zeros",
    ).build()

  elif dataset == "fashion_mnist":
    model = ConvBlockNetwork(
        shape=(28, 28, 1),
        nb_classes=10,
        kernel_size=3,
        filters=[16, [32]*3, [64]*3],
        dropout_rate=0.2,
        with_maxpooling=True,
        with_batchnorm=True,
        use_separable="mobilenet",
        kernel_initializer="he_uniform",
        bias_initializer="zeros",
        use_xnornet_trick=True
    ).build()

  elif dataset == "cifar10":
    model = ConvBlockNetwork(
        shape=(32, 32, 3),
        nb_classes=10,
        kernel_size=3,
        filters=[16, [32]*3, [64]*3, [128]*3],
        dropout_rate=0.2,
        with_maxpooling=True,
        with_batchnorm=True,
        use_separable="mobilenet",
        kernel_initializer="he_uniform",
        bias_initializer="zeros",
        use_xnornet_trick=True
    ).build()

  elif dataset == "cifar100":
    model = ConvBlockNetwork(
        shape=(32, 32, 3),
        nb_classes=100,
        kernel_size=3,
        filters=[16, [32]*3, [64]*3, [128]*3, [256]*3],
        dropout_rate=0.2,
        with_maxpooling=True,
        with_batchnorm=True,
        use_separable="mobilenet",
        kernel_initializer="he_uniform",
        bias_initializer="zeros",
        use_xnornet_trick=True
    ).build()

  model.summary()

  return model

DATASET = "mnist"
(x_train, y_train), (x_test, y_test) = get_data(DATASET)

model = get_model(DATASET)
custom_objects = {}

#with cur_strategy.scope():#changed
optimizer = Adam(lr=0.02)
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["acc"])
model.fit(x_train, y_train, epochs=100, batch_size=2048, steps_per_epoch=29, validation_data=(x_test, y_test))
model.save('exported_m/mnist_unqu.h5')

reference_internal = "fp32"
reference_accumulator = "fp32"

q = run_qtools.QTools(
    model,
    # energy calculation using a given process
    # "horowitz" refers to 45nm process published at
    # M. Horowitz, "1.1 Computing's energy problem (and what we can do about
    # it), "2014 IEEE International Solid-State Circuits Conference Digest of
    # Technical Papers (ISSCC), San Francisco, CA, 2014, pp. 10-14, 
    # doi: 10.1109/ISSCC.2014.6757323.
    process="horowitz",
    # quantizers for model input
    source_quantizers=[quantized_bits(8, 0, 1)],
    is_inference=False,
    # absolute path (including filename) of the model weights
    # in the future, we will attempt to optimize the power model
    # by using weight information, although it can be used to further
    # optimize QBatchNormalization.
    weights_path=None,
    # keras_quantizer to quantize weight/bias in un-quantized keras layers
    keras_quantizer=reference_internal,
    # keras_quantizer to quantize MAC in un-quantized keras layers
    keras_accumulator=reference_accumulator,
    # whether to calculate baseline energy
    for_reference=True)

# calculate energy of the derived data type map.
energy_dict = q.pe(
    # whether to store parameters in dram, sram, or fixed
    weights_on_memory="sram",
    # store activations in dram or sram
    activations_on_memory="sram",
    # minimum sram size in number of bits. Let's assume a 16MB SRAM.
    min_sram_size=8*16*1024*1024,
    # whether to load data from dram to sram (consider sram as a cache
    # for dram); if False, we assume the data is already in SRAM
    rd_wr_on_io=False)

# get stats of energy distribution in each layer
energy_profile = q.extract_energy_profile(
    qtools_settings.cfg.include_energy, energy_dict)
# extract sum of energy of each layer according to the rule specified in
# qtools_settings.cfg.include_energy
total_energy = q.extract_energy_sum(
    qtools_settings.cfg.include_energy, energy_dict)

pprint.pprint(energy_profile)
print()
print("Total energy: {:.2f} uJ".format(total_energy / 1000000.0))  

quantization_config = {
        "kernel": {
                "binary": 1,
                "stochastic_binary": 1,
                "ternary": 2,
                "stochastic_ternary": 2,
                "quantized_bits(2,1,1,alpha=1.0)": 2,
                "quantized_bits(4,0,1,alpha=1.0)": 4,
                "quantized_bits(8,0,1,alpha=1.0)": 8,
                "quantized_po2(4,1)": 4
        },
        "bias": {
                "quantized_bits(4,0,1)": 4,
                "quantized_bits(8,3,1)": 8,
                "quantized_po2(4,8)": 4
        },
        "activation": {
                "binary": 1,
                "ternary": 2,
                "quantized_relu_po2(4,4)": 4,
                "quantized_relu(3,1)": 3,
                "quantized_relu(4,2)": 4,
                "quantized_relu(8,2)": 8,
                "quantized_relu(8,4)": 8,
                "quantized_relu(16,8)": 16
        },
        "linear": {
                "binary": 1,
                "ternary": 2,
                "quantized_bits(4,1)": 4,
                "quantized_bits(8,2)": 8,
                "quantized_bits(16,10)": 16
        }
}

limit = {
    "Dense": [8, 8, 4],
    "Conv2D": [4, 8, 4],
    "DepthwiseConv2D": [4, 8, 4],
    "Activation": [4],
    "BatchNormalization": [],

    "^conv2d_0$": [
                   ["binary", "ternary", "quantized_bits(2,1,1,alpha=1.0)"],
                   8, 4
    ],
    "^conv2d_[1234]$": [4, 8, 4],
    "^act_[0123]$": [4],
    "^act_4$": [8],
    "^dense$": [8, 8, 4]
}

goal = {
    "type": "energy",
    "params": {
        "delta_p": 8.0,
        "delta_n": 8.0,
        "rate": 2.0,
        "stress": 1.0,
        "process": "horowitz",
        "parameters_on_memory": ["sram", "sram"],
        "activations_on_memory": ["sram", "sram"],
        "rd_wr_on_io": [False, False],
        "min_sram_size": [0, 0],
        "source_quantizers": ["int8"],
        "reference_internal": "int8",
        "reference_accumulator": "int32"
        }
}

run_config = {
  "output_dir": tempfile.mkdtemp(),
  "goal": goal,
  "quantization_config": quantization_config,
  "learning_rate_optimizer": False,
  "transfer_weights": False,
  "mode": "random",
  "seed": 42,
  "limit": limit,
  "tune_filters": "layer",
  "tune_filters_exceptions": "^dense",
#  "distribution_strategy": cur_strategy,
  "layer_indexes": range(1, len(model.layers) - 1),
  "max_trials": 40
}

print("quantizing layers:", [model.layers[i].name for i in run_config["layer_indexes"]])

autoqk = AutoQKeras(model, metrics=["acc"], custom_objects=custom_objects, **run_config)
print("[INFO] AutoQKeras model created!")

autoqk.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=2048, epochs=20)
print("[INFO] AutoQKeras model trained!")

qmodel = autoqk.get_best_model()
qmodel.save_weights("mnist_qmodel_weights.h5")
qmodel.save("mnist_qmodel.h5")

qmodel.load_weights("mnist_qmodel_weights.h5")
#with cur_strategy.scope(): #changed
optimizer = Adam(lr=0.02)
qmodel.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["acc"])
qmodel.fit(x_train, y_train, epochs=200, batch_size=2048, validation_data=(x_test, y_test))

qmodel.save("mnist_qmodel.h5")
```

I have installed the dependencies using the requirements.txt file. Here is my conda environment:

```
Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
_tflow_select 2.3.0 mkl
abseil-cpp 20211102.0 hd4dd3e8_0
absl-py 1.3.0 py37h06a4308_0
aiohttp 3.8.1 py37h7f8727e_1
aiosignal 1.2.0 pyhd3eb1b0_0
astunparse 1.6.3 py_0
async-timeout 4.0.2 py37h06a4308_0
asynctest 0.13.0 py_0
attrs 21.4.0 pyhd3eb1b0_0
blas 1.0 mkl
blinker 1.4 py37h06a4308_0
brotlipy 0.7.0 py37h27cfd23_1003
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.10.11 h06a4308_0
cachetools 4.2.2 pyhd3eb1b0_0
certifi 2022.9.24 py37h06a4308_0
cffi 1.15.1 py37h74dc2b5_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.0.4 py37h06a4308_0
cryptography 38.0.1 py37h9ce1e76_0
cudatoolkit 11.3.1 h2bc3f7f_2
cudnn 8.1.0.77 h90431f1_0 conda-forge
dataclasses 0.8 pyh6d0b6a4_7
decorator 4.4.0 py37_1
flatbuffers 2.0.0 h2531618_0
frozenlist 1.2.0 py37h7f8727e_0
gast 0.5.3 pyhd3eb1b0_0
giflib 5.2.1 h7b6447c_0
google-auth 2.6.0 pyhd3eb1b0_0
google-auth-oauthlib 0.4.4 pyhd3eb1b0_0
google-pasta 0.2.0 pyhd3eb1b0_0
grpc-cpp 1.46.1 h33aed49_0
grpcio 1.42.0 py37hce63b2e_0
h5py 3.7.0 py37h737f45e_0
hdf5 1.10.6 h3ffc7dd_1
icu 58.2 he6710b0_3
idna 3.4 py37h06a4308_0
importlib-metadata 4.11.3 py37h06a4308_0
importlib_metadata 4.11.3 hd3eb1b0_0
iniconfig 1.1.1 pyhd3eb1b0_0
intel-openmp 2022.1.0 h9e868ea_3769
joblib 1.1.1 py37h06a4308_0
jpeg 9e h7f8727e_0
keras 2.9.0 py37h06a4308_0
keras-preprocessing 1.1.2 pyhd3eb1b0_0
krb5 1.19.2 hac12032_0
ld_impl_linux-64 2.38 h1181459_1
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libcurl 7.85.0 h91b91d3_0
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h7f8727e_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libnghttp2 1.46.0 hce63b2e_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.20.1 h4ff587b_0
libssh2 1.10.0 h8f2d780_0
libstdcxx-ng 11.2.0 h1234567_1
markdown 3.3.4 py37h06a4308_0
mkl 2020.2 256
mkl-service 2.3.0 py37he8ac12f_0
mkl_fft 1.2.0 py37h23d657b_0
mkl_random 1.1.1 py37h0573a6f_0
multidict 6.0.2 py37h5eee18b_0
ncurses 6.3 h5eee18b_3
networkx 2.5.1 pyhd8ed1ab_0 conda-forge
numpy 1.21.6 py37h976b520_0 conda-forge
oauthlib 3.2.1 py37h06a4308_0
openssl 1.1.1q h7f8727e_0
opt_einsum 3.3.0 pyhd3eb1b0_1
packaging 21.3 pyhd3eb1b0_0
pip 22.2.2 py37h06a4308_0
pluggy 1.0.0 py37h06a4308_1
prompt_toolkit 2.0.9 py37_0
protobuf 3.20.1 py37h295c915_0
py 1.11.0 pyhd3eb1b0_0
pyasn1 0.4.8 pyhd3eb1b0_0
pyasn1-modules 0.2.8 py_0
pycparser 2.21 pyhd3eb1b0_0
pygments 2.11.2 pyhd3eb1b0_0
pyjwt 2.4.0 py37h06a4308_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pyparsing 3.0.9 py37h06a4308_0
pysocks 1.7.1 py37_1
pytest 7.1.2 py37h06a4308_0
python 3.7.13 haa1d7c7_1
python-flatbuffers 2.0 pyhd3eb1b0_0
python_abi 3.7 2_cp37m conda-forge
pyyaml 6.0 py37h7f8727e_1
re2 2022.04.01 h295c915_0
readline 8.2 h5eee18b_0
requests 2.28.1 py37h06a4308_0
requests-oauthlib 1.3.0 py_0
rsa 4.7.2 pyhd3eb1b0_1
scikit-learn 1.0.2 py37h51133e4_1
scipy 1.7.3 py37hf2a6cf1_0 conda-forge
setuptools 65.5.0 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_1
snappy 1.1.9 h295c915_0
sqlite 3.39.3 h5082296_0
tensorboard 2.9.0 py37h06a4308_0
tensorboard-data-server 0.6.0 py37hca6d32c_0
tensorboard-plugin-wit 1.8.1 py37h06a4308_0
tensorflow 2.9.1 mkl_py37h58a621a_0
tensorflow-base 2.9.1 mkl_py37h353358b_0
tensorflow-estimator 2.9.0 py37h06a4308_0
termcolor 1.1.0 py37h06a4308_1
threadpoolctl 2.2.0 pyh0d69192_0
tk 8.6.12 h1ccaba5_0
tomli 2.0.1 py37h06a4308_0
tqdm 4.64.1 py37h06a4308_0
typing-extensions 4.3.0 py37h06a4308_0
typing_extensions 4.3.0 py37h06a4308_0
urllib3 1.26.12 py37h06a4308_0
wcwidth 0.2.5 pyhd3eb1b0_0
werkzeug 2.0.3 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.14.1 py37h5eee18b_0
xz 5.2.6 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.8.1 py37h5eee18b_0
zipp 3.8.0 py37h06a4308_0
zlib 1.2.13 h5eee18b_0
```

laumecha commented 1 year ago

Solved. The problem was the version of CUDA.
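For anyone who lands here with the same `CUDA_ERROR_NO_BINARY_FOR_GPU` failure, here is a minimal diagnostic sketch (not part of the original report; assumes TensorFlow >= 2.4) for checking that the installed TensorFlow build, CUDA toolkit, and GPU architecture actually match:

```python
# Sketch for diagnosing CUDA_ERROR_NO_BINARY_FOR_GPU (assumes TF >= 2.4).
import tensorflow as tf

# Was this TensorFlow binary compiled with CUDA at all?
# (A conda "mkl" build, for instance, is CPU-only.)
print("built with CUDA:", tf.test.is_built_with_cuda())

# CUDA/cuDNN versions the binary was compiled against; these must be
# compatible with the driver and toolkit installed on the machine.
print("build info:", dict(tf.sysconfig.get_build_info()))

# Compute capability of each visible GPU; the error above means the
# binary ships no kernel image for this GPU architecture.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, "compute capability:", details.get("compute_capability"))

# The traceback above died inside a GPU Abs kernel (np.max(abs(x)) in
# qkeras/qlayers.py), so a one-op smoke test confirms whether the fix took.
with tf.device("/GPU:0"):
    print(tf.abs(tf.constant([-1.0, 2.0])))
```

If `is_built_with_cuda()` is False, or the reported compute capability is not covered by the installed build, reinstalling a matching TensorFlow/CUDA combination (as the author did) resolves the error.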