google / qkeras

QKeras: a quantization deep learning library for Tensorflow Keras
Apache License 2.0

Only QConv layers' output tensors are quantized #106

Open laumecha opened 1 year ago

laumecha commented 1 year ago

Hello,

I am using a quantized QKeras model, where all the Conv, BatchNormalization, and Dense parameters have been quantized to 4 bits.

However, when I run the predict function on one image and then print the output tensors of the quantized layers, I can see that only the QConv layers' output tensors are expressed in 4 bits. In contrast, the output tensors of the QBatchNormalization and QDense layers are expressed in regular floating point.

My question is: if I use a QKeras quantized model, does QKeras internally quantize the input and output tensors of the quantized layers during prediction? Why is only the QConv layers' output expressed in 4 bits?
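
To investigate, I also print each layer's quantizers and activation. This is a minimal sketch, assuming QKeras layers expose a get_quantizers() accessor and keep any activation quantizer in the usual activation attribute:

# Sketch: list each layer's quantizers and activation quantizer, to see
# which layers could actually quantize their output (assumes QKeras layers
# provide get_quantizers(); guarded with hasattr just in case).
for layer in model.layers:
    quantizers = layer.get_quantizers() if hasattr(layer, "get_quantizers") else None
    print(layer.name, layer.__class__.__name__, quantizers,
          getattr(layer, "activation", None))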

# Imports needed by the snippets below
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.utils import to_categorical
from qkeras import utils as qkeras_utils

# Loading the model
model = qkeras_utils.load_qmodel(model_dir)
model.summary()

(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Converting the pixel data to float type
train_images = train_images.astype('float32')
test_images = test_images.astype('float32')

# Normalizing to [0, 1] (255 is the maximum value a pixel can take)
train_images = train_images / 255
test_images = test_images / 255 

num_classes = 10
train_labels = to_categorical(train_labels, num_classes)
test_labels = to_categorical(test_labels, num_classes)

iterations = 1
for i in range(iterations):
    print("Iteration ", i)
    image = test_images[i].reshape(-1, 32, 32, 3)
    #predictions = model.predict(image)
    get_all_layer_outputs = K.function([model.layers[0].input],
                                      [l.output for l in model.layers[0:]])

    layer_output = get_all_layer_outputs([image])  # outputs of every layer for this image
    for m, out in enumerate(layer_output):
        print(model.layers[m].__class__.__name__)
        print(out)
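
To be precise about what I mean by "expressed in 4 bits": the QConv2D values below are all multiples of 2**-4 = 0.0625. A small helper (hypothetical, not a QKeras API) can verify this on each layer's output instead of eyeballing the printout:

import numpy as np

def on_4bit_grid(x, bits=4):
    # Hypothetical helper: True if every value is a multiple of 2**-bits,
    # i.e. representable on a 4-bit fractional grid.
    scaled = np.asarray(x) * (2 ** bits)
    return np.allclose(scaled, np.round(scaled))

for m, out in enumerate(layer_output):
    print(model.layers[m].__class__.__name__, on_4bit_grid(out))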

And my output:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (QConv2D)             (None, 32, 32, 32)        896       
_________________________________________________________________
batch_normalization (QBatchN (None, 32, 32, 32)        128       
_________________________________________________________________
conv2d_1 (QConv2D)           (None, 32, 32, 32)        9248      
_________________________________________________________________
batch_normalization_1 (QBatc (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_2 (QConv2D)           (None, 16, 16, 64)        18496     
_________________________________________________________________
batch_normalization_2 (QBatc (None, 16, 16, 64)        256       
_________________________________________________________________
conv2d_3 (QConv2D)           (None, 16, 16, 64)        36928     
_________________________________________________________________
batch_normalization_3 (QBatc (None, 16, 16, 64)        256       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_4 (QConv2D)           (None, 8, 8, 128)         73856     
_________________________________________________________________
batch_normalization_4 (QBatc (None, 8, 8, 128)         512       
_________________________________________________________________
conv2d_5 (QConv2D)           (None, 8, 8, 128)         147584    
_________________________________________________________________
batch_normalization_5 (QBatc (None, 8, 8, 128)         512       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 128)         0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 2048)              0         
_________________________________________________________________
dense (QDense)               (None, 128)               262272    
_________________________________________________________________
batch_normalization_6 (QBatc (None, 128)               512       
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (QDense)             (None, 10)                1290      
=================================================================
...

QConv2D
[[[[0.     0.     0.25   ... 0.     0.375  0.    ]
   [0.     0.     0.     ... 0.     0.6875 0.25  ]
   [0.     0.     0.     ... 0.     0.6875 0.1875]

...

QBatchNormalization
[[[[ 0.02544868  0.16547686  1.791272   ... -0.0244638   0.58454317
    -0.66077614]
   [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
     1.0357761 ]
   [ 0.02544868  0.16547686  0.0947198  ... -0.0244638   1.4546151
     0.61163807]
...

QConv2D
[[[[0.     0.9375 0.     ... 0.     0.     0.9375]
   [0.     0.     0.     ... 0.375  0.     0.    ]
   [0.     0.     0.     ... 0.0625 0.     0.    ]
   ...
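
For context, my current understanding (an assumption I would like confirmed) is that QKeras only quantizes a layer's output when an activation quantizer is attached, either through the layer's activation argument or an explicit QActivation layer, while QBatchNormalization quantizes its parameters but not its output. A sketch of how that would look:

import tensorflow as tf
from qkeras import (QConv2D, QBatchNormalization, QActivation,
                    quantized_bits, quantized_relu)

inputs = tf.keras.layers.Input(shape=(32, 32, 3))

# The QConv2D output is quantized because quantized_relu(4, 0) is attached
# as its activation; the QBatchNormalization output stays in floating point
# unless a QActivation follows it (an assumption about how my model differs).
x = QConv2D(32, (3, 3), padding="same",
            kernel_quantizer=quantized_bits(4, 0, 1),
            bias_quantizer=quantized_bits(4, 0, 1),
            activation=quantized_relu(4, 0))(inputs)
x = QBatchNormalization()(x)              # parameters quantized, output float
x = QActivation(quantized_relu(4, 0))(x)  # would quantize the BN output to 4 bits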