scicafe / scicafe.github.io

scicafe blog
https://sci.cafe
GNU General Public License v3.0

Quantization 1 - Review #12

Open suriyadeepan opened 4 years ago

suriyadeepan commented 4 years ago

Review of Guide to Quantization and Quantization Aware Training using the TensorFlow Model Optimization Toolkit

TF's Model Optimization Toolkit (TFMOT) contains tools you can use to quantize and prune your model for faster inference on edge devices.

Quantization converts the weights of our model from high-precision floats to low-precision INT8 values.

In weight quantization, we quantize only the weights and then upconvert (dequantize) the saved weights back to float during inference.
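The float-to-INT8 conversion and the inference-time upconversion described above can be sketched in plain Python. This is a minimal illustration of per-tensor affine quantization, not TFMOT's actual implementation; the function names and the scale/zero-point scheme are assumptions for the example.

```python
def quantize(weights, qmin=-128, qmax=127):
    """Map float weights onto the INT8 range [qmin, qmax] (per-tensor affine)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Upconvert INT8 values back to float, as done at inference."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.9, -0.1, 0.0, 0.4, 1.2]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

The round trip is lossy: each weight is recovered only to within about half a quantization step (`scale / 2`), which is the error that quantization-aware training later learns to tolerate.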

post training quantization

  • Caps: the heading should be capitalized ("Post-Training Quantization")

On the other hand, quantization-aware training (QAT) emulates quantized weights during the training process.
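The emulation works via "fake quantization": weights stay float during training, but the forward pass routes them through a quantize-then-dequantize round trip so the network sees INT8 rounding error. A minimal sketch, assuming a fixed per-tensor scale and zero point (TFMOT inserts equivalent ops into the graph automatically; these names are illustrative):

```python
def fake_quant(w, scale, zero_point, qmin=-128, qmax=127):
    """Quantize then immediately dequantize: output is float, but
    carries the rounding/clamping error of INT8 storage."""
    q = max(qmin, min(qmax, round(w / scale) + zero_point))
    return (q - zero_point) * scale

def forward(x, weights, scale, zero_point):
    """Dot product using fake-quantized weights (training-time emulation)."""
    return sum(xi * fake_quant(wi, scale, zero_point)
               for xi, wi in zip(x, weights))
```

Because the loss is computed against these perturbed weights, training adapts to the quantization error, which is why QAT usually loses less accuracy than post-training quantization.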

Instead, we will use the nightly version of TensorFlow (issue)

We will use the MobileNetV2 model in this example, so we need to import that.

We will use a Global Average Pooling layer after the MobileNet output. This will be followed by two fully connected layers and an output layer with 9 neurons and a softmax activation function.

More specifically, each layer in the model is replaced with its quantization-aware equivalent operation.

Note: We have not tested these reasons and there could be other causes.

soham96 commented 4 years ago

Issues needing to be fixed:

Note: We have not tested these reasons and there could be other causes. Shouldn't we check the literature for answers?

Will search for resources and push update when I find some. I don't think this is release blocking? @suriyadeepan @varchanaiyer