tensorflow / model-optimization

A toolkit for Keras and TensorFlow to optimize ML models for deployment, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

some questions about quantization in TensorFlow #1064

Open rthenamvar opened 1 year ago

rthenamvar commented 1 year ago

I've read through the official guide and have run into problems understanding a few concepts:

  1. Is it possible to use quantization-aware training (QAT) and not convert the model to a TF Lite model at the end?
  2. Can I change the framework's 8-bit quantization default? The official documentation mentions 4-bit and 16-bit quantization as experimental, meaning those models cannot be converted to TF Lite models. But isn't it possible to use such models without converting them to TF Lite at all?

Thanks