Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
When running mixed precision quantization for weights compression only, layers may still have multiple activation bit-width candidates; we need to filter out the irrelevant activation candidates so those layers become non-configurable with respect to activation. The same applies to activation-only mixed precision.
For this, we added a new procedure that runs before the mixed precision search and modifies the candidates of the relevant nodes in the graph.
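As a rough illustration of the filtering step described above, here is a minimal sketch. It assumes candidates are represented as (weights_n_bits, activation_n_bits) pairs and uses a pin-to-max policy for the non-searched dimension; the `Node` class, function name, and policy are hypothetical stand-ins, not the actual MCT API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for a graph node's quantization candidates;
# each candidate is a (weights_n_bits, activation_n_bits) pair.
@dataclass
class Node:
    name: str
    candidates: List[Tuple[int, int]]

def filter_candidates_for_weights_only(nodes: List[Node]) -> None:
    """For weights-only mixed precision, keep a single activation
    bit-width per node so activation is no longer configurable."""
    for node in nodes:
        # Pin activation to the maximal available bit-width
        # (an illustrative policy, not necessarily MCT's).
        max_act = max(a for _, a in node.candidates)
        kept = [(w, a) for w, a in node.candidates if a == max_act]
        # Deduplicate while preserving order.
        seen = set()
        node.candidates = [c for c in kept
                           if not (c in seen or seen.add(c))]

node = Node("conv1", [(8, 8), (4, 8), (8, 4), (4, 4), (2, 8)])
filter_candidates_for_weights_only([node])
print(node.candidates)  # only the activation=8 candidates remain
```

After filtering, the search space for this node varies only along the weights bit-width axis, which is what makes the layer non-configurable in activation during the mixed precision search. The activation-only case would mirror this, pinning the weights bit-width instead.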
Checklist before requesting a review:
[x] I set the appropriate labels on the pull request.
[x] I have added/updated the release note draft (if necessary).
[x] I have updated the documentation to reflect my changes (if necessary).
[x] All functions and files are well documented.
[x] All functions and classes have type hints.
[x] All files include a license header.
[x] The function and variable names are informative.