Graph Mode QAT / Static Quantization

huggingface / nn_pruning

michaelbenayoun closed this 3 years ago

michaelbenayoun commented 3 years ago

This PR adds graph mode quantization-aware training (QAT) and static quantization, using the experimental symbolic tracing feature from the transformers library. Working on the traced graph allows more flexible model editing, so the changes needed to target a given device or inference engine can be applied.

For now, it is not integrated with the rest of the library (that will come in later PRs). Export to TorchScript is supported.
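For readers unfamiliar with static quantization, here is a minimal pure-Python sketch of the arithmetic it usually relies on: an observer records the activation range on calibration data, and that range is turned into a fixed scale and zero point. This is not the code from this PR. The names `MinMaxObserver`, `quantize` and `dequantize` are hypothetical, and the unsigned 8-bit range is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MinMaxObserver:
    """Hypothetical observer: tracks the value range seen during calibration."""

    min_val: float = float("inf")
    max_val: float = float("-inf")

    def observe(self, values):
        # Widen the recorded range with one calibration batch.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=0, qmax=255):
        # Include zero in the range so that 0.0 is exactly representable.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        # Fall back to a scale of 1.0 if the observed range is empty.
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = round(qmin - lo / scale)
        zero_point = max(qmin, min(qmax, zero_point))
        return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    # Affine mapping to integers, clamped to the representable range.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    # Inverse affine mapping back to floats (lossy).
    return [(q - zero_point) * scale for q in q_values]


# Calibration: feed a few representative batches through the observer.
observer = MinMaxObserver()
for batch in ([-1.0, 0.5], [2.0, 0.0]):
    observer.observe(batch)

# The parameters are then frozen and reused for every input at inference time.
scale, zero_point = observer.qparams()
q = quantize([0.0, 2.0, -1.0, 0.3], scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

In QAT the same quantize/dequantize round trip is typically simulated during training so the weights adapt to the rounding error, whereas static quantization only needs the calibration pass shown here.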

madlag commented 3 years ago

Nice work! Can't wait for the next steps!