mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

[models] Add model compression utils #5

Open fg-mindee opened 3 years ago

fg-mindee commented 3 years ago

Add a doctr.models.utils module to compress existing models and improve their latency / memory load for inference purposes on CPU. Some interesting leads to investigate:

- Optional: TensorRT export (cf. https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/)

fg-mindee commented 3 years ago

ONNX conversion seems to be incompatible with TF 2.4.* as per https://github.com/onnx/keras-onnx/issues/662. I tried on my end and encountered the same problem. Moving this to the next release until this gets fixed!

fg-mindee commented 3 years ago

A good lead for ONNX support would be to use https://github.com/onnx/tensorflow-onnx (we might have to create a SavedModel to use it, but it's worth a look)
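A minimal sketch of what such an export helper could look like, assuming `tf2onnx` and TensorFlow are installed. The `export_to_onnx` name, the input shape, and the opset below are illustrative placeholders, not part of docTR's API:

```python
def export_to_onnx(model, output_path="model.onnx", opset=13):
    """Convert a Keras model to ONNX via tf2onnx (illustrative sketch).

    Assumes `tf2onnx` and `tensorflow` are available; the input shape
    below is a placeholder and should match the model's expected input.
    """
    import tensorflow as tf
    import tf2onnx

    # tf2onnx can convert a Keras model directly, which avoids writing
    # an intermediate SavedModel to disk first.
    spec = (tf.TensorSpec((None, 1024, 1024, 3), tf.float32, name="input"),)
    tf2onnx.convert.from_keras(
        model, input_signature=spec, opset=opset, output_path=output_path
    )
    return output_path
```

Alternatively, the tf2onnx CLI accepts a SavedModel directory directly (`python -m tf2onnx.convert --saved-model <dir> --output model.onnx`), which matches the SavedModel route mentioned above.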

felixdittrich92 commented 2 years ago

@frgfm I think we can remove the TensorRT point if we support ONNX, wdyt?

frgfm commented 2 years ago

Yes sure! We'll need to take a look at pruning at some point

felixdittrich92 commented 2 years ago

Yeah, pruning is fine, but TensorRT is a bit too much (the user should handle that on their own side); if we can provide ONNX, that shouldn't be too tricky
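For the pruning idea floated above, a hedged sketch of what a helper could look like using `tensorflow-model-optimization` (assumed installed; the `prune_model` name and the sparsity/step defaults are illustrative, not anything docTR ships):

```python
def prune_model(model, final_sparsity=0.5, end_step=1000):
    """Wrap a Keras model for magnitude-based weight pruning
    (illustrative sketch using tensorflow-model-optimization).

    The returned model zeroes out the smallest-magnitude weights
    progressively during (re)training, up to `final_sparsity`.
    """
    import tensorflow_model_optimization as tfmot

    # Ramp sparsity from 0% to `final_sparsity` over `end_step` steps.
    schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=final_sparsity,
        begin_step=0,
        end_step=end_step,
    )
    return tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=schedule
    )
```

After fine-tuning, the pruning wrappers would need to be removed with `tfmot.sparsity.keras.strip_pruning(...)` before exporting (e.g. to ONNX), otherwise the wrapper layers end up in the exported graph.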