mmaaz60 / EdgeNeXt

[CADL'22, ECCVW] Official repository of the paper "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Hybrid Architecture for Mobile Vision Applications".

Faster Inference INT8 #18

Closed · yash-khurana closed this 1 year ago

yash-khurana commented 1 year ago

@mmaaz60 Can you please provide the code for converting this model to INT8? It converts to ONNX successfully and I am able to run inference with ONNX Runtime. However, is there any way to decrease the inference time further, e.g. INT8 quantisation or pruning? I'm using edgenext_xx_small_bn_hs. Thanks a lot! Love your work!

mmaaz60 commented 1 year ago

Hi @yash-khurana,

Thank you for your interest in our work. If your target hardware is a GPU or a Jetson device, you can try converting the model to TensorRT and quantizing it to INT8 for optimized inference. For CPU inference, you may explore OpenVINO from Intel.
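Since the model already runs under ONNX Runtime, another low-effort option worth trying first is ONNX Runtime's own post-training dynamic quantization. A minimal sketch, assuming an exported ONNX file (the file names below are placeholders, and this has not been validated on EdgeNeXt):

```python
# Sketch: post-training dynamic quantization with ONNX Runtime.
# File names are placeholders; accuracy/speed on EdgeNeXt is untested.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "edgenext_xx_small_bn_hs.onnx",       # exported FP32 model
    "edgenext_xx_small_bn_hs.int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,          # store weights as INT8
)
```

The quantized file loads with a regular onnxruntime.InferenceSession, so the existing inference code should not need changes.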

I have not tried the above with EdgeNeXt, but from my previous experience I believe these optimizations can give you a reasonable speed-up. Do let everyone know if you are able to get any speedup.

Thank You

yash-khurana commented 1 year ago

Thank you for your response. Last I checked, neither PyTorch nor ONNX provided much support for INT8 transformer layers. I might be wrong, though. I would be happy to explore them if you could point me to a similar model that has been quantised to INT8, or give me a starting point for it.
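For what it's worth, PyTorch's post-training dynamic quantization does cover nn.Linear modules, which account for most of the weights in a transformer block. A minimal sketch, assuming a recent timm build that registers the EdgeNeXt variants (the model name and input resolution below are assumptions, and this is untested on EdgeNeXt):

```python
# Sketch: PyTorch post-training dynamic quantization (CPU-only).
# Model name and 256x256 input size are assumptions, not verified here.
import timm
import torch
import torch.nn as nn

model = timm.create_model("edgenext_xx_small", pretrained=False).eval()

# quantize_dynamic swaps nn.Linear modules (the bulk of the transformer
# blocks) for INT8 versions; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 3, 256, 256))
print(out.shape)
```

Note that dynamic quantization only targets CPU execution, so it complements rather than replaces the TensorRT route for NVIDIA hardware.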

mmaaz60 commented 1 year ago

Thank You @yash-khurana,

You may use the TensorRT command-line tool trtexec to convert the ONNX model into a TRT engine with INT8 precision for inference on NVIDIA devices. Further, have a look at the [Python samples](https://github.com/NVIDIA/TensorRT/tree/main/samples/python/efficientnet) that convert EfficientNet models to TensorRT and run inference with them. I hope this is helpful.
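A rough sketch of such a trtexec invocation (file names are placeholders; note that without a calibration cache, an --int8 build is mainly useful for measuring latency rather than preserving accuracy):

```bash
# Build an INT8 TensorRT engine from the exported ONNX model.
# Pass --calib=<cache file> from a calibration run for accurate INT8.
trtexec --onnx=edgenext_xx_small_bn_hs.onnx \
        --int8 \
        --saveEngine=edgenext_int8.engine
```

The resulting engine can then be deserialized and run through the TensorRT Python or C++ runtime, as shown in the linked samples.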

Further, we are planning to release one of our incremental works on efficient models for edge devices in a couple of weeks, along with detailed instructions for inference on NVIDIA devices and iPhone. Stay tuned! Thanks