Hi, "amp" is for mixed precision which is similar to but might be slightly more expensive than pure fp16. PyTorch also has a quantization API: https://pytorch.org/docs/stable/quantization.html
Hi, "amp" is for mixed precision, which is similar to but might be slightly more expensive than pure fp16. PyTorch also has a quantization API: https://pytorch.org/docs/stable/quantization.html
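For context, here is a minimal sketch of what the three options can look like in plain PyTorch. The placeholder model, input shape, and variable names below are illustrative assumptions, not the repo's actual network or its `amp` plumbing:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the network -- the real model, its loading code,
# and how the config's `amp` flag is wired up are assumptions here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 8),
).cuda().eval()

frame = torch.rand(1, 3, 480, 854, device='cuda')  # dummy input frame

# 1) Mixed precision (what an `amp` flag typically toggles): weights stay fp32,
#    eligible ops run in fp16, so compute drops but weight memory does not.
with torch.inference_mode(), torch.cuda.amp.autocast():
    out_amp = model(frame)

# 2) Pure fp16: also halves weight memory, but more prone to overflow/underflow
#    since nothing is kept in fp32.
model.half()
with torch.inference_mode():
    out_fp16 = model(frame.half())

# 3) int8 dynamic quantization: runs on CPU and, out of the box, only covers a
#    few module types such as nn.Linear; conv-heavy networks usually need
#    static quantization with calibration instead (see the linked docs).
model_cpu = model.float().cpu()
model_int8 = torch.quantization.quantize_dynamic(
    model_cpu, {nn.Linear}, dtype=torch.qint8
)
with torch.inference_mode():
    out_int8 = model_int8(frame.float().cpu())
```

Whether pure fp16 is numerically safe depends on the model, so autocast is generally the safer default, which is presumably why the config exposes an amp flag rather than calling `.half()` globally.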
Thanks, it was more of a question than an issue. Closing it.