VvvvvGH opened this issue 3 years ago
I currently have many other experiments in progress to improve visual quality, so improving efficiency is not a high-priority task at the moment. Is there any data illustrating the speedup of RIFE under mixed precision?
GPU: RTX 3090, CPU: 9900K
I ran Vimeo90K.py on the Vimeo test set with model version 2.4, simply modifying the code to use half precision:
self.flownet = self.flownet.half()
self.contextnet = self.contextnet.half()
self.fusionnet = self.fusionnet.half()
Original:
Avg PSNR: 34.08642482478815 SSIM: 0.971693217754364 Inference time: 0.015075199002629547
Total inference time: 57.014402627944946
Using half precision:
Avg PSNR: 34.01964114699109 SSIM: 0.9714009165763855 Inference time: 0.013368013735992577
Total inference time: 50.557827949523926
On 1080p video, VRAM usage dropped from 6347 MB to 3450 MB.
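The small PSNR drop (34.086 dB to 34.020 dB) is consistent with FP16's reduced precision: a 10-bit mantissa keeps only about three decimal digits. A stdlib-only sketch of the rounding, using Python's `struct` half-precision format `'e'`:

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float through IEEE 754 half precision and back."""
    return struct.unpack('e', struct.pack('e', x))[0]

# 10 mantissa bits: 0.1 is not representable exactly,
# and integers above 2048 lose their last bits.
print(to_fp16(0.1))     # close to, but not exactly, 0.1
print(to_fp16(2049.0))  # rounds to 2048.0
```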
Original:
900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:54<00:00, 7.84it/s]
Using half precision:
900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:41<00:00, 8.83it/s]
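The two runs above imply roughly a 1.13x end-to-end speedup, whether measured by wall-clock time (1:54 vs 1:41) or by throughput (7.84 vs 8.83 it/s):

```python
# Speedup implied by the numbers reported above (stdlib only).
fp32_time, fp16_time = 114.0, 101.0  # 1:54 vs 1:41, in seconds
fp32_ips, fp16_ips = 7.84, 8.83      # it/s from the progress bars

time_speedup = fp32_time / fp16_time
throughput_speedup = fp16_ips / fp32_ips
print(f"wall-clock speedup: {time_speedup:.2f}x")        # ~1.13x
print(f"throughput speedup: {throughput_speedup:.2f}x")  # ~1.13x
```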
OK. It seems there is indeed a need to reduce the memory overhead, and I can design some new models.
Hi, thanks for testing the FP16 speed and memory consumption. I believe we may see a more significant speedup on GPUs with much better FP16 support, such as the T4 or V100. For GPUs like the 1080 Ti and 3090, the theoretical half-precision throughput is not much higher than full precision. 3090: https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622 T4: https://www.techpowerup.com/gpu-specs/tesla-t4.c3316
Will you consider training with mixed precision? It can speed up inference and lower VRAM usage.
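For reference, a minimal mixed-precision training step with PyTorch's autocast plus gradient scaling, a sketch of the suggestion above rather than RIFE's actual training loop (the model, shapes, and loss here are placeholders; FP16 autocast assumes a CUDA device, with bfloat16 as the CPU fallback):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 16).to(device)           # placeholder network
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 16, device=device)
target = torch.randn(4, 16, device=device)

# Forward pass runs eligible ops in reduced precision.
with torch.autocast(device_type=device,
                    dtype=torch.float16 if device == "cuda" else torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # scaling the loss avoids fp16 gradient underflow
scaler.step(opt)               # unscales grads; skips the step if inf/nan
scaler.update()
```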