hzwer / ECCV2022-RIFE

ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
MIT License

Training with mixed-precision #128

Open VvvvvGH opened 3 years ago

VvvvvGH commented 3 years ago

Would you consider training with mixed precision? It can speed up inference and lower VRAM usage.

hzwer commented 3 years ago

I currently have many other experiments underway to improve visual effects, so improving efficiency is not a high priority right now. Is there any data showing the speedup of RIFE with mixed precision?

VvvvvGH commented 3 years ago

GPU: RTX 3090, CPU: 9900K

I ran Vimeo90K.py on the Vimeo test set using model version 2.4, with the code modified to use half precision:

        self.flownet = self.flownet.half()
        self.contextnet = self.contextnet.half()
        self.fusionnet = self.fusionnet.half()

Original:

Avg PSNR: 34.08642482478815 SSIM: 0.971693217754364 Inference time: 0.015075199002629547
Total inference time: 57.014402627944946

Using half precision:

Avg PSNR: 34.01964114699109 SSIM: 0.9714009165763855 Inference time: 0.013368013735992577
Total inference time: 50.557827949523926
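
For reference, the relative change implied by these two runs can be worked out with a few lines of Python. This is only arithmetic on the figures reported above; the variable and function names are made up for illustration and are not part of the RIFE code.

```python
# Figures copied from the Vimeo90K results reported above.
fp32 = {"psnr": 34.08642482478815, "ssim": 0.971693217754364,
        "time_s": 0.015075199002629547}
fp16 = {"psnr": 34.01964114699109, "ssim": 0.9714009165763855,
        "time_s": 0.013368013735992577}


def relative_speedup(baseline_s, new_s):
    """Return how many times faster the new run is than the baseline."""
    return baseline_s / new_s


speedup = relative_speedup(fp32["time_s"], fp16["time_s"])
psnr_drop = fp32["psnr"] - fp16["psnr"]   # in dB
ssim_drop = fp32["ssim"] - fp16["ssim"]

print(f"per-frame speedup: {speedup:.3f}x")
print(f"PSNR drop: {psnr_drop:.4f} dB")
print(f"SSIM drop: {ssim_drop:.6f}")
```

On these numbers, half precision is roughly 1.13x faster per frame, at a cost of well under 0.1 dB PSNR.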

On 1080p video, VRAM usage drops from 6347 MB to 3450 MB.

Original:

900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:54<00:00,  7.84it/s]

Using half precision:

900.0 frames in total, 15.0FPS to 60.0FPS
The audio will be merged after interpolation process
100%|█████████████████████████████████████████████████████████████████████████████▉| 899/900.0 [01:41<00:00,  8.83it/s]
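
The 1080p figures can be summarized the same way. The snippet below is only a sketch that does arithmetic on the VRAM and it/s values quoted in this comment; the names are hypothetical.

```python
# Values copied from the 1080p test above.
vram_fp32_mb = 6347
vram_fp16_mb = 3450
throughput_fp32 = 7.84   # it/s reported by the progress bar, original
throughput_fp16 = 8.83   # it/s reported by the progress bar, half precision

# Fraction of VRAM saved by switching to half precision.
vram_saved_fraction = 1 - vram_fp16_mb / vram_fp32_mb

# Ratio of half-precision throughput to full-precision throughput.
throughput_gain = throughput_fp16 / throughput_fp32

print(f"VRAM saved: {vram_saved_fraction:.1%}")
print(f"throughput gain: {throughput_gain:.3f}x")
```

So memory use falls by a little under half, while throughput improves by about 13%, in line with the Vimeo90K timing.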

hzwer commented 3 years ago

OK. At present, it seems that there is indeed a need to reduce the memory overhead, and I can design some new models.

a1600012888 commented 3 years ago

Hi, thanks for testing the fp16 speed and memory consumption. I believe we may see a more significant speedup on GPUs with much better FP16 support, such as the T4 and V100. For GPUs like the 1080 Ti and 3090, the theoretical half-precision performance does not improve much over full precision.
3090: https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622
T4: https://www.techpowerup.com/gpu-specs/tesla-t4.c3316