zzh-tech / BiT

[CVPR2023] Blur Interpolation Transformer for Real-World Motion from Blur
https://zzh-tech.github.io/BiT/
MIT License

Request for a practical Pre-BiT++ model trained on perceptual loss #5

Open AIVFI opened 1 year ago

AIVFI commented 1 year ago

I'm a big fan of the practical use of video frame interpolation AI models to make watching movies, TV series and other video content as close to real life as possible. I also believe that the next step towards even better realism is to use joint video deblurring and frame interpolation AI models, because almost all footage recorded at 24fps contains around 20.8ms of motion blur in each frame.
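For reference, the 20.8ms figure matches a 180-degree shutter, where each frame is exposed for half of the frame interval. That shutter angle is an assumption here (it is the common cinema convention, not something stated in this thread), and `exposure_time_ms` is just an illustrative helper name. A minimal sketch of the arithmetic:

```python
def exposure_time_ms(fps: float, shutter_angle_deg: float = 180.0) -> float:
    """Per-frame exposure time in milliseconds.

    Assumes a rotary-shutter model: the shutter is open for
    shutter_angle_deg / 360 of each frame interval.
    The 180-degree default is an assumption, not a measured value.
    """
    frame_interval_ms = 1000.0 / fps
    return frame_interval_ms * (shutter_angle_deg / 360.0)


# 24fps with a 180-degree shutter: 1000 / 24 / 2 ms per frame
print(round(exposure_time_ms(24), 1))  # 20.8
```

Footage shot with a different shutter angle would carry proportionally more or less blur per frame.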

I am hugely grateful to you for the RBI dataset with real motion blur, as this will finally make it possible to develop models that will perform well with real video footage. Thank you also for the information about the Pre-BiT++ model, which is trained on Adobe240 and then on RBI in order to get even better results.

Only a practical Pre-BiT++ model trained on perceptual loss is missing to make it perfect. Why is it so important to train on perceptual loss to use models in practice? I described it in detail in the introduction to the rankings here: https://github.com/AIVFI/Video-Frame-Interpolation-Rankings-and-Video-Deblurring-Rankings

In short: training on perceptual loss recovers more fine details, which is more pleasing to the human eye. This is particularly important for models such as BiT, where all video frames will be replaced by new frames, unlike video frame interpolation models where the original frames are preserved. In addition, BiT, by removing motion blur and giving clear and sharp output frames, will further benefit from the ability to recover fine details through training on perceptual loss.

So here is my big request to you to train a practical Pre-BiT++ model on perceptual loss. Unfortunately I am not a programmer myself and have no knowledge or skills in this area. These rankings of mine above are the pinnacle of my abilities and a way to connect with those who do model development on a daily basis. In this way, I want to help enthusiasts like me to find the best model for practical applications. I believe that a practical Pre-BiT++ trained on perceptual loss may be the best model for practical use, hence my request.

I also think that such a model would attract even more attention to your repository, which is important to me, as I want to see more models trained on the RBI dataset with real motion blur in the future.

At the moment, of the 3 most popular frame interpolation methods on GitHub https://github.com/search?o=desc&q=Frame+Interpolation&s=stars&type=Repositories :

7.9k stars - DAIN (CVPR 2019)
3.4k stars - RIFE (ECCV 2022)
2.1k stars - FILM (ECCV 2022)

the developers of as many as two of those, RIFE and FILM, have provided additional practical models that, although they do not reach as high PSNR and SSIM as the primary models of these methods, offer much better perceptual quality.

Thus, I believe that a practical Pre-BiT++ model trained on perceptual loss can gain very wide interest not only from researchers but also from a wide range of enthusiasts for restoring realism to movies, TV series and other video footage.

zzh-tech commented 1 year ago

Thank you for your enthusiasm! I am currently working on ways to improve the theoretical upper bound and practical performance of the video frame interpolation algorithms, which include RIFE and two more current CVPR2023 models (AMT and EMA-VFI). With limited time and resources, I have to finish the important work at hand first, and I think you'll like this one.