mpv-player / mpv

🎥 Command line video player
https://mpv.io

Proposal: Include DAIN for frame interpolation #7497

Closed flxai closed 4 years ago

flxai commented 4 years ago

The wiki lists different interpolation techniques. There are examples on the web that use classic methods like SVP motion interpolation within mpv.

Not so long ago, scientists developed a new technique for frame interpolation, called DAIN. Its project website describes it best, giving lots of example footage. At the core of the method is a neural network with the following schematic architecture:

[Figure: schematic of the DAIN network architecture]

Because of the stunning results (PSNR/SSIM) I'd like to see DAIN included so it can be used from within mpv. The authors have provided the source code as well as the trained weights for the network in a dedicated repository, which could be used as a base for a Python plugin.

mia-0 commented 4 years ago

Too slow. And their implementation requires PyTorch and CUDA, which is just … no.

flxai commented 4 years ago

Too slow.

Do you refer to their implementation or the architecture in general?

And their implementation requires PyTorch and CUDA

As I understand it, their model could be converted to either NNEF or ONNX format to be used without PyTorch or CUDA. Unfortunately I'm not yet aware of any specific (e.g. Vulkan) implementations that would allow hardware-accelerated inference.

haasn commented 4 years ago

Correct me if I'm wrong, but from skimming their paper it seems like they're getting >100 ms runtimes on 480p image sets using an NVIDIA TITAN X. That's not realtime, ergo we can't use it in mpv. Re-open in 5 years.

Unless you want to try trimming the network down to a smaller size to make it realtime viable (like was done with FSRCNN). But I am not a NN researcher.

Argon- commented 4 years ago

To give a little bit of perspective on "realtime": at 24 fps you have around 41 ms per frame. That includes everything that needs to be done: color conversion, scaling, potentially filters. For 60 fps the time budget is around 16 ms. Now, given that most people don't have a Titan X and also usually don't watch 480p videos, their interpolation implementation doesn't need just a minor speed bump. To run realtime at 1080p and beyond on more common GPUs, it probably needs to be a whole order of magnitude faster than it currently is.
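The budget arithmetic above is just the reciprocal of the frame rate; a quick sketch:

```python
def frame_budget_ms(fps: float) -> float:
    """Total time available per displayed frame, in milliseconds.

    This budget must cover decode, color conversion, scaling, filters
    and any interpolation -- not the interpolation step alone.
    """
    return 1000.0 / fps

print(round(frame_budget_ms(24), 1))  # ~41.7 ms per frame
print(round(frame_budget_ms(60), 1))  # ~16.7 ms per frame
```

Against either budget, a >100 ms inference pass per interpolated frame on a Titan X at 480p is several times over the limit before the rest of the pipeline is even counted.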