nihui / rife-ncnn-vulkan

RIFE, Real-Time Intermediate Flow Estimation for Video Frame Interpolation implemented with ncnn library
MIT License

Performance - Vulkan vs. CUDA #22

Open chainikdn opened 3 years ago

chainikdn commented 3 years ago

So we have RIFE over PyTorch with a CUDA backend and RIFE over ncnn with a Vulkan backend. We can compare their performance directly with "Flowframes". What I see on my rig (RTX 2060): RIFE/PyTorch/CUDA is 3 (three) times faster than RIFE/ncnn/Vulkan. RIFE/CUDA uses up to 50% of my GPU (as shown by the Task Manager), while RIFE/Vulkan doesn't go higher than 10-15% regardless of the thread count.

Is there anything we could do about it? Would implementing a CUDA backend in ncnn (see https://github.com/atanmarko/ncnn-with-cuda) help?

dawei03896 commented 3 years ago

Hi friend, which version of the RIFE model did you use to measure the time?

chainikdn commented 3 years ago

2.3, 2.4, 3.1... no difference at all: 10% GPU load with ncnn/Vulkan and 50% GPU load with PyTorch/CUDA.

I wonder where the bottleneck is.

Mar2ck commented 3 years ago

Btw, don't look at the 3D graph in Task Manager, since it won't show accurate usage for either CUDA or Vulkan compute. Look at the Cuda, Compute_0 and Compute_1 graphs instead. For me rife always uses 100% in Compute_1.
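As a cross-check that doesn't depend on which Task Manager graph is selected, on NVIDIA cards the utilization counter can also be read from `nvidia-smi`. This is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports a plain integer for each GPU (some setups report `[N/A]` instead, which this does not handle). The helper names are hypothetical. Note that this counter reflects the share of time during which at least one kernel was running, so it is only a rough indicator.

```python
import subprocess
from typing import List


def parse_utilization(csv_text: str) -> List[int]:
    """Parse `--format=csv,noheader,nounits` output: one integer percent per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_utilization() -> List[int]:
    """Ask nvidia-smi for the current utilization of each GPU (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(result.stdout)


# Usage (on a machine with an NVIDIA GPU), while an interpolation job is running:
#   print(query_gpu_utilization())
```

Sampling this a few times during a run of each backend gives a number that is at least measured the same way for both.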

chainikdn commented 3 years ago

I'm talking about overall "GPU activity", dunno what exactly it means. Maybe it's not relevant to the perf difference at all. Anyway, it's 5 fps vs. 15 fps (for a 1080p source).
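For reference, the two throughput figures can be turned into per-frame times and a speedup ratio. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the 5 and 15 fps values are the ones quoted in this thread.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent per output frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(fast_fps: float, slow_fps: float) -> float:
    """How many times faster the first backend is than the second."""
    if slow_fps <= 0:
        raise ValueError("slow_fps must be positive")
    return fast_fps / slow_fps


vulkan_fps = 5.0   # RIFE/ncnn/Vulkan, 1080p source (figure from this thread)
cuda_fps = 15.0    # RIFE/PyTorch/CUDA, 1080p source (figure from this thread)

print(f"Vulkan: {frame_time_ms(vulkan_fps):.1f} ms per frame")
print(f"CUDA:   {frame_time_ms(cuda_fps):.1f} ms per frame")
print(f"Speedup: {speedup(cuda_fps, vulkan_fps):.1f}x")
```

So the gap is roughly 200 ms versus 67 ms per frame, which is the 3x difference mentioned at the top.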

n00mkrad commented 3 years ago

CUDA is very optimized and Nvidia spent millions in R&D on it.

I don't think the Vulkan implementation can magically catch up to CUDA.

chainikdn commented 3 years ago

"ncnn with CUDA" mentioned above is only missing one layer (Interp) for running RIFE, as far as I can see. If it immediately gets that "magical" x3 perf improvement, then... it's a good thing, right?

KeygenLLC commented 2 years ago

On macOS 10.14.6 with an AMD Radeon Pro Vega 64, my GPU is pegged when using rife-ncnn-vulkan. 100% used. Almost all CPU cores fire up at the beginning at ~30-50% each (10-core machine), then CPU usage goes down to nothing and the GPU takes over.

TTA breaks it though, which I posted an issue about, but I'm getting full use of the GPU here.

Here's the GPU usage shown from Activity Monitor. Should be easy to see where the render started:

[Image: Screen Shot 2021-12-03 at 7 50 13 PM]