zhaorz / FlowOnTheGo

Fast, accurate optical flows on mobile GPUs.

Performance #16

Open sberryman opened 6 years ago

sberryman commented 6 years ago

I've been trying to run the flow on some 4K video and I'm not getting anywhere near the performance you reported in the paper.

Oddly enough, with the reference implementation, changing DISOpticalFlow::PRESET_ULTRAFAST to DISOpticalFlow::PRESET_FAST produced flow at roughly 450-480 ms per frame. With FlowOnTheGo, op_point 3 gives great flow results but reports TIME (O.Flow Run-Time ) (ms): 3293.45. The default, op_point 2, runs much faster and reports TIME (O.Flow Run-Time ) (ms): 122.223.
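For anyone wanting to reproduce per-frame numbers like these outside the built-in timer, here is a minimal sketch of a wall-clock timing helper. The name `median_ms` is hypothetical and is not part of FlowOnTheGo or OpenCV; it only measures whatever callable it is given, and it reports the median so that a slow first run (for example, GPU warm-up) does not dominate.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from FlowOnTheGo or OpenCV): calls `fn` `runs`
// times and returns the median wall-clock duration of one call in milliseconds.
template <typename Fn>
double median_ms(Fn&& fn, std::size_t runs) {
    if (runs == 0) return 0.0;  // nothing measured

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();  // the work being timed, e.g. one optical flow computation
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // Median: middle sample, or the mean of the two middle samples.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return (samples.size() % 2 != 0)
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

With the reference implementation, the callable would wrap the per-frame flow call (something like `dis->calc(prev, next, flow)` on a `DISOpticalFlow` instance created with the preset under test). If the timed code launches GPU work asynchronously, it would also need to synchronize inside the callable for the number to be meaningful.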

I also see you started a video branch, have you implemented that and not pushed to github by any chance?

AshwinSekar commented 6 years ago

What kind of GPU/CPU setup are you using? The video branch is still in the works; it was a test to grab and process video frame by frame and render it in real time.

sberryman commented 6 years ago

Output for op_point=2 and op_point=3 https://gist.github.com/sberryman/b613ba3146878f12fc56c5876c194e40

The only change I made was to output an image instead of the .flo file. Based on what I saw in the code, that shouldn't affect any of the timing.

sberryman commented 6 years ago

On a side note, I had to remove #include <arm_neon.h> in refine_variational.cpp and FDF1.0.1/image.cpp.

I also had to comment out lines in CMakeLists.txt for eigen3 to locate the correct include directory and switch to VECTOR_WIDTH=1 in order to get it to compile.
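As an alternative to deleting the include by hand, the header could be guarded so the same source builds on both ARM and x86. The following is only a sketch of that idea, not the project's actual code: `FLOW_HAVE_NEON` and `sum4` are hypothetical names, and the scalar branch plays the role of the VECTOR_WIDTH=1 path.

```cpp
// Pull in the NEON intrinsics only when the compiler targets ARM NEON.
// __ARM_NEON is the ACLE feature macro; __ARM_NEON__ is an older spelling.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define FLOW_HAVE_NEON 1  // hypothetical macro name
#else
#define FLOW_HAVE_NEON 0
#endif

// Hypothetical example function: sums four consecutive floats, using NEON
// when it is available and plain scalar code otherwise.
inline float sum4(const float* p) {
#if FLOW_HAVE_NEON
    const float32x4_t v = vld1q_f32(p);                     // load p[0..3]
    const float32x2_t s = vadd_f32(vget_low_f32(v),         // (p0+p2, p1+p3)
                                   vget_high_f32(v));
    return vget_lane_f32(vpadd_f32(s, s), 0);               // add the two lanes
#else
    return p[0] + p[1] + p[2] + p[3];                       // scalar fallback
#endif
}
```

On a desktop x86 build the preprocessor takes the scalar branch, so no ARM-only header is ever included.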

sberryman commented 6 years ago

Reference

PRESET_ULTRAFAST: ~140 ms [image: flow_ultrafast_140ms]

PRESET_FAST: ~434 ms [image: flow_fast_434ms]

PRESET_MEDIUM: ~899 ms [image: flow_medium_899ms]

FlowOnTheGo

op_point=3: ~3306 ms [image: flow_oppoint_3]

op_point=2: ~121 ms [image: flow_oppoint_2]

op_point=1 takes ~1 ms and the output is pretty much empty. op_point=4 results in a CUDA error:

CUDA error at /root/FlowOnTheGo/src/kernels/flowUtil.cu:533 code=77(cudaErrorIllegalAddress) "cudaHostGetDevicePointer(&a11c1, a11->c1, 0)"

sberryman commented 6 years ago

FYI, this was all done using the master branch. After looking through optimize_refine I see you have quite a few optimizations there.