Open sberryman opened 6 years ago
What kind of GPU/CPU setup are you using? The video branch is still in the works; it was a test to grab and process video frame by frame and render it in real time.
Output for op_point=2 and op_point=3: https://gist.github.com/sberryman/b613ba3146878f12fc56c5876c194e40
The only change I made was to output an image instead of the .flo file. Based on what I saw in the code, that shouldn't affect any of the timing either.
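One way to check that the output format really has no effect on the reported numbers is to time the flow computation and the file output separately. The following is only a minimal sketch using std::chrono; `time_ms` is a hypothetical helper name and is not part of the FlowOnTheGo code base.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the repository): runs `fn` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();  // the work being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the flow call and the image-writing call in separate `time_ms` invocations would show how much each contributes. For GPU code, kernel launches are asynchronous, so the measured callable would need to end with a synchronization (for example `cudaDeviceSynchronize()`) for the number to be meaningful.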
On a side note, I had to remove #include <arm_neon.h> in refine_variational.cpp and FDF1.0.1/image.cpp. I also had to comment out lines in CMakeLists.txt for eigen3 to locate the correct include directory, and switch to VECTOR_WIDTH=1 in order to get it to compile.
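Rather than deleting the include outright, a less invasive option might be to guard it so the same source builds on both ARM and x86. This is a sketch under the assumption that the affected files do not call NEON intrinsics when VECTOR_WIDTH=1; `HAVE_NEON` and `simd_backend` are hypothetical names introduced for illustration only.

```cpp
#include <string>

// __ARM_NEON (and the older __ARM_NEON__) is predefined by GCC and Clang
// when compiling for an ARM target with NEON support.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1  // hypothetical macro, not from the repository
#else
#define HAVE_NEON 0  // x86 and other targets: header is skipped
#endif

// Hypothetical helper: reports which code path was compiled in.
inline std::string simd_backend() {
    return HAVE_NEON ? "neon" : "scalar";
}
```

Any code that does use NEON intrinsics would need the same `#if HAVE_NEON` guard around it, with a scalar fallback in the `#else` branch.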
Duration: ~140ms
Duration: ~434ms
Duration: ~899ms
Duration: ~3306ms
Duration: ~121ms
op_point=1 takes ~1ms and the output is pretty much empty, and op_point=4 results in a CUDA error: CUDA error at /root/FlowOnTheGo/src/kernels/flowUtil.cu:533 code=77(cudaErrorIllegalAddress) "cudaHostGetDevicePointer(&a11c1, a11->c1, 0)"
FYI, this was all done using the master branch. After looking through optimize_refine, I see you have quite a few optimizations there.
I've been trying to run the flow on some 4K video and I'm not getting anywhere near the performance you reported in the paper.
Oddly enough, using the reference implementation and changing DISOpticalFlow::PRESET_ULTRAFAST to DISOpticalFlow::PRESET_FAST was producing flow at roughly 450-480ms per frame. Using preset 3, I'm getting great flow results, but it is stating TIME (O.Flow Run-Time ) (ms): 3293.45. When using the default, 2, it runs very quickly at TIME (O.Flow Run-Time ) (ms): 122.223
I also see you started a video branch. Have you implemented that and just not pushed it to GitHub, by any chance?