Closed (rishabh2301 closed this issue 5 years ago)
Same here. I noticed that each worker/process takes 185 MB of GPU memory. So, using build_rawframes.py, I increased the number of workers to 100, as I have 2 GPUs with 12 GB each. But although a lot of GPU memory was left unused, the script failed, complaining that OpenCV could not allocate GPU memory. So I am not sure what is going wrong or how the whole process can be sped up.
Hi @rishabh2301 @atique81
This process normally takes 200-300 MB of GPU memory for a 320x240 video. As for the running time, TV-L1 (with the GPU implementation) is reported to reach real-time performance (30 FPS) on 320x240 videos. In practice, the speed varies with the video content (action recognition papers report 17 FPS on average). The way to speed up the whole process is to increase the number of workers, but I haven't checked what the optimal number is for each GPU. Another workaround is to use a faster algorithm, such as Farneback (which is also included in denseflow, see https://github.com/yjxiong/dense_flow/blob/master/src/dense_flow_gpu.cpp#L31).
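To make the memory side of this concrete, here is a rough sketch of how one might bound the number of workers per GPU from the figures quoted in this thread. The function name `max_workers_per_gpu` and the `headroom` parameter are hypothetical, not part of build_rawframes.py or dense_flow, and the result is only an upper bound from memory: it says nothing about compute contention, and the report above of allocation failures with memory seemingly free suggests the practical limit can be lower.

```python
import math


def max_workers_per_gpu(gpu_mem_mb: float, per_worker_mb: float,
                        headroom: float = 0.2) -> int:
    """Memory-only upper bound on workers for one GPU (hypothetical helper).

    gpu_mem_mb:    total memory of the GPU in MB
    per_worker_mb: observed memory use of one flow-extraction process in MB
    headroom:      fraction of memory kept free as a safety margin (an assumption)
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_mb = gpu_mem_mb * (1.0 - headroom)
    return math.floor(usable_mb / per_worker_mb)


if __name__ == "__main__":
    # 12 GB GPU, 300 MB per worker (upper end of the range quoted above).
    print(max_workers_per_gpu(12 * 1024, 300))
    # Same GPU, 185 MB per worker as reported for 320x180 videos, no margin.
    print(max_workers_per_gpu(12 * 1024, 185, headroom=0.0))
```

The per-worker figure should be measured on your own videos (for example with nvidia-smi while one worker runs), since it grows with resolution.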
@zhaoyue-zephyrus Thank you for your reply. I am working with 720x720 videos, which may be the reason for the slow computation; the GPU is indeed using around 350 MB for one video. I will check whether the Farneback algorithm increases the speed. Thank you.
@zhaoyue-zephyrus I did see that the Farneback algorithm is much faster than TV-L1. But I want to use these flow frames to extract features from the I3D model, for which TV-L1 was originally used to compute the flow. As the results of different optical flow algorithms are not quite the same, I am wondering whether it is a good idea not to use TV-L1.
@rishabh2301
Thanks @zhaoyue-zephyrus for the reply. I am using it for the THUMOS14 dataset, for which the test video files have a resolution of 320x180. For me, each process took 185 MB of GPU memory. I used 40 workers (as using more might break, as I mentioned above), and it took about 2 days to process the 210 test videos in the THUMOS14 detection task.
@atique81 I think this sounds reasonable.
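For planning purposes, a figure like the one reported above can be turned into a back-of-the-envelope estimate for a larger dataset. The sketch below is a plain linear extrapolation; `estimate_total_hours` is a hypothetical helper, and it assumes the remaining videos have similar length and resolution and run on the same hardware with the same number of workers, which will often not hold across datasets.

```python
def estimate_total_hours(videos_done: int, hours_elapsed: float,
                         videos_total: int) -> float:
    """Linearly extrapolate wall-clock hours from a measured sample (hypothetical helper)."""
    if videos_done <= 0 or hours_elapsed <= 0:
        raise ValueError("videos_done and hours_elapsed must be positive")
    videos_per_hour = videos_done / hours_elapsed
    return videos_total / videos_per_hour


if __name__ == "__main__":
    # 210 videos in about 2 days (48 hours), extrapolated to 4000 videos.
    hours = estimate_total_hours(210, 48, 4000)
    print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

A more reliable estimate comes from timing a small, representative sample of your own videos and extrapolating by total frame count rather than by number of videos.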
@zhaoyue-zephyrus Ok, great!
Closing this. Please feel free to reopen it if you run into any further problems.
Hi, I have built OpenCV 4.1.0 with CUDA 9.2, and I have built the latest available dense_flow repository. I am extracting flow frames for my own video dataset using the build_rawframes.py script. However, extracting the flow frames is taking a lot of time, even on the GPU. I have increased the number of workers for multiprocessing so that the GPU memory can be fully utilized. But I want to extract the frames for 4000 videos, and at this speed it may take a long time. Is this the usual speed? Please let me know. Thank you very much for the code.