open-mmlab / mmaction

An open-source toolbox for action understanding based on PyTorch
https://open-mmlab.github.io/
Apache License 2.0

Time taken by denseflow on GPU #38

Closed rishabh2301 closed 5 years ago

rishabh2301 commented 5 years ago

Hi, I have built OpenCV 4.1.0 with CUDA 9.2, as well as the latest available dense_flow repository. I am extracting flow frames for my own video dataset using the build_rawframes.py script. However, the extraction takes a lot of time, even on the GPU. I have increased the number of multiprocessing workers so that the GPU memory is fully utilized. I need to extract frames for 4000 videos, though, and at this speed it may take a long time. Is this the usual speed? Please let me know, and thank you very much for the code.

rahman-mdatiqur commented 5 years ago

Same here. I noticed that each worker/process takes 185MB of GPU memory, so I increased the number of workers in build_rawframes.py to 100, since I have 2 GPUs with 12GB each. However, although a lot of GPU memory was left unused, the script failed with OpenCV complaining that it could not allocate GPU memory. So I am not sure what is going wrong or how the whole process can be sped up.
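
A rough way to size the worker pool is to divide each GPU's memory by the per-process footprint and keep some headroom. The sketch below uses hypothetical helpers (`max_workers_per_gpu`, `assign_gpus`), not part of build_rawframes.py; the 185MB figure comes from this comment, while the 20% headroom and the round-robin GPU assignment are assumptions. With 12GB (taken as 12288MB) per GPU it allows 53 workers per GPU, i.e. slightly more than the 100 workers that already failed on two GPUs, so the result is best treated as an upper bound and lowered until runs are stable.

```python
from __future__ import annotations


def max_workers_per_gpu(gpu_mem_mb: int, per_worker_mb: int, headroom: float = 0.2) -> int:
    """Upper bound on flow workers that fit on one GPU (hypothetical helper).

    headroom is the fraction of memory left unused; the 20% default is an
    assumption, not a value taken from denseflow or mmaction.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_mb = gpu_mem_mb * (1.0 - headroom)
    # Floor division: a worker that only partially fits does not count.
    return int(usable_mb // per_worker_mb)


def assign_gpus(num_workers: int, num_gpus: int) -> list[int]:
    """Round-robin mapping from worker index to GPU id (assumed scheme)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return [i % num_gpus for i in range(num_workers)]


# Usage with the numbers from this comment: 12GB per GPU, 185MB per worker.
per_gpu = max_workers_per_gpu(12 * 1024, 185)  # 53 with 20% headroom
gpu_ids = assign_gpus(per_gpu * 2, num_gpus=2)
```

Memory arithmetic alone does not explain the allocation failure described above; it only gives a ceiling to start from.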

zhaoyue-zephyrus commented 5 years ago

Hi @rishabh2301 @atique81

This process normally takes 200~300MB of GPU memory for a 320x240 video. As for the running time, TV-L1 (with a GPU implementation) is reported to run in real time (30FPS) on 320x240 videos. In practice, the speed varies with the video content (action recognition papers report 17FPS on average). The way to speed up the whole process is to increase the number of workers, but I haven't checked what the optimal number is for each GPU. Another workaround is to use a faster algorithm, such as Farneback (which is also included in denseflow, see https://github.com/yjxiong/dense_flow/blob/master/src/dense_flow_gpu.cpp#L31).
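
The throughput figures above can be turned into a back-of-the-envelope time estimate. This is a hypothetical sketch (`estimate_hours` is not a denseflow or mmaction function): it assumes flow is computed between consecutive frames, so a video with n frames yields n - 1 flow fields, and that throughput scales linearly with the number of workers. Workers sharing one GPU generally scale worse than linearly, so the result is best read as a lower bound on wall-clock time.

```python
from __future__ import annotations


def estimate_hours(frames_per_video: list[int], fps_per_worker: float, num_workers: int) -> float:
    """Rough wall-clock estimate for flow extraction (hypothetical helper).

    Assumes each worker sustains fps_per_worker flow fields per second and
    that total throughput is fps_per_worker * num_workers (linear scaling,
    which is optimistic when workers share a GPU).
    """
    if fps_per_worker <= 0 or num_workers <= 0:
        raise ValueError("fps_per_worker and num_workers must be positive")
    # A video with n frames yields n - 1 flow fields (one per consecutive pair).
    flow_fields = sum(max(n - 1, 0) for n in frames_per_video)
    seconds = flow_fields / (fps_per_worker * num_workers)
    return seconds / 3600.0


# Toy check: 3600 flow fields at 1 field/s with one worker take one hour.
assert estimate_hours([3601], fps_per_worker=1.0, num_workers=1) == 1.0
```

In practice `frames_per_video` would hold the frame counts of your own videos, and 17FPS per worker (the average quoted above) is a reasonable starting value for 320x240 input.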

rishabh2301 commented 5 years ago

@zhaoyue-zephyrus thank you for your reply. I am working with 720x720 videos, which may be the reason for the slow computation; the GPU does take around 350MB per video. I will check whether the Farneback algorithm increases the speed, thank you.

rishabh2301 commented 5 years ago

@zhaoyue-zephyrus I did see that the Farneback algorithm is much faster than TV-L1. However, I want to use these flow frames to extract features from the I3D model, for which the authors originally used TV-L1 flow. Since different optical flow algorithms do not produce quite the same results, I am wondering whether it is a good idea not to use TV-L1.

zhaoyue-zephyrus commented 5 years ago

@rishabh2301

  1. For 720x720 videos, the speed is expected to be slow. (See Table 1 in https://pequan.lip6.fr/~bereziat/cours/master/vision/papers/zach07.pdf)
  2. If you want to use these flow frames to extract features from the I3D model, I would recommend sticking to the TV-L1 algorithm. Different flow methods do produce somewhat different outputs.

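
A first-order way to see why 720x720 input is slow is to compare pixel counts. The sketch below uses hypothetical helpers (`pixel_ratio`, `scaled_fps`) and assumes that TV-L1 runtime grows roughly in proportion to the number of pixels per frame; this is a simplification (pyramid depth and iteration counts also matter), so the numbers are only indicative.

```python
def pixel_ratio(w: int, h: int, ref_w: int = 320, ref_h: int = 240) -> float:
    """How many times more pixels a w x h frame has than the reference size."""
    return (w * h) / (ref_w * ref_h)


def scaled_fps(ref_fps: float, w: int, h: int, ref_w: int = 320, ref_h: int = 240) -> float:
    """Estimated FPS at w x h, ASSUMING cost is proportional to pixel count."""
    return ref_fps / pixel_ratio(w, h, ref_w, ref_h)


ratio = pixel_ratio(720, 720)          # 6.75x the pixels of 320x240
fps_720 = scaled_fps(30.0, 720, 720)   # about 4.4 FPS from the 30FPS real-time figure
```

Under the same assumption, the 17FPS average quoted earlier in the thread drops to roughly 2.5FPS at 720x720.
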
rahman-mdatiqur commented 5 years ago

Thanks @zhaoyue-zephyrus for the reply. I am using it on the THUMOS14 dataset, whose test videos have a resolution of 320x180. For me, each process took 185MB of GPU memory. I used 40 workers (using more might fail, as I mentioned above), and it took about 2 days to process the 210 test videos of the THUMOS14 detection task.

zhaoyue-zephyrus commented 5 years ago

@atique81 I think this sounds reasonable.

rahman-mdatiqur commented 5 years ago

@zhaoyue-zephyrus Ok, great!

yjxiong commented 5 years ago

Closing this. Please feel free to reopen it if you run into any further problems.