waghmaregovind closed this issue 4 years ago.
Hi, sorry for the unclear description of the table. As you said, the running times of RootNet and PoseNet are measured per bounding box.
However, their running times are not exactly proportional to the number of boxes in the image, because of parallel processing. You can build a mini-batch from all bounding boxes of an image and feed that mini-batch to the models. This also applies in real-world scenarios: all boxes detected by DetectNet can be fed to PoseNet and RootNet at once, unless there are so many people in the image that the GPU runs out of memory (OOM).
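The batching idea above can be sketched as follows. This is a minimal, hypothetical helper: `run_in_minibatches`, `model_fn`, and `max_batch` are not part of the released code. It assumes the model accepts a list of boxes and returns one output per box, and it caps the mini-batch size so that an image with many people is split into several calls rather than risking GPU OOM.

```python
from typing import Callable, List, Sequence, TypeVar

Box = TypeVar("Box")
Out = TypeVar("Out")


def run_in_minibatches(
    boxes: Sequence[Box],
    model_fn: Callable[[List[Box]], List[Out]],
    max_batch: int = 8,
) -> List[Out]:
    """Feed all boxes of one image to a model in as few calls as possible.

    Hypothetical helper: max_batch caps the mini-batch size so that a
    crowded image is processed in several calls instead of one huge batch.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    outputs: List[Out] = []
    for start in range(0, len(boxes), max_batch):
        batch = list(boxes[start:start + max_batch])
        outputs.extend(model_fn(batch))  # one forward pass per mini-batch
    return outputs


# Usage with a stand-in model (not the real PoseNet).
boxes = [f"box{i}" for i in range(10)]
calls = []


def fake_posenet(batch):
    # Records the batch size of each call and returns one result per box.
    calls.append(len(batch))
    return [f"pose({b})" for b in batch]


poses = run_in_minibatches(boxes, fake_posenet, max_batch=8)
print(calls)  # [8, 2]
```

With 10 detected people and a cap of 8, the model is called twice instead of ten times; the real cap would depend on GPU memory.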
I checked that for PoseNet, processing 8 boxes in parallel takes 3 times as long as processing 1 box. For RootNet, processing 8 boxes in parallel takes 2 times as long as processing 1 box.
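As a rough illustration of what these measurements imply, the arithmetic below turns them into an amortized per-box cost relative to running a single box. It uses only the two ratios quoted above for a batch of 8 (3× the single-box time for PoseNet, 2× for RootNet); the function name is hypothetical, and since the scaling is not linear, the result should not be extrapolated to other batch sizes.

```python
def amortized_per_box_ratio(batch_size: int, batch_time_ratio: float) -> float:
    """Per-box time of a batched run, relative to running one box alone.

    batch_time_ratio = (time for batch_size boxes in parallel) / (time for 1 box).
    """
    return batch_time_ratio / batch_size


# Ratios quoted in the thread for a batch of 8 boxes.
posenet = amortized_per_box_ratio(8, 3.0)
rootnet = amortized_per_box_ratio(8, 2.0)
print(posenet, rootnet)  # 0.375 0.25
```

In other words, for an image with 8 people, multiplying a per-box time by 8 would overestimate the batched PoseNet cost by roughly 8/3 and the RootNet cost by 4.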
I hope this answers your question.
Your response does clarify the issue. As you pointed out, my example 2 does not account for parallel processing and should be reconsidered in terms of batches. Thanks for the prompt response and the additional running-time measurements.
Greetings, in the supplementary material of the paper, Table 8 shows the running time of each component. My question is specifically about the inference times of RootNet and PoseNet. Since RootNet and PoseNet operate on bounding boxes, does "frame" mean a single bounding box in this context? I think the reported numbers are per bounding box rather than per image, since a single image can contain multiple people.
I'm providing two examples to clarify it further.
Thanks for releasing the source.