Closed anth2o closed 4 years ago
Actually, the GPU is barely used when doing inference on a video. The main bottleneck is opening the images and/or resizing them before sending them to the object_detection model.
Performance with multiprocessing for 343 frames and 1 GPU:
When doing inference on a video, we need to run object detection on every frame we split out. For a 1-minute video at 4 inference frames per second, that means 60 × 4 = 240 pictures to run inference on. The average inference speed of the FPN is 120 ms/img, so inference alone takes 240 × 0.12 ≈ 29 seconds per video. On top of that, there is the time to open the pictures from disk.
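The arithmetic above can be checked quickly (the 120 ms/img figure is the measured FPN speed mentioned above):

```python
fps = 4                        # inference frames per second
duration_s = 60                # 1-minute video
ms_per_img = 120               # average FPN inference time per image

n_frames = fps * duration_s              # 4 * 60 = 240 pictures
inference_s = n_frames * ms_per_img / 1000  # 240 * 0.12 = 28.8 s

print(n_frames, inference_s)   # 240 28.8 -- roughly 30 s, excluding disk I/O
```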
I can think of two solutions to improve inference time:

- do not store pictures on disk when splitting the video, and run inference directly on each frame
- multiprocess the inference
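A minimal sketch of the second idea, overlapping frame loading/preprocessing with inference. Everything here is hypothetical: `load_and_preprocess` stands in for the real `cv2.imread`/`cv2.resize` step, and `run_inference` stands in for the detection model's forward pass; only the overlap pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def load_and_preprocess(frame_id):
    # Stand-in for reading a frame from disk and resizing it
    # (in the real pipeline: cv2.imread + cv2.resize).
    return [frame_id] * 4

def run_inference(batch):
    # Stand-in for the object-detection forward pass (~120 ms/img on GPU).
    return [sum(img) for img in batch]

def infer_video(frame_ids, batch_size=8, workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads load/preprocess upcoming frames while the
        # main thread keeps the GPU busy with inference on ready batches.
        images = pool.map(load_and_preprocess, frame_ids)
        batch = []
        for img in images:
            batch.append(img)
            if len(batch) == batch_size:
                results.extend(run_inference(batch))
                batch = []
        if batch:
            results.extend(run_inference(batch))
    return results

print(infer_video(range(10)))  # [0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
```

Threads (rather than processes) are enough when the bottleneck is disk I/O, since image reads release the GIL; for CPU-heavy resizing, a `multiprocessing.Pool` doing the preprocessing would be the analogous structure.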