livepeer / verification-classifier

Metrics-based Verification Classifier
MIT License

Compare performance of reading specific sampled frames instead of reading full video #99

Closed yondonfu closed 4 years ago

yondonfu commented 4 years ago

At the moment, the verifier iterates through every frame of a video, building a randomly sampled list of frames as it goes.

We might be able to reduce the computational overhead for the verifier by first creating a randomly generated list of frame positions (or presentation timestamps) and then seeking through the video and only decoding/fetching the frames at the relevant positions/timestamps. Seems that this is possible using OpenCV.
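As a sketch of this idea (assuming OpenCV's Python bindings; the function names here are illustrative, not the verifier's actual API), the frame positions are generated up front, then the capture is seeked to each one:

```python
import random


def sample_frame_positions(total_frames, n_samples, seed=None):
    """Pre-generate a sorted list of random frame indices to fetch."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), n_samples))


def read_sampled_frames(path, n_samples):
    """Seek to each sampled position and decode only those frames."""
    import cv2  # imported lazily so the sampling helper stays dependency-free

    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for pos in sample_frame_positions(total, n_samples):
        # Seeking still decodes forward from the nearest preceding keyframe.
        cap.set(cv2.CAP_PROP_POS_FRAMES, pos)
        ok, frame = cap.read()
        if ok:
            frames.append((pos, frame))
    cap.release()
    return frames
```

Sorting the positions keeps the seeks monotonic, so the decoder never has to jump backwards.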

We can do 2 benchmarks:

  1. Calculate the time required for frame sampling when using the current approach with 2 second video segments. Run N trials.

  2. Calculate the time required for frame sampling when using the new suggested approach, also with 2 second video segments. Run N trials.

Then, we can compare the results of the two approaches [1].

[1] @ndujar has already tried out the new suggested approach and has observed faster computation with longer video assets. The purpose of doing these 2 benchmarks is to get some #s on how much faster the new suggested approach can be.
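A minimal timing harness for the two benchmarks could look like the following (a sketch only; each sampling strategy would be passed in as `fn`, and the trial count `n_trials` corresponds to N above):

```python
import statistics
import time


def benchmark(fn, n_trials):
    """Run fn() n_trials times; return (mean, stdev) wall time in ms."""
    times = []
    for _ in range(n_trials):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(times), statistics.stdev(times)
```

Reporting the standard deviation alongside the mean helps show whether the per-trial variance (e.g. from disk caching) is swamping the difference between the two approaches.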

j0sh commented 4 years ago

My guess is that the gains will be negligible, and may even make things worse for short segments (eg, ours are typically 2s). Seeking requires decoding from the last keyframe, and short segments may only have a single keyframe. Moreover, repeated seeking may require repeated decoding of the same content, which would slow things down compared to a single pass.
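One way to sanity-check the keyframe argument is to count the keyframes in a segment. A sketch using ffprobe (assuming it is on the PATH; with `-of csv=p=0` each decoded frame should print its `key_frame` flag as a bare `1` or `0`):

```python
import subprocess


def parse_keyframe_flags(csv_text):
    """Count lines equal to "1" (one line per frame, 1 = keyframe)."""
    return sum(1 for line in csv_text.splitlines() if line.strip() == "1")


def count_keyframes(path):
    """Count keyframes in a video via ffprobe (must be on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=key_frame", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_keyframe_flags(out)
```

If a 2s segment reports a single keyframe, every seek would decode from the start of the segment, which matches the concern above.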

yondonfu commented 4 years ago

@j0sh Ah good point about the need to decode from the last keyframe and that short segments might only have a single keyframe.

@ndujar was experimenting with this using larger video assets and I'm guessing any speed gains with larger video assets stemmed from the existence of multiple keyframes.

I'll keep this issue open for now and just move it to the ice box.

Sorkanius commented 4 years ago

I ran some tests on this early on and, if I remember correctly, seeking made the results worse. Those tests were done with what I believe were segments of around 10 seconds.

Instead of randomly sampling across the whole video, we could pick a small number N of seek points (1, 2, 3?) and, at each one, read the next M consecutive frames. Then perform verification as usual. We could then test whether the results are statistically similar.
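This chunked scheme could be sketched as follows (names are illustrative; duplicates from overlapping runs are deduplicated):

```python
import random


def chunked_sample_positions(total_frames, n_seeks, frames_per_seek, seed=None):
    """Pick n_seeks random start positions; each contributes the next
    frames_per_seek consecutive frame indices, clipped to the video length."""
    rng = random.Random(seed)
    last_start = max(total_frames - frames_per_seek, 0)
    starts = rng.sample(range(last_start + 1), n_seeks)
    positions = []
    for s in starts:
        positions.extend(range(s, min(s + frames_per_seek, total_frames)))
    # Overlapping runs can repeat indices; dedupe and keep seeks monotonic.
    return sorted(set(positions))
```

With N seeks instead of one per sampled frame, the cost of decoding from the last keyframe is paid only N times while still yielding N x M frames for verification.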

ndujar commented 4 years ago

Experiments were conducted using the code available in testing/video_parsing.py. Results:

| Frames # | Duration (s) | Seek time (ms) | Iteration time (ms) |
|---------:|-------------:|---------------:|--------------------:|
| 19030 | 634 | 23500 | 260740 |
| 300 | 10 | 501 | 5294 |
| 50 | 2 | 1040 | 482 |
| 35 | 1.5 | 521 | 322 |

The conclusion is that this approach might only be worthwhile for long segments. In our case, however, the segments are too short; as @j0sh points out, seeking is not an option here.