Open ayushkothari27 opened 4 years ago
I think each frame corresponds to one score in the tsv file. Assuming a frame rate of 30 frames per second, the 1st video, with a length of 5 min 54 sec (about 354 sec), should have more than 10,000 frames.
@aayushkothari11 First, you can get the fps and duration of each video using "ffmpeg -i video.mp4". The fps of the 1st video is 29.97 (about 30 fps) and its duration is 05:53.64.
Second, the importance scores are per-frame scores. You can check the number of scores for the 1st video; I checked using Python, and the 1st video has 10,597 importance scores. That matches the approximate number of frames of the 1st video: (5 × 60 + 53) × 30 = 10,590.
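The arithmetic above can be sketched in Python as a quick sanity check. The function name `expected_frame_count` is made up for illustration; the duration and fps values are the ones reported by ffmpeg in this thread. It only estimates the frame count, so a difference of a few frames from the actual number of scores is expected.

```python
def expected_frame_count(minutes, seconds, fps):
    """Estimate the number of frames from a duration and a frame rate.

    Hypothetical helper: it simply multiplies the duration in seconds
    by the frame rate and rounds to the nearest whole frame.
    """
    duration_sec = minutes * 60 + seconds
    return round(duration_sec * fps)


# Rough estimate used in the comment above: 5 min 53 sec at 30 fps.
rough = expected_frame_count(5, 53, 30)

# Tighter estimate using the values reported by ffmpeg (05:53.64 at 29.97 fps).
tight = expected_frame_count(5, 53.64, 29.97)

# Number of importance scores reported for the 1st video in this thread.
reported_scores = 10597

print(rough, tight, reported_scores)
```

Both estimates land within a few frames of the 10,597 scores, which supports the reading that the tsv holds one score per frame.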
@weirme @SinDongHwan I also assumed a frame rate of 30 fps, but that only works for some videos. After using ffmpeg, I found that some videos use 30 fps while others use 24 fps. Thank you for your help! Much appreciated. Also, is there a good way to convert the videos from these frame rates to 3 fps so that the importance scores can still be easily related to the frames?
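One possible way to keep the scores aligned after such a conversion is to resample the per-frame scores to the target rate as well. The sketch below is only an assumption about how this could be done, not something prescribed by the dataset: `downsample_scores` is a hypothetical helper that averages the source-frame scores falling into each output frame's time window. The video itself could be resampled separately, for example with ffmpeg's `fps` filter ("-vf fps=3"), though the frames ffmpeg keeps may not match these windows exactly.

```python
import math


def downsample_scores(scores, src_fps, dst_fps):
    """Average per-frame scores into windows matching a lower frame rate.

    Hypothetical helper: output frame k covers source frames whose index
    lies in [k * src_fps / dst_fps, (k + 1) * src_fps / dst_fps).
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    if dst_fps > src_fps:
        raise ValueError("dst_fps must not exceed src_fps")

    step = src_fps / dst_fps  # source frames per output frame
    n_out = math.ceil(len(scores) / step)

    out = []
    for k in range(n_out):
        start = int(k * step)
        end = min(len(scores), int((k + 1) * step))
        end = max(end, start + 1)  # always keep at least one frame
        window = scores[start:end]
        out.append(sum(window) / len(window))
    return out


# Toy example: one second of 30 fps scores reduced to 3 fps.
toy_scores = [1] * 10 + [2] * 10 + [3] * 10
print(downsample_scores(toy_scores, src_fps=30, dst_fps=3))
```

Pass the real frame rate of each video (29.97, 24, and so on, as reported by ffmpeg) as `src_fps` rather than a fixed 30, since the rates differ across videos.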
Hello, how should I interpret the importance scores in the tsv file of the original TVSum50 dataset? Are they given for each frame? If yes, what frame rate is used? What is the significance of each shot being 2 seconds long?
The data annotation file has importance scores for each video. The readme says that each shot is 2 seconds long. However, when going through the data for the 1st video (length 5 min 54 sec), I found that the number of annotations provided was over 10,000. I am not able to understand how the length of the video is related to the number of annotations. Multiplying each video's duration by commonly used frame rates (24-30 fps) doesn't help either.