li-plus / DSNet

DSNet: A Flexible Detect-to-Summarize Network for Video Summarization
https://ieeexplore.ieee.org/document/9275314
MIT License

[question] How annotations are done #10

Closed: mohamedmossadaeye closed this issue 3 years ago

mohamedmossadaeye commented 3 years ago

I imagined that the JSON in the custom_data folder would model all frames of a video as a single binary list. For example, assume the video has 10 frames and the most important segments are frames [3 >> 6] and [9 >> 10]; then the annotation would be [0,0,1,1,1,1,0,0,1,1]. So I am confused by this statement in the readme.md file: "The user summary of a video is a UxN binary matrix, where U denotes the number of annotators and N denotes the number of frames in the original video". Why replicate the frames U times, and what is U?

li-plus commented 3 years ago

Hi @mohamedmossadaeye, in the actual annotation process, a video is labeled by many annotators (users) to avoid bias, because different people have their own preferences for which frames are "important". Each annotator produces an N-dim binary vector, so U annotators together generate a UxN binary matrix. The binary labels for each frame are then averaged across all annotators to obtain the ground-truth score for that frame.
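To make the shapes concrete, here is a minimal sketch in plain Python, not code from this repo. It assumes a hypothetical video with N = 10 frames and U = 3 annotators; the helper names `segments_to_vector` and `average_scores` and the second and third annotators' segments are made up for illustration. The first annotator's row is the example from the question above.

```python
def segments_to_vector(segments, n_frames):
    """Turn 1-indexed, inclusive (start, end) segments into an N-dim binary list."""
    vector = [0] * n_frames
    for start, end in segments:
        for frame in range(start, end + 1):
            vector[frame - 1] = 1  # shift from 1-indexed frames to 0-indexed list
    return vector


def average_scores(user_summary):
    """Average a UxN binary matrix over its U rows, giving one score per frame."""
    n_users = len(user_summary)
    return [sum(column) / n_users for column in zip(*user_summary)]


N_FRAMES = 10  # N: number of frames (illustrative value)

# One row per annotator, so the result is a UxN matrix with U = 3.
user_summary = [
    segments_to_vector([(3, 6), (9, 10)], N_FRAMES),   # annotator 1 (example from the question)
    segments_to_vector([(2, 5)], N_FRAMES),            # annotator 2 (made up)
    segments_to_vector([(4, 6), (10, 10)], N_FRAMES),  # annotator 3 (made up)
]

# Per-frame score: the fraction of annotators who marked the frame as important.
frame_scores = average_scores(user_summary)

for row in user_summary:
    print(row)
print([round(score, 2) for score in frame_scores])
```

Frames 4 and 5 are selected by all three annotators, so they get a score of 1.0, while frames chosen by only one annotator get about 0.33. A single annotator is just the special case U = 1, where the matrix has one row.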

mohamedmossadaeye commented 3 years ago

Great, what tools were used to annotate the videos?

li-plus commented 3 years ago

Sorry, but I don't have any recommendation for that. We didn't create a dataset; we only used the public ones. I did a quick search and found this repo: https://github.com/video-annotator/pythonvideoannotator. Maybe it could help you?