pcshih opened this issue 5 years ago
This function generates a single summary from the summaries of the 20 users in the dataset.
Which paragraph of the FCSN paper (or another paper) is the implementation based on?
Section 3.1 of the paper "Diverse Sequential Subset Selection for Supervised Video Summarization". In my implementation, the greedy algorithm selects, at each step, the frame marked by the most users.
After reading Section 3.1, I still don't understand the process. Given 3 human summaries over 5 frames:
A: [1,0,1,1,0]
B: [0,0,1,0,0]
C: [0,0,0,1,0]
How do I get the final summary? First, count how many times each frame is selected -> [1,0,2,2,0]. Second: I have no idea...
In my implementation, I initialize the oracle summary as [0, 0, 0, 0, 0] and then pick the most selected frame (here the third), so the oracle summary becomes [0, 0, 1, 0, 0]. Then I check whether the F-score between the oracle summary and the user summaries increases after adding this frame. If it does, I continue with the next frame; otherwise the process stops. But this is just my implementation. I didn't find a specific description of the greedy algorithm used in the paper, so I'm not sure the algorithm works like this.
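The greedy procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's code: user summaries are binary per-frame vectors, the F-score is computed frame by frame, "the F-score against the user summaries" is taken to mean the average over users, and ties in vote count are broken by frame index. `f_score` and `greedy_oracle` are hypothetical names.

```python
import numpy as np


def f_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Frame-level F-score (harmonic mean of precision and recall) between binary vectors."""
    overlap = float(np.sum(pred * gt))
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(user_summaries: np.ndarray) -> tuple[np.ndarray, float]:
    """Add frames in order of user votes while the mean F-score over users increases."""
    votes = user_summaries.sum(axis=0)  # how many users selected each frame
    # Most-voted frames first; a stable sort breaks ties by frame index.
    order = np.argsort(-votes, kind="stable")
    oracle = np.zeros(user_summaries.shape[1], dtype=int)
    best_score = 0.0
    for idx in order:
        candidate = oracle.copy()
        candidate[idx] = 1
        # Assumption: average the F-score over all users.
        score = float(np.mean([f_score(candidate, u) for u in user_summaries]))
        if score <= best_score:
            break  # adding this frame no longer improves the score, so stop
        oracle, best_score = candidate, score
    return oracle, best_score


# Toy example from this thread: rows are users A, B, C; columns are frames.
users = np.array([
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])
oracle, score = greedy_oracle(users)
print(oracle.tolist(), round(score, 4))  # [0, 0, 1, 1, 0] 0.7111
```

On this toy input the third and fourth frames are added (mean F-score 0.5, then about 0.711), and adding the first frame would lower it to about 0.667, so the loop stops. Returning the final score also makes it easy to print the F-score between the oracle summary and the user summaries.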
Where does the FCSN paper mention that they use "Diverse Sequential Subset Selection for Supervised Video Summarization" to generate a summary from the user summaries?
This method is described in the supplementary material of the paper "Video Summarization with Long Short-Term Memory".
After reading that paragraph, I implemented it.
Is my understanding identical to yours?
But the performance is quite bad...
Have you printed the final F-score between the generated oracle summary and the user summaries?
Did you mean the parameter "best_fscore"?
It seems slightly different.
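I found that TVSum uses avg but SumMe uses max when evaluating (https://github.com/KaiyangZhou/pytorch-vsumm-reinforce/blob/fdd03be93f090278424af789c120531e49aefa40/main.py#L164). After I changed SumMe to max, my results got better.
But I don't know why this method is used...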
Could you share the TVSum videos on your Google Drive? TVSum requires authorization...
Is this result on SumMe? It seems close to the result in the paper!
Wait a moment, I'm now uploading it...
Yes, it is SumMe.
Thank you
Got it, thank you very much. Did you figure out why max is used here? https://github.com/KaiyangZhou/pytorch-vsumm-reinforce/blob/fdd03be93f090278424af789c120531e49aefa40/main.py#L164
Maybe it is a default evaluation setting? I also think it's strange... I also noticed that in SumMe the selected key frames differ greatly from user to user: the F-score between the generated oracle summary and the user summaries is only about 50%, compared with about 70% in TVSum. In that case, getting a summary close to every user seems difficult. Could this be the reason for using max?
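To see why the choice matters when annotators disagree, here is a small sketch of scoring one summary against several users and aggregating with either "avg" or "max". It assumes a frame-level F-score over binary vectors; `evaluate_against_users` is a hypothetical helper for illustration, not the evaluation function in pytorch-vsumm-reinforce.

```python
import numpy as np


def f_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Frame-level F-score (harmonic mean of precision and recall) between binary vectors."""
    overlap = float(np.sum(pred * gt))
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return 2 * precision * recall / (precision + recall)


def evaluate_against_users(machine: np.ndarray, user_summaries: np.ndarray,
                           metric: str = "avg") -> float:
    """Score one summary against every user, then aggregate with 'avg' or 'max'."""
    scores = [f_score(machine, u) for u in user_summaries]
    if metric == "avg":
        return float(np.mean(scores))
    if metric == "max":
        return float(np.max(scores))
    raise ValueError(f"unknown metric: {metric!r}")


# Users who disagree: the summary matches B exactly, A partially, and C not at all.
users = np.array([
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
])
machine = np.array([0, 0, 1, 0, 0])
print(evaluate_against_users(machine, users, "avg"))  # 0.5
print(evaluate_against_users(machine, users, "max"))  # 1.0
```

With "max", a summary is rewarded for matching any single annotator closely, so it is more forgiving when annotators disagree; with "avg", it is penalized for every user it does not match. This fits the observation above that SumMe users disagree much more than TVSum users, although the sketch cannot confirm that this was the original motivation.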
I agree with you. Let's accept this evaluation method for now. I am also implementing this paper, whose architecture is based on FCSN, but I ran into some problems...
I have not read this paper yet; its architecture looks complicated.
Do you have any idea how the unsupervised version of FCSN works?
No... I skipped that part when reading the paper...
Shall we implement that part?
I will try to implement it after reading that part, but there may be some problems because my computer at home doesn't have an NVIDIA GPU 😅😅
I am counting on you.
https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/make_dataset.py#L70
What is the meaning of this function?