tianyu0207 / RTFM

Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

After obtaining the final temporal feature representation X #39

Closed DungVo1507 closed 3 years ago

DungVo1507 commented 3 years ago

Thanks for viewing my issue, @tianyu0207 I have 4 questions that I hope you can explain:

  1. After obtaining X, the snippets have been divided into 2 groups, normal and abnormal, right?
  2. In the Select Top-k snippets stage, do you select k snippets from the normal and abnormal groups combined, or k snippets from each group?
  3. Assuming k = 3, in case a video has fewer than 3 abnormal or normal snippets, how will RTFM choose?
  4. When the input is a normal video, how will the RTFM-enabled Snippet Classifier Learning stage classify it?
tianyu0207 commented 3 years ago

> Thanks for viewing my issue, @tianyu0207 I have 4 questions that I hope you can explain:
>
> 1. After obtaining X, the snippets have been divided into 2 groups, normal and abnormal, right?
> 2. In the Select Top-k snippets stage, do you select k snippets from the normal and abnormal groups combined, or k snippets from each group?
> 3. Assuming k = 3, in case a video has fewer than 3 abnormal or normal snippets, how will RTFM choose?
> 4. When the input is a normal video, how will the RTFM-enabled Snippet Classifier Learning stage classify it?

Hi,

  1. After obtaining the video, the snippets are divided into 32 segments, and each segment is represented by a 2048-dimensional feature vector. We do not change the temporal order of the snippets.
  2. We select the snippets with the top-k feature magnitudes from each normal and each abnormal video, to obtain hard normals and pseudo abnormals respectively.
  3. Assuming k = 3, if a video has fewer than 3 abnormal or normal snippets, RTFM still chooses the top-3. This may include some false snippets, but in our experiments we found our approach robust enough to handle this.
  4. Each batch has the same number of normal and abnormal videos, so there is an equal number of samples from the two classes during the classifier learning stage.
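For reference, the top-k magnitude selection described in point 2 can be sketched as follows. This is a minimal NumPy sketch, not the repository's actual implementation; the function name and shapes are assumptions:

```python
import numpy as np

def topk_by_magnitude(features, k=3):
    """Select the k snippets with the largest L2 feature magnitude.

    features: (T, D) array of per-snippet temporal features
              (e.g. T = 32 segments, D = 2048 in this discussion).
    Returns the (k, D) selected features and their original snippet indices.
    """
    mags = np.linalg.norm(features, axis=1)   # (T,) one magnitude per snippet
    idx = np.argsort(mags)[::-1][:k]          # indices sorted by descending magnitude
    return features[idx], idx
```

Because every video is split into a fixed 32 segments, k = 3 snippets can always be selected, which matches the answer to question 3: the top-3 are chosen even if some of them turn out to be false selections.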
tianyu0207 commented 3 years ago
> pseudo

A hard normal is a normal snippet that looks similar to an abnormal event. A pseudo abnormal is a selected snippet that may not actually be abnormal, because we select abnormal instances from the abnormal bag and there are no snippet-level labels. I don't quite understand your second question, sorry.

DungVo1507 commented 3 years ago

Thank you so much @tianyu0207. My second question means: you say each batch will have the same number of normal and abnormal videos, so should the number of normal and abnormal videos in the dataset be equal too? If each batch has the same number of normal and abnormal videos, is the drawing of how RTFM works that I attached below correct?

[attached image: RTFM]

I hope you will reply! Appreciate your support!

tianyu0207 commented 3 years ago

> Thank you so much @tianyu0207. My second question means: you say each batch will have the same number of normal and abnormal videos, so should the number of normal and abnormal videos in the dataset be equal too? If each batch has the same number of normal and abnormal videos, is the drawing of how RTFM works that I attached below correct?
>
> [attached image: RTFM]
>
> I hope you will reply! Appreciate your support!

Having each batch contain the same number of normal and abnormal videos does not necessarily mean you have an equal number of videos in the dataset; you just sample evenly for each batch.
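Sampling evenly per batch from an unbalanced dataset can be sketched like this. It is a minimal stdlib sketch under the assumption that videos are identified by IDs; the function name and signature are hypothetical, not from the repository:

```python
import random

def balanced_batches(normal_ids, abnormal_ids, batch_size, n_batches, seed=0):
    """Yield batches containing batch_size // 2 videos from each class.

    Each class is sampled with replacement per batch, so the dataset
    itself does not need an equal number of normal and abnormal videos.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(n_batches):
        # draw half the batch from each class, independent of the class ratio
        yield rng.choices(normal_ids, k=half) + rng.choices(abnormal_ids, k=half)
```

Even if, say, there are ten normal videos and only four abnormal ones, every batch still holds an equal count from each class.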

Hi, I reckon your figure is correct.