wei-tim / YOWO

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Background samples not use for training and evaluating? #52

Closed Tonyfy closed 4 years ago

Tonyfy commented 4 years ago

Hi, thanks for open-sourcing your excellent work. It seems that background samples are not used for training or evaluation? trainlist.txt and testlist.txt only include frames covered by some ground-truth action.

By the way, I tested the model "yowo_ucf101-24_16f_best_fmap_08749.pth", but only got 60.02% frame-mAP and 44.18% video-mAP at tIoU=0.5, whereas the paper reports 87.2% and 48.8%.

okankop commented 4 years ago

At training time, we do not use the background images. At test time, for example for the video-mAP results, we use all the frames in the videos.

I have recently checked and verified that the pretrained model shared in the repo achieves 80.4 frame-mAP on the UCF101-24 dataset. Previously there was a bug in the evaluation on UCF101-24, but it is fixed now. For the most up-to-date results, please check the latest version of the paper.
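For readers unfamiliar with how the frame-mAP numbers discussed above are obtained: per frame and per class, predicted boxes are matched to ground-truth boxes at an IoU threshold (tIoU=0.5 here), and average precision is computed from the resulting precision–recall curve. The sketch below is not the repo's actual evaluation script; the box format, greedy matching, and non-interpolated AP integration are illustrative assumptions.

```python
# Minimal sketch of IoU-based matching and AP computation as used in
# frame-mAP evaluation at tIoU = 0.5. NOT the repo's evaluation code;
# box format (x1, y1, x2, y2) and matching details are assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thresh=0.5):
    """detections: list of (score, box) for one class in one frame set;
    gt_boxes: list of ground-truth boxes. Greedy matching by descending
    score; each ground-truth box can be matched at most once."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = [False] * len(gt_boxes)
    tp, fp = [], []
    for _, box in detections:
        best, best_i = 0.0, -1
        for i, g in enumerate(gt_boxes):
            if not matched[i]:
                o = iou(box, g)
                if o > best:
                    best, best_i = o, i
        if best >= iou_thresh and best_i >= 0:
            matched[best_i] = True
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    # Accumulate precision/recall and take the area under the PR curve
    # (non-interpolated AP).
    ap, cum_tp, cum_fp, prev_recall = 0.0, 0, 0, 0.0
    for t, f in zip(tp, fp):
        cum_tp += t; cum_fp += f
        recall = cum_tp / len(gt_boxes)
        precision = cum_tp / (cum_tp + cum_fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Frame-mAP is then the mean of this AP over all action classes; this is also why including background frames at test time can only add false positives and lower the score, never raise recall.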