naraysa / 3c-net

Weakly-supervised Action Localization

Do you utilize additional data from THUMOS14 val set for temporal action localization? #4

Closed Yuxin-CV closed 4 years ago

Yuxin-CV commented 4 years ago

Nice work, and congratulations on your ICCV paper. Thanks for sharing your code.

As you mention in the paper, for THUMOS14 you follow the setting in STPN (CVPR 18): use 200 videos (20 categories) from the val set for training.

I did not carefully go through every line of your code, but it seems that you use all 1010 videos in the val set to train your classifier. This is fair for the action classification task. However, it also seems that you use the same network to perform the temporal action localization task. I don't think this is the standard protocol for weakly-supervised temporal action localization.

naraysa commented 4 years ago

Thanks for your interest.

We use all 1010 validation videos to train our network over all 101 classes. Since temporal annotations are available for only 20 classes, we evaluate on the subset of the test set covering those classes, around 210 videos. The W-TALC paper follows the same setup.

I am not sure what you mean by the standard protocol in weakly-supervised action localization, but recent works in the literature (STPN, W-TALC) use the same network (trained using video-level labels) to obtain localization results.
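For readers comparing the two training settings discussed in this thread, here is a minimal sketch of how the training set could be selected in each case. It is not taken from the 3c-net code: the function name `select_videos`, the video ids, and the toy label dictionary are all hypothetical, and only three of the 20 annotated classes are listed to keep the example short.

```python
from typing import Dict, List, Set


def select_videos(
    video_labels: Dict[str, Set[str]],
    localization_classes: Set[str],
    restrict: bool,
) -> List[str]:
    """Return the sorted ids of the videos to train on (hypothetical helper).

    restrict=False keeps every video (the all-classes setting).
    restrict=True keeps only videos with at least one video-level label
    among the classes that have temporal annotations.
    """
    if not restrict:
        return sorted(video_labels)
    return sorted(
        vid for vid, labels in video_labels.items()
        if labels & localization_classes  # non-empty intersection
    )


# Toy stand-in for the val-set video-level labels (made-up ids).
val_labels = {
    "val_0001": {"BaseballPitch"},
    "val_0002": {"ApplyEyeMakeup"},
    "val_0003": {"Diving", "CliffDiving"},
    "val_0004": {"Archery"},
}
# Assumed subset of the classes with temporal annotations.
loc_classes = {"BaseballPitch", "Diving", "CliffDiving"}

train_all = select_videos(val_labels, loc_classes, restrict=False)
train_loc = select_videos(val_labels, loc_classes, restrict=True)
```

In both settings only video-level labels are used for training; the settings differ only in which videos are kept. The same filter with `restrict=True` could be applied to the test split to pick the videos used for localization evaluation.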

Yuxin-CV commented 4 years ago

Thanks for your reply!

From my point of view, STPN and W-TALC use only the 200 videos (20 categories) in the val set to train a network for the weakly-supervised action localization task. It is OK to use all 1010 videos in the val set to train a net for the action classification task. But it is no good to use a net trained on all 1010 videos to perform the localization task.

For the localization task, I think it is better to evaluate your net trained on only the 200 videos, not all 1010.

naraysa commented 4 years ago

W-TALC also reports another setting, trained on all 101 classes. See Table 1 of the W-TALC paper.

You can't say definitively that it is no good to use all 101 classes for training. It might also be useful to understand what these 20 categories are not. Generally, it depends on the application and the loss formulation used. If you feel that training with only 20 classes is the only correct way, you can try it out. Thanks.