Closed. Wenju-Huang closed this issue 5 months ago.
These are video-level classification results. For instance, ActivityNet gives you a 200-class classification task. The classifiers above are provided by BMN, InternVideo, or InternVideo2.
To obtain similar video-level classification results on ActivityNet, you can refer to InternVideo2's implementation. In short, treat each segment in the training videos as a video clip and apply the action recognition task to these clips. At test time, uniformly sample multiple clips from the video and ensemble their predictions into the final video-level result.
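The test-time procedure above can be sketched as follows. This is a minimal illustration, not InternVideo2's actual code; the function names and the softmax-then-average ensembling rule are my assumptions (averaging raw logits is an equally common choice):

```python
import numpy as np

def sample_clip_starts(num_frames, clip_len, num_clips):
    """Uniformly choose start indices for `num_clips` clips of `clip_len` frames."""
    # Spread starts evenly over the valid range; clamp so clips stay in bounds.
    max_start = max(num_frames - clip_len, 0)
    if num_clips == 1:
        return [max_start // 2]
    return [round(i * max_start / (num_clips - 1)) for i in range(num_clips)]

def ensemble_clip_scores(clip_logits):
    """Average per-clip class probabilities into one video-level prediction.

    clip_logits: array of shape (num_clips, num_classes).
    """
    # Numerically stable softmax per clip, then mean over clips.
    probs = np.exp(clip_logits - clip_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)
```

For example, a 100-frame video sampled with three 16-frame clips yields starts at frames 0, 42, and 84, so the clips cover the beginning, middle, and end of the video.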
Note that such a video-level classifier only works for datasets where each video contains multiple actions that all share the same category, such as ActivityNet or HACS. For THUMOS, Ego4D, and Epic-Kitchens, different actions within the same video have different categories, so combining external classifiers may hurt performance.
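For context, the usual way such an external classifier is combined with a localizer is to multiply each class-agnostic proposal's confidence by the video-level class scores for the top classes. The sketch below illustrates this fusion under my own naming; it is not the repository's actual post-processing code:

```python
import numpy as np

def fuse_with_video_classifier(proposals, video_cls_scores, top_k=2):
    """Label class-agnostic proposals using external video-level class scores.

    proposals: list of (start, end, confidence) tuples, e.g. from BMN.
    video_cls_scores: 1-D array of per-class probabilities for the whole video.
    Returns (start, end, class_id, fused_score) tuples, highest score first.
    """
    # Keep only the top-k classes for the whole video: this is exactly the
    # assumption that all actions in the video share the same few categories.
    top_classes = np.argsort(video_cls_scores)[::-1][:top_k]
    results = []
    for start, end, conf in proposals:
        for c in top_classes:
            results.append((start, end, int(c), conf * float(video_cls_scores[c])))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

This also makes the failure mode on THUMOS-style data concrete: if different proposals in one video belong to different classes, forcing every proposal to take the video's top classes assigns wrong labels.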
If I run a test video that is not in ActivityNet, how should I modify the post_processing part?
In this case, if you are using an ActivityNet-pretrained TAD model, you need
We are planning to release an end-to-end HACS-pretrained TAD model without an external classifier. I think this model will be more suitable for zero-shot testing.
such as `cuhk_val_simp_7.json` and `new_3ensemble_uniformerv2_large_only_global_anet_16x10x3.json`
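One way to handle a video that is absent from such a classifier file is to fall back to a uniform class prior, so ranking degrades gracefully to the proposal scores alone. This is a sketch under my own assumptions: the helper name is hypothetical, and I assume the JSON maps video IDs to per-class score lists (the real files may use a different layout, so check before reusing):

```python
import json
import numpy as np

NUM_CLASSES = 200  # ActivityNet-1.3 has 200 action classes

def load_video_scores(cls_json_path, video_id):
    """Look up external video-level class scores for one video.

    Falls back to a uniform distribution when the video ID is missing,
    e.g. for a test video that is not part of ActivityNet.
    """
    with open(cls_json_path) as f:
        cls_results = json.load(f)
    if video_id in cls_results:
        return np.asarray(cls_results[video_id], dtype=np.float32)
    # Unknown video: uniform prior, so downstream fusion leaves the
    # proposal confidences effectively unchanged (up to a constant factor).
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES, dtype=np.float32)
```

Alternatively, you can run your own action-recognition model on the new video to produce these scores, or skip the external classifier entirely, which is what the end-to-end HACS model mentioned above would allow.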