sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

How to get the classifier files of ActivityNet #5

Closed Wenju-Huang closed 5 months ago

Wenju-Huang commented 5 months ago

such as cuhk_val_simp_7.json and new_3ensemble_uniformerv2_large_only_global_anet_16x10x3.json

sming256 commented 5 months ago

These are the video-level classification results. For ActivityNet, for instance, this is a 200-class classification task over the whole video. The classifiers above are provided by BMN, InternVideo, or InternVideo2.

To get similar video-level classification results on ActivityNet, you can refer to InternVideo2's implementation. In short, you treat each annotated segment in the training videos as a video clip and train an action recognition model on these clips. At test time, you uniformly sample multiple clips from the video and ensemble their predictions into the final video-level result.
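This is not OpenTAD code, just a minimal PyTorch sketch of the test-time ensembling step. The `model` here is a hypothetical recognizer that maps a clip tensor of shape (1, clip_len, C, H, W) to per-class logits, and `frames` is assumed to be a pre-decoded (T, C, H, W) tensor:

```python
import torch

@torch.no_grad()
def video_level_scores(model, frames, num_clips=10, clip_len=16, num_classes=200):
    """Uniformly sample `num_clips` clips and average their softmax scores."""
    model.eval()
    num_frames = frames.shape[0]  # frames: (T, C, H, W)
    # Uniformly spaced clip start indices covering the whole video.
    starts = torch.linspace(0, max(num_frames - clip_len, 0), num_clips).long().tolist()
    scores = torch.zeros(num_classes)
    for s in starts:
        clip = frames[s:s + clip_len].unsqueeze(0)  # (1, clip_len, C, H, W)
        logits = model(clip)                        # (1, num_classes)
        scores += logits.softmax(dim=-1).squeeze(0)
    return scores / num_clips                      # ensembled video-level scores
```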

Note that such a video-level classifier only works for datasets where each video may contain multiple action instances that all share the same category, such as ActivityNet or HACS. For THUMOS, Ego4D, and Epic-Kitchens, different actions in the same video have different categories, so combining an external classifier may lead to worse performance.
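To see why this restriction holds, here is a minimal sketch of the usual fusion between class-agnostic proposals and an external video-level classifier (hypothetical names; the actual OpenTAD post-processing may differ in detail). Every proposal inherits the video's top-k labels, which only makes sense if all actions in the video share one category:

```python
import numpy as np

def fuse_proposals(proposals, video_scores, class_names, topk=2):
    """proposals: list of (start_sec, end_sec, confidence) from the TAD model.
    video_scores: (num_classes,) array from the external classifier."""
    top_idx = np.argsort(video_scores)[::-1][:topk]  # top-k video-level classes
    detections = []
    for start, end, conf in proposals:
        for k in top_idx:
            detections.append({
                "label": class_names[k],
                "segment": [start, end],
                # Detection score = proposal confidence x video-level class prob.
                "score": float(conf * video_scores[k]),
            })
    return detections
```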

Wenju-Huang commented 5 months ago

If I run a test video that is not in ActivityNet, how should I modify the post_processing part?

sming256 commented 5 months ago

In this case, if you are using an ActivityNet-pretrained TAD model, you need to:

  1. Generate the video-level classification result for this video yourself, and save it as a JSON file in the same format as the provided classifier files (see the sketch after this list).
  2. Point the JSON file used in post-processing to your generated file.
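The safest way to get the schema is to open one of the provided classifier files and copy its layout exactly. As a purely hypothetical sketch, assuming the file maps video IDs to per-class score lists under a `results` key:

```python
import json

def save_classifier_json(path, scores_by_video):
    """scores_by_video: dict mapping video_id -> list of per-class scores.
    The {"results": ...} layout is a guess; mirror the schema of the
    provided file (e.g. cuhk_val_simp_7.json) instead of this one."""
    with open(path, "w") as f:
        json.dump({"results": scores_by_video}, f)

# Example: uniform scores for one new video from your own recognizer.
save_classifier_json("my_classifier.json", {"v_test0001": [1.0 / 200] * 200})
```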

We are planning to release an end-to-end HACS-pretrained TAD model without an external classifier. I think this model will be more suitable for zero-shot testing.