sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Apache License 2.0

Does AdaTAD support multi-label temporal action detection? #3

Closed by Caspeerrr 2 months ago

Caspeerrr commented 2 months ago

Congrats on the great work! I was wondering whether AdaTAD supports multi-label temporal action detection (e.g. for Multi-THUMOS)? Thanks!

sming256 commented 2 months ago

Yes. AdaTAD uses ActionFormer as the detection head, so it supports multi-label TAD. You can combine the feature-based Multi-THUMOS ActionFormer config with the end-to-end THUMOS AdaTAD config.
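To illustrate why an ActionFormer-style head can handle overlapping labels, here is a minimal sketch. It assumes the head scores each class independently with a sigmoid rather than a softmax; the class names, logits, and the 0.5 threshold are hypothetical and not taken from the OpenTAD code.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def labels_at_timestep(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every class whose own sigmoid score passes the threshold.

    Because the scores are not normalized against each other (no softmax),
    several classes can be active at the same time step.
    """
    return [name for name, logit in logits.items() if sigmoid(logit) >= threshold]


# Hypothetical logits for one time step where two actions overlap.
logits = {"Run": 2.0, "Throw": 1.2, "Jump": -3.0}
active = labels_at_timestep(logits)
print(active)
```

With a softmax, only one class could dominate a time step; with per-class sigmoids, both overlapping actions can be kept.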

sming256 commented 1 month ago

Hi @Caspeerrr, we have released AdaTAD's results on Multi-THUMOS here. The performance is as follows.

| Backbone | Frames | Img Size | mAP@0.2 | mAP@0.5 | mAP@0.7 | Avg. mAP (0.1:0.9:0.1) |
| --- | --- | --- | --- | --- | --- | --- |
| VideoMAE-S | 768 | 160 | 61.34 | 46.74 | 26.88 | 40.77 |
| VideoMAE-B | 768 | 160 | 63.90 | 48.74 | 28.72 | 42.76 |
| VideoMAE-L | 768 | 160 | 66.06 | 51.80 | 31.73 | 45.15 |
| VideoMAE-H | 768 | 160 | 67.20 | 52.99 | 32.70 | 46.02 |
| VideoMAEV2-g | 768 | 160 | 68.23 | 53.87 | 33.03 | 46.74 |
| VideoMAEV2-g | 1536 | 224 | 71.11 | 55.83 | 34.86 | 48.73 |
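For readers unfamiliar with the metric columns: mAP@0.5 counts a detection as correct only if its temporal IoU (tIoU) with a ground-truth segment is at least 0.5, and 0.1:0.9:0.1 denotes the thresholds 0.1, 0.2, ..., 0.9 over which mAP is averaged. The sketch below shows only the tIoU computation and the threshold grid; the segment values are hypothetical, and this is not the evaluation code used by OpenTAD.

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Temporal IoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth segments.
pred = (10.0, 20.0)
gt = (12.0, 22.0)
iou = temporal_iou(pred, gt)  # intersection 8 s, union 12 s

# Threshold grid written as 0.1:0.9:0.1 (start:end:step).
thresholds = [round(0.1 * k, 1) for k in range(1, 10)]

# Thresholds at which this prediction would count as a match.
matched = [t for t in thresholds if iou >= t]
print(f"tIoU = {iou:.3f}, matches at thresholds {matched}")
```

Stricter thresholds demand tighter boundaries, which is why mAP@0.7 is lower than mAP@0.2 in every row of the table.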