happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

Can this method be used for multi-label temporal localization? #43

Closed ttgeng233 closed 2 years ago

ttgeng233 commented 2 years ago

Thanks for your great work! Are the datasets used here all single-label, with just one action per time step? Can this model be used on multi-label datasets (e.g., MultiTHUMOS, Charades) by replacing the loss function, or just by replacing NMS with multi-class NMS?

tzzcl commented 2 years ago

Hi, since our method is not designed for multi-label datasets, you may need to tweak our current models a little.

The current cls head uses focal loss, which already supports multi-label ground truths. However, the current regression head is class-agnostic, so you may need to design a class-aware regression head and modify the code.
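To illustrate the two points above, here is a minimal sketch in plain Python. It is not the repository's code. `sigmoid_focal_loss` shows why a sigmoid-based focal loss accepts multi-hot targets: each class is scored independently, so several classes can be positive at the same time step. `decode_class_aware` is a hypothetical helper showing what a class-aware regression output could look like, assuming one (start, end) offset pair per class instead of a single shared pair. The names, default values, and the `stride` parameter are assumptions made for illustration.

```python
import math


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Sum of per-class sigmoid focal losses for one time step.

    logits  : list of raw scores, one per class
    targets : multi-hot list (0.0 or 1.0 per class); several entries may be 1.0
    """
    total = 0.0
    for x, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))            # independent sigmoid per class
        p_t = p * t + (1.0 - p) * (1.0 - t)       # probability of the true outcome
        alpha_t = alpha * t + (1.0 - alpha) * (1.0 - t)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total


def decode_class_aware(t, offsets, stride=1.0):
    """Hypothetical class-aware decoding: one (d_start, d_end) pair per class.

    A class-agnostic head would instead supply a single pair shared by all classes.
    """
    return [(t - d_start * stride, t + d_end * stride) for d_start, d_end in offsets]


# Two actions are active at the same time step (multi-hot target).
targets = [1.0, 1.0, 0.0]
loss_good = sigmoid_focal_loss([4.0, 4.0, -4.0], targets)   # both positives scored high
loss_bad = sigmoid_focal_loss([4.0, -4.0, -4.0], targets)   # second positive missed

# Each class gets its own segment around time step 10.
segments = decode_class_aware(10.0, [(2.0, 3.0), (1.0, 1.0)])
```

In this sketch the loss drops when both active classes receive high scores, and each class can predict different boundaries at the same time step, which is what overlapping actions of different classes would need.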

ttgeng233 commented 2 years ago

Thank you very much. Is there nothing to do with NMS? Are any modifications needed during inference?

tzzcl commented 2 years ago

Hi, I think our current code already supports multi-class NMS. However, our code cannot handle the case where multiple actions with the same label occur at the same time.
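For reference, multi-class NMS on temporal segments can be sketched as follows. This is a plain greedy (hard) NMS written for illustration, not the repository's implementation. The function names, the `(start, end, score, label)` tuple layout, and the 0.5 threshold are assumptions.

```python
def temporal_iou(a, b):
    """IoU of two 1D segments given as (start, end, ...) tuples."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def multiclass_nms(segments, iou_threshold=0.5):
    """Greedy NMS where a segment is only suppressed by kept segments of the same label.

    segments: list of (start, end, score, label) tuples
    """
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Segments with different labels never suppress each other.
        if all(seg[3] != k[3] or temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


detections = [
    (0.0, 10.0, 0.9, "run"),
    (1.0, 10.0, 0.8, "run"),    # heavy overlap with the same label: suppressed
    (0.0, 10.0, 0.7, "jump"),   # same interval, different label: kept
    (20.0, 30.0, 0.6, "run"),   # same label, no overlap: kept
]
result = multiclass_nms(detections)
```

In this sketch, overlapping actions with different labels survive, while two heavily overlapping segments with the same label collapse into one.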

tzzcl commented 2 years ago

Closed due to inactivity.