I don't see the fine-tuning code in the repository, so I'm guessing that you used the video segmentation annotations to fine-tune the model pretrained on Kinetics. Is that right? If so, did you fine-tune with a cross-entropy loss, treating it as a single-label classification problem rather than a multi-label one? Is that a common setting for fine-tuning on the Charades dataset?
I have tried fine-tuning with a multi-label loss, but I only got 25% mAP, so I would like to find out what causes such a large gap compared to your reported performance.
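To make clear which two setups I am comparing, here is a minimal pure-Python sketch of the two losses. It is not taken from your repository: the function names, the 4-class toy logits, and the targets are all made up for illustration, and a real run would use the framework's built-in losses instead.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Single-label loss: exactly one class is treated as correct."""
    m = max(logits)
    # log-sum-exp with the max subtracted for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def sigmoid_bce(logits, targets):
    """Multi-label loss: each class is an independent yes/no decision."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy clip with 4 hypothetical classes, two of which are active at once.
logits = [2.0, -1.0, 0.5, -3.0]
multi_hot = [1.0, 0.0, 1.0, 0.0]

# Single-label view: only class 0 counts as the target.
ce = softmax_cross_entropy(logits, target_index=0)
# Multi-label view: classes 0 and 2 are both targets.
bce = sigmoid_bce(logits, multi_hot)

print(f"cross-entropy (single label): {ce:.4f}")
print(f"BCE (multi-label): {bce:.4f}")
```

With the single-label loss each training sample carries one action label, while the multi-label loss takes a multi-hot vector of all actions active in the sample. My 25% mAP result comes from the second kind of setup.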