mx-mark / VideoTransformer-pytorch

PyTorch implementation of a collection of scalable Video Transformer Benchmarks.

Maskfeat downstream task performance #6

WHlTE-N0lSE closed 2 years ago

WHlTE-N0lSE commented 2 years ago

I tried to fine-tune a classifier from the MaskFeat pretrained weights you provided, but the final performance was terrible (UCF101 Acc@top1 = 52%). What performance do you get when fine-tuning from MaskFeat, and what are your MViT fine-tuning settings?

mx-mark commented 2 years ago

@WHlTE-N0lSE First, the MaskFeat model was not pretrained for very long, so it may not generalize well. Second, MaskFeat relies heavily on the fine-tuning recipe. The detailed fine-tuning settings can be found in the original paper, which we mostly follow.