Closed simonJJJ closed 1 year ago
Hi @simonJJJ , thanks for your interest in our work. I may not be able to provide the training logs because I don't have access now.
The vitclip_large_k400 config is not consistent with the paper, e.g. in training num_frames, training frame_interval, ColorJitter, backbone lr_mult, warmup epochs, etc.
I simply ran the vitclip_large_k400 config in your repo but only got top-1 acc = 85.69, so I want to know the exact correct config.
Thanks.
Hi, sorry, we missed some implementation details in the paper. For ViT-L on K400, we use ColorJitter and a 0.1x backbone learning rate to alleviate overfitting. I have updated the config; you may try it again. The configs are for 8 GPUs with batch size 64. Another possible reason for the performance gap is that your copy of the K400 videos may differ from ours.
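For anyone else hitting this, the changes described above (ColorJitter in the training pipeline, 0.1x backbone lr, batch size 64 over 8 GPUs) could look roughly like the config fragment below. This is a sketch assuming an mmaction2-style config schema; the exact keys, base lr, and pipeline order should be taken from the released config, not from here.

```python
# Hypothetical mmaction2-style config fragment illustrating the fixes
# described above. Key names and values are assumptions; check the
# repo's updated vitclip_large_k400 config for the authoritative version.

# 8 GPUs x 8 videos per GPU = effective batch size 64
data = dict(videos_per_gpu=8, workers_per_gpu=4)

train_pipeline = [
    dict(type='DecordInit'),
    # 32x3x1 sampling: 32 frames, frame interval 3, 1 clip
    dict(type='SampleFrames', clip_len=32, frame_interval=3, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='ColorJitter'),  # added to alleviate overfitting
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
]

optimizer = dict(
    type='AdamW',
    lr=1e-4,  # assumed base lr; the released config is authoritative
    # 0.1x learning rate on the CLIP-pretrained backbone
    paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)}),
)
```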
Hi,
I directly evaluated your pretrained ViT-L/14 32x3x1 model on K400 using the updated config that you fixed.
However, I get top-1 acc = 86.23; adding ThreeCrop for inference gives top-1 acc = 86.69. The result is still far from the paper's reported top-1 acc of 87.5. My validation set has 19,877 valid videos.
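For reference, the single-crop vs. ThreeCrop evaluation difference described above typically comes down to one entry in the test pipeline. The fragment below is a sketch assuming mmaction2-style pipeline keys, not the repo's actual config:

```python
# Hypothetical mmaction2-style test pipeline for 32x3x1 evaluation.
# Swapping CenterCrop for ThreeCrop is what distinguishes the 86.23
# (single-crop) and 86.69 (ThreeCrop) numbers reported above.
test_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames',
         clip_len=32, frame_interval=3, num_clips=1, test_mode=True),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 224)),
    # Single-crop eval: dict(type='CenterCrop', crop_size=224)
    dict(type='ThreeCrop', crop_size=224),  # 3 spatial crops per clip
    dict(type='FormatShape', input_format='NCTHW'),
]
```

Note that differing numbers of valid validation videos (e.g. 19,877 here vs. the authors' copy) can also shift top-1 accuracy by a few tenths of a point, since K400 videos become unavailable over time.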
Hi, thanks for the great work!
I wonder whether any training logs are available for the CLIP-pretrained models?