taoyang1122 / adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Apache License 2.0

About the performance gap with the released checkpoints #35

Open yangbang18 opened 1 year ago

yangbang18 commented 1 year ago

Thanks for your great work. I have two questions:

1) Using the same Kinetics-400 validation set as MMAction2 (19796 videos), the same settings as your configs/recognition/vit/vitclip_base_k400.py (32 x 3 x 1 views during testing), and the checkpoint vit_b_clip_32frame_k400.pth you provided, I get 83.34 (acc@1) and 96.45 (acc@5) on the Kinetics-400 validation set. This is lower than the results given in README.md, i.e., 84.7 (acc@1) and 96.7 (acc@5). Is there any possible reason for the gap (e.g., is your Kinetics-400 validation set smaller because of expired links)?

2) According to README.md, the checkpoint vit_b_clip_32frame_diving48.pth you provided is tested with 32 x 1 x 1 views, but the views in configs/recognition/vit/vitclip_base_diving48.py are 32 x 1 x 3. My evaluation results are 88.43 (acc@1, 32 x 1 x 3) and 88.32 (acc@1, 32 x 1 x 1), both lower than the result given in README.md, i.e., 88.9 (acc@1, 32 x 1 x 1). Is there any possible reason for the gap?

I am also confused by another mismatch: according to README.md, the checkpoint vit_b_clip_32frame_k700.pth you provided is tested with 32 x 3 x 3 views, but the views in configs/recognition/vit/vitclip_base_k700.py are 8 x 3 x 3.
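For readers unfamiliar with how multi-view results such as "32 x 3 x 1" collapse into a single acc@1/acc@5 number, here is a minimal sketch: the per-class scores from all test views of a video are averaged, and top-k accuracy is computed on the averaged scores. This is an illustration under assumptions, not AIM's or MMAction2's actual evaluation code. The helper names (`average_view_scores`, `topk_hit`, `topk_accuracy`) and the toy scores are hypothetical, and whether raw logits or softmax probabilities are averaged depends on the config (MMAction2 exposes this as the `average_clips` test option).

```python
from typing import List, Sequence, Tuple


def average_view_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all test views of one video."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(v[c] for v in view_scores) / num_views for c in range(num_classes)]


def topk_hit(scores: Sequence[float], label: int, k: int) -> bool:
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def topk_accuracy(videos: Sequence[Tuple[Sequence[Sequence[float]], int]], k: int) -> float:
    """videos: list of (per-view scores, ground-truth label); returns accuracy in percent."""
    hits = sum(topk_hit(average_view_scores(views), label, k) for views, label in videos)
    return 100.0 * hits / len(videos)


# Toy example (made-up numbers): 2 videos, 3 views each, 4 classes.
videos = [
    ([[0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1], [0.3, 0.3, 0.3, 0.1]], 1),
    ([[0.4, 0.1, 0.3, 0.2], [0.2, 0.1, 0.5, 0.2], [0.3, 0.1, 0.4, 0.2]], 0),
]
print(topk_accuracy(videos, k=1), topk_accuracy(videos, k=2))  # 50.0 100.0
```

Because the views are averaged before ranking, changing the number of clips or crops (e.g., 32 x 1 x 3 vs. 32 x 1 x 1) can shift the final accuracy slightly even for the same checkpoint.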

yangbang18 commented 1 year ago
  1. Using the same Kinetics-700 validation set as MMAction2 (34824 videos), the checkpoint vit_b_clip_32frame_k700.pth you provided, and 32 x 3 x 3 testing views, I get 75.78 (acc@1) on the Kinetics-700 validation set, which is lower than the result given in README.md, i.e., 76.9 (acc@1). Is there any possible reason for the gap?

taoyang1122 commented 1 year ago

Hi @yangbang18 , thanks for your interest in our work.

  1. We have 19404 validation videos. We are using the Kinetics-400 dataset from here.
  2. It may be caused by differences in environment and device.
  3. The config is only an example. You can modify the frames and frame_interval for different settings.
  4. I don't have access to the K700 dataset now. We downloaded the K700 dataset following this

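The frames and frame_interval settings mentioned above control how many frames each test clip contains and how far apart they are. Below is a simplified sketch of evenly spaced test-time clip sampling, loosely modeled on MMAction2's `SampleFrames` in test mode but not its exact code. The function name `sample_test_clips` is hypothetical, mapping "frames" to a `clip_len` parameter is an assumption, and the AIM configs may sample differently.

```python
from typing import List


def sample_test_clips(num_frames: int, clip_len: int, frame_interval: int,
                      num_clips: int) -> List[List[int]]:
    """Hypothetical sketch: evenly spaced test clips over a video.

    Each clip takes `clip_len` frames spaced `frame_interval` apart; the clip
    start offsets are spread evenly over the video. Indices wrap around
    (modulo num_frames) when the video is shorter than one clip's span.
    """
    span = clip_len * frame_interval  # frames covered by one clip
    if num_frames > span - 1:
        avg_interval = (num_frames - span + 1) / num_clips
        # Center each clip within its segment of the video.
        offsets = [int(i * avg_interval + avg_interval / 2) for i in range(num_clips)]
    else:
        offsets = [0] * num_clips
    return [[(off + j * frame_interval) % num_frames for j in range(clip_len)]
            for off in offsets]


# A 300-frame video, 32 frames per clip, every 2nd frame, 3 clips.
clips = sample_test_clips(300, clip_len=32, frame_interval=2, num_clips=3)
print([clip[0] for clip in clips])  # [39, 118, 197]
```

Raising frame_interval widens the temporal span each clip covers without changing the number of frames the model sees, which is one reason a config can be "an example" that is adjusted per dataset.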
yangbang18 commented 1 year ago

Sorry, I can't access your Kinetics-400 link (even with a VPN).

BTW, I have some new findings.

Using the same Kinetics-400 validation set as MMAction2 (19796 videos), I reproduced the training on 8 V100s with configs/recognition/vit/vitclip_base_k400.py, which gives 83.36 (acc@1) and 96.41 (acc@5) with 32 x 3 x 1 views. These results are similar to those of the checkpoint vit_b_clip_32frame_k400.pth you provided.

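Given your acc@1 of 84.9% (as reported in the paper) on 19404 videos, the accuracy of the model on my validation set (19796 videos) would lie in the range [(19404 * 84.9% + 392 * 0%) / 19796 = 83.2%, (19404 * 84.9% + 392 * 100%) / 19796 = 85.2%], where 392 = 19796 - 19404 and the two bounds correspond to none or all of the extra videos being classified correctly.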
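The bound calculation above can be checked with a few lines of Python. This sketch assumes, as the calculation implicitly does, that the 19404-video set is a subset of the 19796-video set; the function name `accuracy_bounds` is hypothetical.

```python
def accuracy_bounds(acc_subset: float, n_subset: int, n_full: int):
    """Min/max accuracy on the full set, given accuracy on a subset of it.

    Assumes the subset is contained in the full set, so only the
    n_full - n_subset extra videos have unknown outcomes.
    """
    n_extra = n_full - n_subset
    correct = acc_subset * n_subset
    lower = correct / n_full              # every extra video misclassified
    upper = (correct + n_extra) / n_full  # every extra video classified correctly
    return lower, upper


lo, hi = accuracy_bounds(0.849, 19404, 19796)
print(f"{lo:.1%} .. {hi:.1%}")  # 83.2% .. 85.2%
```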

Given that my reproduced 83.36 is close to the lower bound (83.2), I suspect that the 392 (19796 - 19404) videos missing from your validation set are hard for the model to classify.

Regarding the suggestion that the gap may be caused by differences in environment and device, I also tried that: I evaluated the released vit_b_clip_32frame_k400.pth checkpoint on a V100 and on a 4090, and both gave the same results.
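The suspicion above can be made more concrete by solving for the accuracy the extra videos would need. This is only a rough sketch under strong assumptions: the 19404 videos are a subset of the 19796, and the model scores exactly 84.9% on that subset and 83.36% on the full set, even though these two numbers come from different runs. The function name `implied_extra_accuracy` is hypothetical.

```python
def implied_extra_accuracy(acc_full: float, n_full: int,
                           acc_subset: float, n_subset: int) -> float:
    """Accuracy on the videos in the full set but not in the subset.

    Assumes the subset is contained in the full set and both accuracies
    describe the same model.
    """
    n_extra = n_full - n_subset
    return (acc_full * n_full - acc_subset * n_subset) / n_extra


acc = implied_extra_accuracy(0.8336, 19796, 0.849, 19404)
print(f"{acc:.1%}")  # 7.1%
```

Under these assumptions, only about 7% of the 392 extra videos would be classified correctly, which is consistent with them being unusually hard (or, for example, corrupted) rather than typical validation samples.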

taoyang1122 commented 1 year ago

Hi, the link is from Academic Torrents and is provided in MMAction2. You may try another VPN. I will check the results on Diving48.

yangbang18 commented 1 year ago

I downloaded Kinetics-400 from https://opendatalab.com/OpenMMLab/Kinetics-400, which is the same data that MMAction2 uses (i.e., the same number of training/validation videos). The same goes for Kinetics-700.

I can reproduce the Diving48 results by training, so you can disregard this part.

hsi-che-lin commented 1 year ago

Hello @yangbang18, I've recently been trying to reproduce the Diving48 results by training, but I can't obtain the reported results. Could you kindly share your settings, configuration, or logs? Thank you.