taoyang1122 / adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Apache License 2.0

About hyperparameters setting for reproducing the results #32

Closed hsi-che-lin closed 1 year ago

hsi-che-lin commented 1 year ago

Hello, and thank you for the insightful work! I'm trying to reproduce the results, but I have some questions about the hyperparameter settings.

I also have some problems reproducing the results in Table 6. Does the memory here mean the peak memory usage (during both the forward and backward passes) per GPU? What is the clip length used here?

taoyang1122 commented 1 year ago

Hi, thanks for your interest in our work.

  1. The best frame interval and number of frames may depend on the dataset. In general, you want to cover most of the video; if the video is too long, you can use uniform sampling.
  2. Yes, but on SSv2 and Diving48 we use uniform sampling, so you don't need to change the frame interval.
  3. Yes, it is 8x8.
  4. It is the memory cost during training. I may be forgetting the details, but it should be 8 frames.
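For reference, the uniform sampling mentioned above (spacing frames evenly so the clip covers the whole video, rather than sampling a fixed-interval window) can be sketched roughly as follows. The function name and the segment-center choice are my own illustration and are not necessarily how this repo's data pipeline implements it:

```python
def uniform_sample_indices(num_frames, clip_len):
    """Split the video into clip_len equal segments and take the center
    frame of each, so the sampled clip spans the entire video.

    Illustrative sketch only; assumed helper, not from the AIM codebase.
    """
    seg = num_frames / clip_len
    return [int(seg * i + seg / 2) for i in range(clip_len)]

# e.g. a 100-frame video sampled to an 8-frame clip:
# uniform_sample_indices(100, 8) -> [6, 18, 31, 43, 56, 68, 81, 93]
```

With this scheme the effective frame interval scales with the video length, which is why no explicit frame-interval setting is needed on SSv2 or Diving48.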
hsi-che-lin commented 1 year ago

Thank you for the reply! One more question: from here and here in the data preprocessing, it seems that the test split annotations are used as the validation set, and there is no separate test set. Is the result reported in the paper the accuracy on this validation set?

taoyang1122 commented 1 year ago

Yes, this is the same protocol as previous works.