Hello, thank you for the insightful work!
I'm trying to reproduce the results, but I have some questions about the hyperparameter settings.
frame interval: Is the default value in the config chosen with respect to the frame rate of the corresponding dataset? What about datasets whose videos have different frame rates? Should I re-sample the videos beforehand?
clip len: If I want to change the clip len, is the rule mentioned in this issue applicable to other datasets?
batch size: As mentioned in this issue, does the batch size of 64 in the paper actually mean 8 samples per GPU times 8 GPUs?
I also have some problems reproducing the results in Table 6. Does the memory here mean the peak memory usage (during both forward and backward passes) per GPU? What is the clip len used here?
The best frame interval and number of frames might depend on the dataset. In general, you want to cover most of the video. If the video is too long, you can do uniform sampling.
Yes, but on SSv2 and Diving48 we use uniform sampling, so you don't have to change the frame interval there.
Yes, it is 8x8.
It is the memory cost during training. I may be misremembering the details, but it should be 8 frames.
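For readers comparing the two sampling strategies mentioned above, here is a minimal sketch of how the frame indices differ. The function names `dense_indices` and `uniform_indices` are hypothetical and are not taken from the repository; the actual data pipeline may add random offsets during training or handle short videos differently.

```python
from typing import List


def dense_indices(num_frames: int, clip_len: int, frame_interval: int) -> List[int]:
    """Centered window of clip_len frames spaced frame_interval apart.

    Indices that would run past the end of the video are clamped
    to the last frame (an assumption made for this sketch).
    """
    span = (clip_len - 1) * frame_interval + 1
    start = max((num_frames - span) // 2, 0)
    return [min(start + i * frame_interval, num_frames - 1) for i in range(clip_len)]


def uniform_indices(num_frames: int, clip_len: int) -> List[int]:
    """Split the video into clip_len equal segments and take each segment's middle frame."""
    seg = num_frames / clip_len
    return [min(int(seg * (i + 0.5)), num_frames - 1) for i in range(clip_len)]


# A 300-frame video, 8 frames per clip.
print(dense_indices(300, 8, 8))   # covers only a 57-frame window in the middle
print(uniform_indices(300, 8))    # spans the whole video

# Effective batch size as confirmed in the reply: 8 samples per GPU on 8 GPUs.
samples_per_gpu, num_gpus = 8, 8
print(samples_per_gpu * num_gpus)
```

With interval-based sampling, the temporal coverage depends on the video's length and frame rate, which is why the interval may need tuning per dataset. Uniform sampling always spans the full video, so the frame interval does not come into play.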
Thank you for the reply!
One more question:
From here and here during data preprocessing, it seems that the test split annotation is used as the validation set, and there is no test set. Is the result reported in the paper the accuracy on the validation set?