laura-wang / video_repres_sts

PyTorch implementation of our T-PAMI 2021 paper: Self-supervised Video Representation Learning by Uncovering Motion and Appearance Statistics

Clarification of input length in action recognition #5

Closed: farleylai closed this issue 2 years ago

farleylai commented 2 years ago

Hi,

In section 6.1, it is stated as follows:

(2) With larger input size, i.e.,112×112 to 224×224, longer input length, i.e.,16 frames to 64 frames, and a more powerful backbone network, i.e., R(2+1)D to S3D-G, the performance of the proposed STS can be further improved drastically on both

Does that mean the pretraining on K400 and/or the downstream tasks both use 64 frames as input for training?

laura-wang commented 2 years ago

Hi,

Yes, it does.