Hi,
In section 6.1, it is stated as follows:
"(2) With larger input size, i.e., 112×112 to 224×224, longer input length, i.e., 16 frames to 64 frames, and a more powerful backbone network, i.e., R(2+1)D to S3D-G, the performance of the proposed STS can be further improved drastically on both"
Does that mean that the pretraining on K400 and/or the downstream tasks both use 64 frames as input during training?