mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
https://arxiv.org/abs/1811.08383
MIT License

Confusion about the parameter. #228

Closed: ForeverPs closed this issue 1 year ago

ForeverPs commented 1 year ago

What does "self.num_segments" stand for in the TSM network? Suppose I have a 3-second video clip and sample it at 6 fps. I then get video frames with shape 1x(3x6)x3x224x224, i.e. 1x18x3x224x224, where 1 is the batch size, 18 is the number of frames, and 3x224x224 is the shape of each image. Which dimension does "self.num_segments" correspond to in this case? I am a little confused.
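
To make the shapes in the question concrete, here is a minimal sketch of the arithmetic in plain Python. It assumes that "self.num_segments" would correspond to the number of frames sampled per clip (18 here). That reading is an assumption and is not confirmed anywhere in this thread. The helper name clip_shapes is hypothetical and is not part of the repository.

```python
def clip_shapes(batch_size, seconds, fps, channels=3, height=224, width=224):
    """Return the 5-D clip shape, the frames per clip, and the folded 4-D shape."""
    # Frames per clip: 3 s sampled at 6 fps gives 18 frames.
    num_frames = seconds * fps
    # Shape described in the question: batch x frames x channels x height x width.
    clip_shape = (batch_size, num_frames, channels, height, width)
    # Assumption: a 2D backbone processes each frame as a separate image,
    # so the frame axis is folded into the batch axis.
    folded_shape = (batch_size * num_frames, channels, height, width)
    return clip_shape, num_frames, folded_shape


clip_shape, num_frames, folded_shape = clip_shapes(batch_size=1, seconds=3, fps=6)
print(clip_shape)    # (1, 18, 3, 224, 224)
print(num_frames)    # 18
print(folded_shape)  # (18, 3, 224, 224)
```

Under that assumption, the value in question would be 18, the second dimension of the 5-D shape.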