What does "self.num_segments" stand for in the TSM network? Suppose I have a 3-second video clip and sample it at 6 fps. I then get video frames with shape 1x18x3x224x224, where 1 is the batch size, 18 (3 s x 6 fps) is the number of frames, and 3x224x224 is the shape of each image. Which dimension does "self.num_segments" correspond to in this case? I am a little confused.
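To make the question concrete, here is a minimal NumPy sketch of the shapes as I understand them. It assumes (my guess, not confirmed) that num_segments is the number of frames sampled per clip, i.e. the 18 in this case. The variable names are only for illustration and are not taken from the TSM code.

```python
import numpy as np

# Assumption: num_segments is the number of frames sampled per clip.
batch_size = 1
num_segments = 3 * 6  # 3 seconds sampled at 6 fps -> 18 frames
channels, height, width = 3, 224, 224

# The clip as described in the question: (batch, frames, C, H, W).
clip = np.zeros((batch_size, num_segments, channels, height, width), dtype=np.float32)

# A 2D backbone would see every frame as a separate image: (batch * frames, C, H, W).
frames = clip.reshape(-1, channels, height, width)

# A temporal operation would need num_segments to regroup the frames by clip.
regrouped = frames.reshape(-1, num_segments, channels, height, width)

print(frames.shape)     # (18, 3, 224, 224)
print(regrouped.shape)  # (1, 18, 3, 224, 224)
```

If that assumption is right, num_segments would be 18 here, and it is the value needed to recover the batch dimension after the frames have been flattened together.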