Open qwangku opened 2 years ago
I don't know if you are still looking for an answer. I was also reading the code to answer this question, and I ended up at the following part of the code, which I believe answers it.
    @staticmethod
    def shift(x, n_segment, fold_div=3, inplace=False):
        nt, c, h, w = x.size()
        n_batch = nt // n_segment
        x = x.view(n_batch, n_segment, c, h, w)

        fold = c // fold_div
        if inplace:
            # Due to some out of order error when performing parallel computing.
            # May need to write a CUDA kernel.
            raise NotImplementedError
            # out = InplaceShift.apply(x, fold)
        else:
            out = torch.zeros_like(x)
            out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
            out[:, 1:, fold: 2 * fold] = x[:, :-1, fold: 2 * fold]  # shift right
            out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shift

        return out.view(nt, c, h, w)
`fold_div` is equal to `shift_div`. If it is set to 3, then 2/3 of the channels are shifted (1/3 in each temporal direction, per the code above). If it is set to 8, then 2/8. I am studying this code as well, so please take this with a grain of salt 😄
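To make the fraction concrete, here is a minimal sketch that reuses the out-of-place branch of the snippet above on a toy tensor and counts how many channels actually change. The helper `shifted_channels` and the toy sizes (4 segments, 24 channels, 1x1 spatial size) are my own assumptions for illustration and are not part of the repository.

```python
import torch


def shift(x, n_segment, fold_div=3):
    """Out-of-place temporal shift, following the snippet quoted above."""
    nt, c, h, w = x.size()
    n_batch = nt // n_segment
    x = x.view(n_batch, n_segment, c, h, w)

    fold = c // fold_div  # channels moved in EACH direction
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift right
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shifted
    return out.view(nt, c, h, w)


def shifted_channels(c, fold_div):
    """Hypothetical helper: channels shifted, both directions combined."""
    return 2 * (c // fold_div)


# Toy input (made-up sizes): 1 clip, 4 segments, 24 channels, 1x1 spatial.
n_segment, c = 4, 24
# Frame t is filled with the value t + 1, so any temporal shift is visible.
x = torch.arange(1, n_segment + 1, dtype=torch.float32).view(n_segment, 1, 1, 1)
x = x.expand(n_segment, c, 1, 1).contiguous()

for fold_div in (3, 8):
    out = shift(x, n_segment, fold_div=fold_div)
    # A channel counts as "changed" if it differs from the input in any frame.
    changed = (out != x).any(dim=0).flatten().sum().item()
    print(fold_div, shifted_channels(c, fold_div), changed)
# prints:
# 3 16 16
# 8 6 6
```

With 24 channels, `fold_div=3` shifts 16 channels (2/3) and `fold_div=8` shifts 6 (2/8). Because `fold` comes from integer division, the fraction is slightly below `2 / fold_div` whenever the channel count is not divisible by `fold_div`.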
Thanks for sharing these great resources. I am trying to experiment with different frame rates for TSM. I noticed there are three important attributes here: frame_count, num_segments and shift_div.
For example, if I reduce frame_count from 8 to 4 (which means the video is split into 4 segments this time, so the effective frame rate is lower), should I also adjust "shift_div" and "num_segments"? Am I right to say that "shift_div" should always be equal to or smaller than "frame_count"?