mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
https://arxiv.org/abs/1811.08383
MIT License

How to set up the correct combination of shift_div and frame_count? #223

Open qwangku opened 2 years ago

qwangku commented 2 years ago

Thanks for sharing this great resource. I am trying to play with different frame rates for TSM. I noticed there are 3 important attributes here: frame_count, num_segments and shift_div.

For example, if I reduce frame_count from 8 to 4 (which means the video is split into 4 segments this time, so the equivalent frame rate is reduced), should I also adjust "shift_div" and "num_segments"? Am I right to say "shift_div" should always be equal to or smaller than "frame_count"?

yjang43 commented 2 years ago

I don't know if you are still looking for an answer. I was also looking through the code to answer this question, and I ended up at this part of the code, which I believe answers it.

    @staticmethod
    def shift(x, n_segment, fold_div=3, inplace=False):
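        nt, c, h, w = x.size()  # nt = n_batch * n_segment
        n_batch = nt // n_segment
        x = x.view(n_batch, n_segment, c, h, w)  # split out the temporal dimension

        fold = c // fold_div  # number of channels shifted in each direction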
        if inplace:
            # Due to some out of order error when performing parallel computing. 
            # May need to write a CUDA kernel.
            raise NotImplementedError  
            # out = InplaceShift.apply(x, fold)
        else:
            out = torch.zeros_like(x)
            out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
            out[:, 1:, fold: 2 * fold] = x[:, :-1, fold: 2 * fold]  # shift right
            out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shift

        return out.view(nt, c, h, w)

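fold_div is the same value as shift_div. c // fold_div channels are shifted in each direction, so about 2 / fold_div of the channels move in total: if it is set to 3, then roughly 2 / 3 of the channels will be shifted; if set to 8, then 2 / 8. Since it divides the channel dimension rather than the temporal one, it does not look like it has to be equal to or smaller than the frame count. n_segment only appears in the reshape, so it should match the number of frames per clip. I am studying this code as well, so please take this with a grain of salt 😄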
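
To make the indexing above easier to check by hand, here is a minimal pure-Python sketch that mirrors the non-inplace branch of shift() for a single clip, with the spatial dimensions dropped. The names temporal_shift and shifted_fraction are hypothetical helpers written for this illustration, not part of the repository, and the sketch assumes the slicing semantics shown in the quoted snippet.

```python
def temporal_shift(frames, fold_div=8):
    """Mirror of the non-inplace shift for one clip (hypothetical helper).

    frames: list of T frames, each a list of C channel values.
    Returns a new T x C nested list with the same layout.
    """
    n_segment = len(frames)
    c = len(frames[0])
    fold = c // fold_div  # channels moved in each direction
    out = [[0] * c for _ in range(n_segment)]
    for t in range(n_segment):
        for ch in range(c):
            if ch < fold:
                # "shift left": frame t receives the value from frame t + 1;
                # the last frame keeps the zero padding.
                if t + 1 < n_segment:
                    out[t][ch] = frames[t + 1][ch]
            elif ch < 2 * fold:
                # "shift right": frame t receives the value from frame t - 1;
                # the first frame keeps the zero padding.
                if t >= 1:
                    out[t][ch] = frames[t - 1][ch]
            else:
                # remaining channels are copied unchanged
                out[t][ch] = frames[t][ch]
    return out


def shifted_fraction(c, fold_div):
    """Fraction of channels that are shifted, given integer division."""
    return 2 * (c // fold_div) / c


# 4 frames, 8 channels; value 10 * (t + 1) + ch encodes frame and channel.
clip = [[10 * (t + 1) + ch for ch in range(8)] for t in range(4)]
shifted = temporal_shift(clip, fold_div=8)
```

The same function works for any number of frames (even with fewer frames than fold_div), which is consistent with the reading that shift_div only partitions channels. Because of the integer division, the shifted fraction is exactly 2 / fold_div only when the channel count is divisible by fold_div.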