The Bug of Classifier3D.py

Hello,

We feed 3-dimensional feature tensors of shape (L, B, C), i.e. (time length, batch, channels), to the classifier. The Linear modules treat this as a 2-dimensional (L×B, C) input: nn.Linear only acts on the last dimension, so it is as if we called .view(L*B, C) implicitly. This is why we then permute to (B, C, L) and apply pool1d over the temporal dimension L. In eval mode on long videos (as opposed to the short clips used for training), we additionally apply softmax pooling.
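The equivalence between applying nn.Linear to an (L, B, C) tensor and flattening it to (L*B, C) can be checked directly. This is a minimal sketch, assuming illustrative sizes and a single nn.Linear standing in for the classifier; the names `features`, `linear`, `K` and the sizes are hypothetical, not taken from Classifier3D.py. Average pooling is used as the pool1d example. The last lines show one common form of softmax pooling over time; this is an assumption, and the repository's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes: time length, batch, channels, number of classes.
L, B, C, K = 8, 2, 16, 5
features = torch.randn(L, B, C)  # (L, B, C) as fed to the classifier
linear = nn.Linear(C, K)         # hypothetical stand-in for the classifier's Linear layer

# nn.Linear acts on the last dimension only, so the 3-D input...
scores_3d = linear(features)  # (L, B, K)
# ...gives the same result as an explicit flatten to (L*B, C) and back.
scores_2d = linear(features.view(L * B, C)).view(L, B, K)
assert torch.allclose(scores_3d, scores_2d)

# Move time to the last axis, (B, K, L), so 1-D pooling runs over L.
per_frame = scores_3d.permute(1, 2, 0)
# Average pooling with a window covering all of L -> (B, K, 1) -> (B, K).
clip_scores = F.avg_pool1d(per_frame, kernel_size=L).squeeze(-1)
print(clip_scores.shape)  # torch.Size([2, 5])

# Assumed softmax pooling: weight each time step by a softmax over L,
# then sum the weighted scores, emphasising high-scoring steps.
weights = F.softmax(per_frame, dim=-1)              # (B, K, L), sums to 1 over L
softmax_pooled = (weights * per_frame).sum(dim=-1)  # (B, K)
```

With the window equal to L, average pooling is just the mean over time, while the softmax-weighted version leans toward the highest-scoring time steps, which can matter on long videos where the relevant action occupies only part of the sequence.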

Do you get an actual error when running the model, or did this come up just from reading the code? If you get an error, can you check the dimensions of your inputs to the VGG model? I annotated the code with the feature dimensions at each step of the model, from the (B, C, L, H, W) input to the (B, C) output scores, where L is the temporal dimension and H×W are the spatial dimensions.
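To check input dimensions, it can help to trace the shapes end to end and fail early on a wrong input rank. The sketch below uses a hypothetical `ToyClassifier3D`, which is NOT the repository's VGG-based model: it replaces the convolutional backbone with a spatial average and only mirrors the shape flow described above, (B, C, L, H, W) to (B, num_classes). The class name, `num_classes`, and the sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyClassifier3D(nn.Module):
    """Hypothetical stand-in, not the repository's model: it only mirrors
    the shape flow (B, C, L, H, W) -> (B, num_classes)."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fail early with a readable message if the input is not 5-D.
        if x.dim() != 5:
            raise ValueError(f"expected (B, C, L, H, W), got shape {tuple(x.shape)}")
        l = x.shape[2]
        x = x.mean(dim=(3, 4))              # average over H, W -> (B, C, L)
        x = x.permute(2, 0, 1)              # (L, B, C), as fed to the classifier
        x = self.fc(x)                      # (L, B, num_classes)
        x = x.permute(1, 2, 0)              # (B, num_classes, L)
        x = F.avg_pool1d(x, kernel_size=l)  # pool over time -> (B, num_classes, 1)
        return x.squeeze(-1)                # (B, num_classes)


model = ToyClassifier3D(in_channels=3, num_classes=10)
video = torch.randn(2, 3, 16, 32, 32)  # (B, C, L, H, W)
out = model(video)
print(out.shape)  # torch.Size([2, 10])
```

Passing a 4-D tensor such as a single frame batch of shape (B, C, H, W) raises the ValueError immediately, which is usually easier to diagnose than a shape mismatch deep inside the network.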

thayral / temporal-stochastic-softmax
